In popular culture, “here be dragons” (hic sunt dracones in the original Latin) means dangerous or unexplored territory and is often thought to be an imitation of the medieval custom of putting pictures of dragons or other mythical beasts on uncharted or blank areas of maps where danger was thought to exist.
In reality, whilst such illustrations were relatively common on early maps, the phrase itself is known to appear on only one surviving map, the Hunt-Lenox Globe, which is dated between 1503 and 1507 and which today resides in the Rare Books Division of the New York Public Library.
What does a Latin phrase on a medieval globe have to do with Kamma or with open data? To badly paraphrase George Orwell, all open data is open, but some is more open than others. So, to start with, we should probably define what open data is.
The Open Definition summarises open data as follows :-
Open data is data that can be freely used, re-used and redistributed by anyone – subject only, at the most, to the requirement to attribute and share-alike.
The full definition goes into a lot more detail, but the three most important points are :-
At Kamma, we make use of a lot of open data, and we’ve recently codified how we assess an open data set: is it the right data for us to use, or are there hidden dragons or other dangers that mean it isn’t? Our open data guidelines look like this …
Is this data produced by or on behalf of an official organisation or body?
On the plus side, “official” data can often be more authoritative but the flip side of this is that it may not be updated regularly.
Is this data produced by an open data and/or open source community?
Community data is usually regularly updated, often in near real time. Nor is community data necessarily any less accurate than officially produced data.
Is this data regularly updated?
Has the data been produced once and then left alone? It’s rare that data doesn’t need to be updated. The frequency will vary considerably according to what the data represents.
Is the data released in a format that allows us to read it and consume it easily?
As a general rule, data that is released in a text format, such as CSV or GeoJSON, is preferable to data that is released in a binary form. Data, regardless of licensing, that is only available via an API is less attractive as it means you can only query the API for specific items of data rather than downloading the entire data set, which is usually preferable.
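As a rough illustration of why text formats are easy to consume, the sketch below reads CSV and GeoJSON using nothing but Python's standard library. The sample data, field names and function names are all made up for the example; a real data set would be read from a downloaded file instead.

```python
import csv
import io
import json

# Hypothetical sample data standing in for downloaded files.
CSV_TEXT = "id,name,postcode\n1,Town Hall,AB1 2CD\n2,Library,EF3 4GH\n"
GEOJSON_TEXT = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"id": 1, "name": "Town Hall"},
            "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
        },
    ],
})


def read_csv_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_geojson_features(text):
    """Parse GeoJSON text and return its list of features."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection["features"]


records = read_csv_records(CSV_TEXT)
features = read_geojson_features(GEOJSON_TEXT)
```

Note that the CSV reader returns every value as a string, which is one reason the field documentation discussed further down matters.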
Does the data require specialist or proprietary software to read it?
Some binary forms of data require other software or software libraries to read. If the data is in an openly documented format, such as Esri’s Shapefile format, then libraries should be available to read and write it. Conversely, if the data is in a format which requires you to license additional software applications or libraries, there are cost implications as well as the real risk that these libraries could become unsupported in the future.
Is the data available online for easy download?
Can you easily acquire the data and its updates, or do you have to order a specific time-limited URL or physical download? The latter approach can make updating your data sets more manual and time-consuming, and harder to automate.
Are updates to the data incremental or do they include everything?
When a data set is updated does the update wrap up all previous updates or does it just contain records which have been added, changed or deleted since the last release?
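The difference matters when you come to load an update. Here is a minimal sketch, assuming records are held in a dict keyed by identifier and that a change-only release arrives as (action, id, record) tuples. That tuple format is invented for this example; real publishers each have their own.

```python
def apply_full(current, release):
    """A full release simply replaces the previous snapshot."""
    return dict(release)


def apply_incremental(current, changes):
    """Return a new snapshot after applying a change-only update.

    current: dict mapping record id -> record
    changes: iterable of (action, record_id, record) tuples, where action
             is "add", "change" or "delete" (a hypothetical format).
    """
    snapshot = dict(current)
    for action, record_id, record in changes:
        if action in ("add", "change"):
            snapshot[record_id] = record
        elif action == "delete":
            snapshot.pop(record_id, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return snapshot


current = {"a": {"name": "Alpha"}, "b": {"name": "Beta"}}
changes = [
    ("change", "a", {"name": "Alpha v2"}),
    ("delete", "b", None),
    ("add", "c", {"name": "Gamma"}),
]
updated = apply_incremental(current, changes)
```

With incremental updates, a missed release leaves your copy out of step with the source, so it is worth recording which release was applied last.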
Does the data contain stable and consistent identifiers for each record?
Does each record in the data set have a unique identifier? Does this identifier remain the same across releases, or is it only unique in the context of a single release? You should consider how to manage data sets that do not have stable identifiers, or that have no identifiers at all.
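One way to explore these questions is sketched below, assuming each release has been loaded as a list of dicts with an "id" field. The function names are ours, and the content hash is just one possible fallback for records that carry no identifier, not a recommendation.

```python
import hashlib
import json
from collections import Counter


def duplicate_ids(records, key="id"):
    """Return identifiers that occur more than once within one release."""
    counts = Counter(record[key] for record in records)
    return sorted(i for i, n in counts.items() if n > 1)


def compare_releases(old, new, key="id"):
    """Summarise which identifiers survive from one release to the next."""
    old_ids = {record[key] for record in old}
    new_ids = {record[key] for record in new}
    return {
        "kept": sorted(old_ids & new_ids),
        "dropped": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
    }


def content_key(record):
    """Derive a key from a record's content when it has no identifier."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


old_release = [{"id": "a"}, {"id": "b"}]
new_release = [{"id": "b"}, {"id": "c"}]
summary = compare_releases(old_release, new_release)
```

If almost every identifier is dropped and re-added between two releases of what is meant to be the same data, that hints that identifiers are being regenerated each time. A content-derived key changes whenever any field changes, so it only helps to spot records that are identical across releases.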
Is the data consistent and documented?
Even the simplest of data needs supporting documentation for each field and each field’s data type.
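Once the fields and their types are documented, a data set can be checked against that documentation. The sketch below assumes a made-up schema of three fields and records read from a text format, where every value starts out as a string.

```python
# Hypothetical schema: field name -> type the documentation says it holds.
SCHEMA = {"id": int, "name": str, "area_sq_km": float}


def validate(record, schema):
    """Return a list of problems found in one record; empty if none."""
    problems = []
    for field, caster in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        try:
            caster(record[field])
        except (TypeError, ValueError):
            problems.append(
                f"bad {caster.__name__} in {field}: {record[field]!r}"
            )
    return problems


good = {"id": "1", "name": "Hyde Park", "area_sq_km": "1.4"}
bad = {"id": "abc", "name": "Unknown"}
good_problems = validate(good, SCHEMA)
bad_problems = validate(bad, SCHEMA)
```

Running a check like this over every release is a cheap way to notice when the data and its documentation have drifted apart.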
Can this data easily hold hands with our existing data?
A lot of data sets can be linked together if you know that an identifier in one data set is equivalent to an identifier in another. This can aid you in linking data sets, allow more insight to be gleaned than from a single data set, and also suggest other possible data sets which may be of value.
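As a small sketch of that kind of linking, assume two data sets held as lists of dicts and a lookup table that maps identifiers in the first to identifiers in the second. All names and values here are invented for illustration.

```python
def link(left, right, lookup, left_key, right_key):
    """Pair up records from two data sets via an identifier lookup table.

    lookup maps an identifier in `left` to the equivalent one in `right`.
    Records with no known equivalent are left out of the result.
    """
    right_by_id = {record[right_key]: record for record in right}
    linked = []
    for row in left:
        other_id = lookup.get(row[left_key])
        match = right_by_id.get(other_id)
        if match is not None:
            linked.append((row, match))
    return linked


properties = [{"ref": "P1", "address": "1 High St"},
              {"ref": "P2", "address": "2 Low Rd"}]
ratings = [{"cert": "C9", "band": "C"}]
ref_to_cert = {"P1": "C9"}  # hypothetical equivalence table

pairs = link(properties, ratings, ref_to_cert, "ref", "cert")
```

The two inputs are only read, never merged, so each source data set stays in its original form.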
Is the data formally licensed under an open data licence?
Not all open data licences are equal. For example, open data produced by the UK Government must now be licensed under the Open Government Licence which places very few limitations on use. But some open data licences are more onerous in their restrictions. It’s important to consider the business implications based on the requirements of an open licence and how you plan to make use of the data.
Does the licence allow commercial use?
Some open licences disallow commercial use under any conditions, which may preclude using the data. Others allow commercial use, but only under a formal, paid-for licence. That needn’t mean you can’t use the data, but you should weigh the cost and the licensing conditions, which can be more restrictive under a formal scheme, against the value the data can add.
Does the licence have an attribution clause?
Attributing the data means that you need to credit your use of the data in some manner. At Kamma we list the data source and the licence as part of the About section of our website.
Does the licence have a share-alike clause?
A share-alike clause can be more problematic than an attribution clause, as it can mean that if you co-mingle the data with your own data, the resultant data set must be released under the same licence and made publicly available. If this isn’t a viable proposition for you, you can still keep the data sets apart in their own silos and cross-reference them depending on your needs.
Does the licence permit a derived work to be produced?
In addition to other licensing terms, some data licences, both open and proprietary, forbid co-mingling and producing a derived data set. Generally, that makes the data challenging to use, though as mentioned above, you may still be able to use it if you keep it separated from all other data sets.
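The four licensing questions above can be recorded in a simple structure so that every data set gets asked the same things. This is only a sketch of one way to do it: the class, the field names and the example licence are hypothetical, and the output is a prompt for discussion rather than legal advice.

```python
from dataclasses import dataclass


@dataclass
class LicenceTerms:
    """Answers to the licensing questions for one data set."""
    name: str
    commercial_use: bool   # does the licence allow commercial use?
    attribution: bool      # does it have an attribution clause?
    share_alike: bool      # does it have a share-alike clause?
    derivatives: bool      # does it permit a derived work?


def licence_flags(terms):
    """List the points that need a decision before the data is used."""
    flags = []
    if not terms.commercial_use:
        flags.append("no commercial use")
    if terms.attribution:
        flags.append("attribution required")
    if terms.share_alike:
        flags.append("share-alike: keep separate or release combined data")
    if not terms.derivatives:
        flags.append("no derived works: keep separate")
    return flags


# A made-up licence, not a description of any real one.
example = LicenceTerms(
    name="Example Licence",
    commercial_use=True,
    attribution=True,
    share_alike=True,
    derivatives=True,
)
flags = licence_flags(example)
```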
In summary, and without claiming to be complete or comprehensive, the questions above have allowed Kamma to quickly and easily triage whether an open data set is right for us to use, and to realise the immense value and benefits that well-produced open data has to offer. While there is no one-size-fits-all approach to open data, asking the same questions consistently when looking at an open data set has allowed us to navigate and avoid the perils and hazards of the dragons of open data.