We’re living in a time of disruptive technologies evolving at an exponential pace. Today, you can enjoy an Impossible Burger (meat industry disrupted) delivered by Caviar (food delivery disrupted) to your Airbnb (hotel industry disrupted) while you’re on FaceTime (telecommunications industry disrupted) urging your teenager to get back to lessons on Khan Academy (education industry disrupted). And all the while, you’re leaving a trail of digital data points.
So rather than trying to predict what the future will bring, I want to focus on the principles we should use to shape it. What do we want the future to look like? To me, the answer comes down to two things: making data convenient and making data credible.
What makes data convenient?
Convenient data are easy to find, use, and combine with other data. These criteria guide our work on the World Development Indicators, which aim to integrate data from many sources seamlessly and give users a convenient way to access and use data on a wide range of development topics. Integrating all that data is no easy task, despite our best efforts to automate our data ingestion, update, and aggregation procedures.
That’s because many of us who produce and share data still lag behind in making our data convenient. Many database platforms let users download only one country or one indicator at a time. Others make data seekers work through complex, multi-step query tools to get the data they need. And many organizations still haven’t taken the vitally important step of opening up their data.
One weird trick to make your data convenient: open it up!
When organizations adopt Open Data practices, their data become much more convenient and usable. We’ve seen this firsthand with the World Bank’s data assets. For example, since we launched the World Development Indicators as open data six years ago, we’ve seen a 15-fold increase in unique visitors, and traffic now exceeds one million visits a month! Search the internet for “poverty rate in Indonesia” or “CO2 emissions in South Africa,” and the results link to authoritative data indicators from the World Bank and other sources. That’s why Open Data should become standard practice for every government and organization that generates data.
It doesn’t actually take much to open up data. I’m confident that any organization that puts even one of the suggestions below into practice will greatly increase the potential of its data:
- Provide data in downloadable, machine-readable formats like CSV and XLS, so that your users don’t have to convert PDFs into spreadsheets
- Make data available in bulk downloads
- Offer APIs that let users query and retrieve data directly, so they don’t have to click endlessly through query tools
- Use a standard open license, such as the Creative Commons Attribution license
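As a rough illustration of the API suggestion, here is a minimal Python sketch that builds a single query URL for one indicator across several countries. It assumes the public World Bank Indicators API (v2) at api.worldbank.org, which accepts semicolon-separated country codes and the `format`, `date`, and `per_page` query parameters. The helper name `indicator_url` is hypothetical, and `SI.POV.DDAY` is assumed to be the WDI code for the international poverty headcount ratio, so check the indicator catalog before relying on it. The sketch only builds the URL; fetching it (for example with `urllib.request.urlopen`) needs network access.

```python
from urllib.parse import urlencode

# Assumed base endpoint of the World Bank Indicators API (v2).
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(countries, indicator, start_year, end_year, per_page=1000):
    """Hypothetical helper: one query URL for one indicator, many countries.

    Assumes the v2 API accepts several ISO codes separated by semicolons,
    so a single request replaces one download per country.
    """
    country_path = ";".join(code.upper() for code in countries)
    query = urlencode(
        {
            "format": "json",                    # machine-readable response
            "date": f"{start_year}:{end_year}",  # inclusive year range
            "per_page": per_page,                # large pages mean fewer requests
        },
        safe=":",  # keep the year-range colon readable instead of %3A
    )
    return f"{BASE_URL}/country/{country_path}/indicator/{indicator}?{query}"


url = indicator_url(["idn", "zaf"], "SI.POV.DDAY", 2010, 2020)
print(url)
```

One request covering two countries and a decade of data stands in for what would otherwise be a separate download for each country through a point-and-click query tool.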
All these suggestions greatly increase the technical interoperability of data, making it easier for people to integrate different types of data to reveal new insights.
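To make technical interoperability concrete, the following minimal Python sketch joins two small CSV extracts on a shared ISO 3166-1 alpha-3 country code. The two tables, their column names (`iso3`, `country`, `region`), and the helper functions are hypothetical and exist only for illustration. The point is that once two sources publish machine-readable files keyed on the same identifier, combining them takes a few lines rather than manual copy-and-paste.

```python
import csv
import io

# Two small, hypothetical CSV extracts from different sources. Both use
# ISO 3166-1 alpha-3 codes as their key, which is what makes them joinable.
NAMES_CSV = """iso3,country
IDN,Indonesia
ZAF,South Africa
BRA,Brazil
"""

REGIONS_CSV = """iso3,region
IDN,East Asia & Pacific
ZAF,Sub-Saharan Africa
"""


def read_keyed(text, key="iso3"):
    """Parse CSV text into a dict of rows keyed by the shared identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Merge rows whose key appears in both tables (an inner join)."""
    return {code: {**left[code], **right[code]} for code in left if code in right}


merged = inner_join(read_keyed(NAMES_CSV), read_keyed(REGIONS_CSV))
for code, row in sorted(merged.items()):
    print(code, row["country"], row["region"], sep=" | ")
```

Brazil drops out because it appears in only one table. A real integration would have to decide whether an inner join like this one, or an outer join that keeps unmatched rows, better fits the analysis.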
What makes data credible? Smart standard setting & transparency
Setting standards isn’t always as straightforward as it might sound. To be sure, there are already major, mature standardization efforts in the statistical community. But with the emergence of so much new data, so many new sources, and new issues that we don’t yet know how to measure, we need to rethink our usual approach to standard setting.
We need to be more agile with both existing and new data sources. We need to take part in new data communities and see where critical masses emerge around the data that’s most useful. And most importantly, we shouldn’t standardize too early. It’s not enough for data to be technically interoperable so that they are easy to integrate; they also need to be conceptually interoperable, meaning that the standards we develop for different types of data are compatible and consistent. In the interest of conceptual interoperability, it’s far better to create minimal, flexible standards that can evolve as new approaches emerge. One good model is the HTML standard, which started in the early 1990s with a handful of tags and has since expanded and evolved to meet users’ needs.
Credible data are also transparent. Through clear metadata, detailed documentation, and open algorithms, data portals should give users confidence in the data they provide. Without compromising confidentiality, we should make it easy for users to reproduce our analysis by sharing our code and the underlying datasets.
Don’t let the perfect be the enemy of the good
To all of us who produce and share data: let’s ride the tide rather than resist it.
And while I don’t think “make it convenient, make it credible” will catch on the way some similar-sounding phrases have in the past, we all share a responsibility to figure out how to better serve our users, so that they, in turn, can better serve the world.