A thoughtful piece on the powers and limits of data. While aimed more toward "data journalists", economists and econometricians would do well to take heed of the following advice:
Data is not a force unto itself. Data clearly does not literally create value or change in the world by itself. We talk of data changing the world metonymically – in more or less the same way that we talk of the printing press changing the world. Databases do not knock on doors, make phone calls, push for institutional reform, create new services for citizens, or educate the masses about the inner workings of the labyrinthine bureaucracies that surround us. The value that data can potentially deliver to society is realised by human beings who use data to do useful things. The value of these things is the result of the ingenuity, competence and (perhaps above all) hard work of human beings, not something that follows automatically from the mere presence and availability of datasets on the web in a form which permits their reuse.
Data is not a perfect reflection of the world. Public datasets (unsurprisingly) do not give us perfect information about the world. They are representations of the world gathered, generated, selected, arranged, filtered, collated, analysed and corrected for particular purposes – purposes as diverse as public sector accounting, traffic control, weather prediction, urban planning, and policy evaluation. Data is often incomplete, imperfect, inaccurate or outdated. It is more like a shadow cast on the wall, generated by fallible human beings, refracted through layers of bureaucracy and official process. Despite this partiality and imperfection, data generated by public bodies can be the best source of information we have on a given topic and can be augmented with other data sources, documents and external expertise. Rather than being taken at face value or as gospel, datasets may often serve as an indicative springboard, a starting point or a supplementary source for understanding a topic.
Data does not speak for itself. Sometimes items in a database stand by themselves and do not require additional context or documentation to help us interpret them – for example, when we consult transport timetables to find out when the next train leaves. But often data requires further research and analysis before we can make sense of it. In many ways official datasets resemble official texts: we need to learn how to read and interpret them critically, to read between the lines, to notice what is absent or omitted, to understand the gravity and implications of different figures, and so on. We should not imagine that anyone can easily understand any dataset, any more than we would think that anyone can easily read any policy document or academic article.
Data is not power. Data may enable more people to scrutinise official activities and transactions through more detailed, data-driven reportage. In principle it might help more people participate in the formulation of more evidence-based policy proposals. But the democratisation of information is different from the democratisation of power. Knowing that something is wrong or that there is a better way of doing things is not the same thing as being in a position to fix things or to effect change. For better or for worse, flawless arguments and impeccable evidence are usually not sufficient in themselves to effect reform. If you want to change laws, policies or practices it usually helps to have things like implacable advocacy, influential or high-profile supporters, positive press attention, hours of hard graft, bucketloads of cash and so on. Being able to see what happens in the corridors of power through public datasets does not mean you can waltz down them and move the furniture around. Open information about government is not the same as open government, participatory government or good government.
Interpreting data is not easy. Furthermore, there is a tendency to think that the widespread availability of data and data tools represents a democratisation of the analysis and interpretation of data. With the right tools and techniques, anyone can understand the contents of a dataset, right? Here it is important to distinguish between different orders of activity: while it is easier than ever before to do things with data on computers and on the web (scrape it, visualise it, publish it), this does not necessarily entail that it is easier to know what a given dataset means. Revolutionary content management systems that enable us to search and browse legal documents don't mean that it is easier for us to interpret the law. In this sense it isn't any easier to be a good data journalist than it is to be a good journalist, a good analyst, a good interpreter. Creating a good piece of data journalism or a good data-driven app is often more like an art than a science. Like photography, it involves selection, filtering, framing, composition and emphasis. It involves making sources sing and pursuing truth – and truth often doesn't come easily. Amid all of the services and widgets, libraries and plugins, talks and tutorials, there is no sure-fire technique for doing it well.