Here are 9 email newsletters about data… I think you’ll like at least 4 of them

fairwarning metrics

Sophie Warnes doesn’t just round up data journalism in her emails, she *does* data journalism *about* her emails

As the first group of MA Data Journalism students prepare to start their course this month, I’ve been recommending a number of email newsletters in the field that they should be following — and I thought I should share the list here too.

Here, then, are 9 email newsletters about data — if I’ve missed any please let me know.

10 principles for data journalism in its second decade


In 2007 Bill Kovach and Tom Rosenstiel published The Elements of Journalism. With the concept of ‘journalism’ increasingly challenged by the fact that anyone could now publish to mass audiences, their principles represented a welcome platform-neutral attempt to articulate exactly how journalism could be untangled from the vehicles that carried it and the audiences it commanded.

In this extract from a forthcoming book chapter* I attempt to use Kovach and Rosenstiel’s principles (outlined in part 1 here) as the basis for a set that might serve data journalism as it enters its second and third decades.

Principle 1: Data journalists should strive to interrogate data as a power in its own right

When data journalist Jean-Marc Manach set out to find out how many people had died while migrating to Europe he discovered that no EU member state held any data on migrants’ deaths. As one public official put it, dead migrants “aren’t migrating anymore, so why care?”

Similarly, when the BBC sent Freedom of Information requests to mental health trusts about their use of face-down restraint, six replied saying they could not say how often any form of restraint was used — despite being statutorily obliged to “document and review every episode of physical restraint which should include a detailed account of the restraint” under the Mental Health Act 1983.

The collection of data, the definitions used, and the ways that data informs decision making, are all exercises of power in their own right. The availability, accuracy and employment of data should all be particular focuses for data journalism as we see the expansion of smart cities and wearable technology.

Principle 2: Editorial independence includes technological independence

I wrote in 2013 about the role of coding in ensuring editorial independence, quoting Lawrence Lessig‘s point, made over a decade ago, that code is law:

“Ours is the age of cyberspace. It, too, has a regulator. This regulator, too, threatens liberty. But so obsessed are we with the idea that liberty means “freedom from government” that we don’t even see the regulation in this new space. We therefore don’t see the threat to liberty that this regulation presents.

“This regulator is code—the software and hardware that make cyberspace as it is. This code, or architecture, sets the terms on which life in cyberspace is experienced. It determines how easy it is to protect privacy, or how easy it is to censor speech. It determines whether access to information is general or whether information is zoned. It affects who sees what, or what is monitored. In a host of ways that one cannot begin to see unless one begins to understand the nature of this code, the code of cyberspace regulates.” (Lessig 2006)

The independence of the journalist is traditionally portrayed as possessing the power to resist pressure from our sources, our bosses and business models, and the government and law enforcement. But in a networked age it will also mean independence from the biases inherent in the tools that we use.

From the content management systems that we use, to the mobile devices that record our every move, independence in the 21st century will be increasingly facilitated by being able to ‘hack’ our tools or build our own.

Code affects what information you can access, your ability to verify it, your ability to protect sources — and your ability to empower them. Finally, code affects your ability to engage users.

Code is a key infrastructure that we work in as journalists: if we understand it, we can move across it much more effectively. If it is invisible to us, we cannot adapt it, we cannot scrutinise it. We are, in short, subject to it.

Principle 3: We should strive for objectivity not just in the sources and language that we use, but also the way that we design our tools


Mapping tools assume a default point of view. Image: Time Magazine via Newberry Library via Jake Ptacek

In data journalism right now we are at a crucial stage: the era during which we move from making stories with tools built by other people, to making our own tools.

As John Culkin, in writing about Marshall McLuhan, said:

“We shape our tools, and thereafter they shape us”.

The values which we embed in those tools, the truths we take for granted, will have implications beyond our own generation.

The work of Lawrence Lessig and Nicholas Diakopoulos highlights the role that code plays in shaping the public lives that we can lead; we need to apply the same scrutiny to our own processes.

When we build tools on maps do we embed the prejudices that have been identified by critical cartographers?

Do we seek objectivity in the visual language we use as well as the words that we choose?

But it is not just the tools which will shape our practice: the reorganisation of newsrooms, the creation of data desks, and the data journalist’s routine will also begin to draw the borders of what is considered normal in – and what is considered outside of – the role of the data journalist.

Uskali and Kuutti, for example, already identify at least three different models for organising data journalism work practices: data desks, flexible data projects, and the entrepreneur or sub-contractor model. To what extent these models circumscribe or provide opportunities for new ways of doing journalism bears some reflection.

If we are to augment our journalism, we must do so critically.

Principle 4: Impartiality means not relying only on stories where data exists and is easy to obtain

The increasing abundance of data brings with it a new danger: that we do not look beyond what is already accessible, or that we give up too easily if a story does not seem practical.

Just as the expansion of the PR industry in the 20th century led to accusations of ‘churnalism’ in the media, the expansion of data in the 21st century risks leading to ‘data churnalism’ instead of data journalism, including the use of automation and dashboards as a way of dealing with those accessible sources.

Principle 5: We should strive to give a voice to those who are voiceless in data by seeking to create or open up data which would do so


When The Guardian’s ‘The Counted’ project sought to report on people killed by police in the US, it was literally seeking to ‘give a voice to the voiceless’ — because those people were dead; they could not speak.

The Bureau of Investigative Journalism‘s Naming the Dead project had a similar objective: tracking and investigating US covert drone strikes since 2011 and seeking to identify those killed.

Neither is an example of data journalism that uses coding: the skills are as basic as keeping a record of every report you can find. And yet this basic process has an important role at the heart of modern journalism: digitising that which did not exist in digital form before: moving from zeroes to ones. You can find more examples in this post about the approach in 2015:

“Too often data journalism is described in ways that focus on the technical act of working with existing data. But to be voiceless often means that no data exists about your experience.”

Principle 6: We retain editorial responsibility for context and breadth of coverage where we provide personalisation

If journalism must provide a forum for public criticism and compromise, what role does personalisation — which gives each person a different experience of the story — play in that?

Some would argue that it contributes to ‘filter bubbles’ whereby people are unaware of the experiences and opinions of people outside of their social circle. But it can also bring people in to stories that they would otherwise not read at all, because those stories would otherwise have no relevance to their lives.

As data journalists, then, we have a responsibility to consider the impact of personalisation and interactivity both in making news relevant to readers, and providing insights into other dimensions of the same story which might not be so directly relevant.

This, of course, has always been journalism’s skill: after all, human interest stories are the ‘universal’ hook that often draws people in to the significant and important.

Principle 7. We should strive to keep the significant interesting and relevant by seeking to find and tell the human story that the data shines a spotlight on

For the same reason, we should ensure that our stories are not merely about numbers, but people. I always tell my MA Data Journalism students that a good story should do two things: tell us why we should care, and tell us why it matters.

Data helps us to establish why a story matters: it connects one person’s story to 100 others like it; without data, a bad experience is merely an anecdote. But without a human story, data becomes just another statistic.

Principle 8. The algorithms in our work – both human and computational – should be open to scrutiny, and iteration

The more that journalism becomes augmented by automation, or facilitated by scripts, the more that we should consider being open to public scrutiny.

If we are unable to explain how we arrived at a particular result, that undermines the credibility of the conclusion.

Diakopoulos and Koliska, who have explored algorithmic transparency in the news media, conclude that it is an area much in need of research, development and experimentation:

“There are aspects of transparency information that are irrelevant to an immediate individual user context, but which are nonetheless of importance in media accountability for a broad public such as fair and uncensored access to information, bias in attention patterns, and other aggregate metrics of, for instance, error rates. In other words, some factors may have bearing on an individual whereas others have import for a larger public. Different disclosure mechanisms, such as periodic reports by ombudspeople may be more appropriate for factors like benchmarks, error analysis, or the methodology of data collection and processing, since they may not be of interest to or even comprehensible for many users yet demanded by those who value an expert account and explanation.”

Principle 9. Sharing our code also allows us to work more efficiently and raise standards


It has often been said that transparency is the new objectivity in this new world of limitless publishing. This recognises that, while true objectivity does not exist, transparency can help establish what steps we have taken to come as close to it as we can.

The AP Stylebook‘s section on data journalism has formally recognised this with its reference to reproducible analysis:

“Providing readers with a roadmap to replicate the analysis we’ve done is an essential element of transparency in our reporting. We can accomplish this transparency in many ways, depending on the data set and the story”

But what is transparency for data journalists? Jennifer Stark and Nicholas Diakopoulos outline principles from scientific research that can be adapted – specifically reproducibility and replicability.

Reproducibility involves making code and data available so a user can rerun the original analysis on the original data: “This is the most basic requirement for checking code and verifying results.”

Replicability, on the other hand, “requires achieving the same outcome with independent data collection, code and analysis. If the same outcome can be achieved with a different sample, experimenters and analysis software, then it is more likely to be true.”

Currently the code-sharing site GitHub is the place where many data teams share their code so that others can reproduce their analysis. It is incredible to look across the GitHub repositories of FiveThirtyEight or BuzzFeed and understand how the journalism was done. It also acts as a way to train and attract future talent into the industry, either formally as employees, or informally as contributors.
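The reproducible workflow Stark and Diakopoulos describe can be sketched in a few lines. This is a minimal, hypothetical Python example (the dataset, field names and figures are all invented): publish the data file and the script together, and anyone can rerun the count and check the result.

```python
import csv
from collections import Counter

def count_by_force(path):
    """Rerunnable analysis: count incidents per police force in a published CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["force"] for row in csv.DictReader(f))

# Stand-in for the dataset a team would publish alongside its story.
with open("incidents.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["force", "date"])
    writer.writeheader()
    writer.writerows([
        {"force": "West Midlands", "date": "2017-01-02"},
        {"force": "West Midlands", "date": "2017-01-09"},
        {"force": "Merseyside", "date": "2017-01-05"},
    ])

totals = count_by_force("incidents.csv")
print(totals.most_common())
```

Rerunning the script on the published file always reproduces the same totals; replicability would mean collecting the incident data independently and reaching the same conclusion.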

Principle 10. We should seek to empower citizens to exercise their rights and responsibilities


The New York Times You Draw It challenges users to take an active role

The final principle mirrors Kovach and Rosenstiel’s: the obligation on the public to take some responsibility for journalism too. And it is here, perhaps, where data journalism has the most significant role to play.

Because where Kovach and Rosenstiel put the onus on the public, I believe that data journalism is well positioned to do more, and to actively empower that public to exercise those rights and responsibilities.

A New York Times interactive which invites the user to draw a line chart before revealing how close they were to the true trend is precisely the sort of journalism which helps users engage with their own role in negotiating information.

A tool which allows you to write to your local representative, or to submit a Freedom of Information request, is one which positions the reader not as a passive consumer of news, but as an active participant in the world that they are reading about.

In print and on air we could only arm our audiences with information, and hope that they use it wisely. Online we can do much more — and we’ve only just begun.

*You can read all three extended extracts from the book chapter under the tag ‘Next Wave of Data Journalism’ here.

Filed under: online journalism Tagged: algorithms, AP Stylebook, data churnalism, Jean-Marc Manach, Jennifer Stark, John Culkin, lawrence lessig, Michael Koliska, next wave of data journalism, Nicholas Diakopoulos, objectivity

Computational thinking and the next wave of data journalism

In this second extract from a forthcoming book chapter I look at the role that computational thinking is likely to play in the next wave of data journalism — and the need to problematise that. You can read the first part of this series here.

Computational thinking is the process of logical problem solving that allows us to break down challenges into manageable chunks. It is ‘computational’ not only because it is logical in the same way that a computer is, but also because this allows us to turn to computer power to solve it.

As Jeannette M. Wing puts it:

“To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.”

This process is at the heart of a data journalist’s work: it is what allows the data journalist to solve the problems that make up so much of modern journalism, and to be able to do so with the speed and accuracy that news processes demand.

It is, in Wing’s words, “conceptualizing, not programming” and “a way that humans, not computers, think.”

“Computers are dull and boring; humans are clever and imaginative. We humans make computers exciting. Equipped with computing devices, we use our cleverness to tackle problems we would not dare take on before the age of computing and build systems with functionality limited only by our imaginations” (Wing 2006)

And it is this – not coding, or spreadsheets, or visualisation – that I believe distinguishes the next wave of journalists: the skills of decomposition (breaking a problem down into parts), pattern recognition, abstraction and algorithm building that schoolchildren are being taught right now. Imagine what mass computational literacy will do to the news industry.
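Those four skills can be illustrated with a toy newsroom question: which area suffered the deepest budget cut? The area names and (before, after) figures below are invented for illustration.

```python
# Decomposition: the question breaks into (1) compute each change, (2) compare.
budgets = {"Northfield": (120, 95), "Southway": (80, 78), "Eastbrook": (60, 66)}

def pct_change(before, after):
    # Abstraction: one reusable rule for any pair of figures.
    return (after - before) / before * 100

# Pattern recognition: apply the same rule across every area...
changes = {area: pct_change(b, a) for area, (b, a) in budgets.items()}

# ...and an algorithm as simple as 'take the minimum' finds the story.
biggest_cut = min(changes, key=changes.get)
```

None of this needs a computer, which is Wing’s point: the thinking comes first, and the code merely scales it up from three areas to three thousand.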

Nicholas Diakopoulos‘s work on the investigation of algorithms is just one example of computational thinking in practice. In his Tow Center report on algorithmic accountability he outlines an approach to reverse-engineer the ‘black boxes’ that shape how we experience an increasingly digitised world:

“Algorithms must always have an input and output; the black box actually has two little openings. We can take advantage of those inputs and outputs to reverse engineer what’s going on inside. If you vary the inputs in enough ways and pay close attention to the outputs, you can start piecing together a theory, or at least a story, of how the algorithm works, including how it transforms each input into an output, and what kinds of inputs it’s using. We don’t necessarily need to understand the code of the algorithm to start surmising something about how the algorithm works in practice.”
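A toy version of that probing can be written in a few lines of Python. Here the ‘black box’ is a stand-in function we pretend we can inspect only through its inputs and outputs; a real investigation would probe a live system such as a search ranking or a pricing algorithm.

```python
def black_box(age):
    # Hidden from the investigator: the system quietly penalises over-65s.
    return 0.5 if age >= 65 else 1.0

def probe(box, inputs):
    """Vary the inputs and record every output, looking for patterns."""
    return {x: box(x) for x in inputs}

observations = probe(black_box, range(20, 80, 5))
# A sharp change in output between 60 and 65 suggests an age threshold
# inside the box - a theory of how the algorithm works, built from outside.
```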

Problematising computationality


Infoamazonia is one of a number of projects seeking to make environmental crime ‘visible’. Image from Geojournalism

But the next wave of data journalism cannot just solve the new technical problems that the industry faces: it must also “problematise computationality”, to use the words of David M. Berry:

“So that we are able to think critically about how knowledge in the 21st century is transformed into information through computational techniques, particularly within software.”

His argument relates to the role of the modern university in a digital society, but the same arguments can be made about journalism’s role too:

“The digital assemblages that are now being built … provide destabilising amounts of knowledge and information that lack the regulating force of philosophy — which, Kant argued, ensures that institutions remain rational.

“… There no longer seems to be the professor who tells you what you should be looking up and the ‘three arguments in favour of it’ and the ‘three arguments against it’.”

This is not to argue for the reintroduction of gatekeepers, but to highlight instead that information is not neutral, and it is the role of the journalist – just as it is the role of the educator – to put that information into context.

Crime mapping is one particularly good example of this. What can be more straightforward than placing crimes on a map? As Theo Kindynis writes of crime mapping, however:

“It is increasingly the case that it simply does not make sense to think about certain types of crime in terms of our conventional notions of space. Cybercrime, white-collar financial crime, transnational terrorism, fraud and identity theft all have very real local (and global) consequences, yet ‘take place’ within, through or across the ‘space of flows’ (Castells 1996). Such a-spatial or inter-spatial crime is invariably omitted from conventional crime maps.” (Kindynis 2014)

All this serves to provide some shape to the landscape that we are approaching. To navigate it we perhaps need some more specific principles of our own to help.

In the third and final part of this series, then, I want to attempt to build on Kovach and Rosenstiel’s work with principles which might form a basis for data journalism as it enters its second and third decades.

You can read all three extracts under the tag ‘Next Wave of Data Journalism’ here.

Filed under: online journalism Tagged: algorithms, computational thinking, data journalism, David M. Berry, Jeannette M. Wing, next wave of data journalism, Nicholas Diakopoulos, Theo Kindynis

The next wave of data journalism?

In the first of three expanded extracts from a forthcoming book chapter on ‘The next wave of data journalism’ I outline some of the ways that data journalism is reinventing itself, and adapting for a world which is rapidly changing again. Where networked communications and processing power were key in the 2000s, automation and AI are becoming key in the decade to come. And just as data journalism raised the bar for journalism as a whole, the bar is about to be raised for data journalism itself.

Data journalism isn’t doing enough. Now into its second decade, the noughties-era technologies that it was built on – networked access to information and vastly improving visualisation capabilities – are now taken for granted, just as the ‘computer assisted’ part of its antecedent Computer Assisted Reporting was.

In just ten years data journalism has settled down into familiar practices and genres, from the interactive map and giant infographics to the quick turnaround “Who comes bottom in the latest dataset” write-up. It’s a sure sign of maturity when press officers are sending you data journalism-based media releases.

Now we need to move forward. And the good news is: there are plenty of places to go.

Looking back to look forward


Philip Meyer’s reporting laid the foundation for CAR

In order to look forward it is often useful to look back: in any history of data journalism you will read that it came partly out of the Computer Assisted Reporting (CAR) tradition that emerged in the US in the late 1960s.

CAR saw journalists using spreadsheet and database software to analyse datasets, but it also had important political and cultural dimensions: firstly, the introduction of a Freedom of Information Act in the US which made it possible to access more data than before; and secondly, the spread of social science methods into politics and journalism, pioneered by seminal CAR figure Philip Meyer.

Data journalism, like CAR, had technological, political and cultural dimensions. Where CAR had spreadsheets and databases, data journalism had APIs and datavis tools; where CAR had Freedom of Information, data journalism had a global open data movement; and where CAR acted as a trojan horse that brought social science methods into the newsroom, data journalism has brought ‘hacker’ culture into publishing.

Much of the credit for the birth of data journalism lies outside of the news industry: often overlooked in histories of the form is the work of civic coders and information activists (in particular MySociety, which was opening up political data and working with news organisations well before the term data journalism was coined), and technology companies (the APIs and tools of Yahoo!, for example, formed the basis of many of data journalism’s early experiments).

The early data journalists were self-created, but as news organisations formalised data journalism roles and teams, data journalism newswork has been formalised and routinised too.

So where do we look for data journalism’s next wave of change?

Look outside news organisations once again and you see change in two areas in particular: on the technical side, an increasing use of automation, from algorithms and artificial intelligence (AI) to bots and the internet of things.

On the political side, a retreat from open data and transparency while non-governmental organisations take on an increasingly state-like role in policing citizens’ behaviour.

What is data journalism for?


Datavis can be seen as “Striving to keep the significant interesting and relevant”

Data journalists will often tell you that the key part of data journalism is the journalism bit: we are not just analysing data but finding and telling important stories in it. But journalism isn’t just about stories, either. Kovach and Rosenstiel, in their excellent book The Elements of Journalism, outline 10 principles which are always important to return to:

  • Journalism’s first obligation is to the truth
  • Its first loyalty is to citizens
  • Its essence is a discipline of verification
  • Its practitioners must maintain an independence from those they cover
  • It must serve as an independent monitor of power
  • It must provide a forum for public criticism and compromise
  • It must strive to keep the significant interesting and relevant
  • It must keep the news comprehensive and proportional
  • Its practitioners must be allowed to exercise their personal conscience
  • Citizens, too, have rights and responsibilities when it comes to the news

Some of these can be related to data journalism relatively easily: journalism’s first obligation to the truth, for example, appears to be particularly well served by an ability to access and analyse data.

Striving to keep the significant interesting and relevant? Visualisation and interactivity are great examples of how data journalism has been able to do just that for even the driest subjects.

But an attraction to those more obvious benefits of data journalism can distract us from the demands of the other principles.

Is data journalism “a discipline of verification”, or do we attribute too much credibility to data? Cleaning data and seeking further sources that can independently confirm what the data appears to tell us are just two processes that should be just as central as being able to generate a bar chart.
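As a hypothetical taste of what that cleaning work involves: a single organisation recorded three different ways in a dataset must be standardised before any counting is trustworthy. The names and rules below are invented for illustration.

```python
raw = ["Birmingham City Council", "birmingham city council ", "B'ham City Council"]

def clean(name):
    # Trim stray whitespace, normalise case, expand a known abbreviation.
    return name.strip().lower().replace("b'ham", "birmingham")

cleaned = {clean(n) for n in raw}
# Three spellings collapse to one entity, so counts no longer triple-count it.
```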

Some of the other principles become more interesting when you begin to look at developments that are set to impact our practice in the coming decades…

Rise of the robots

The rise of ‘robot journalism‘ – the use of automated scripts to analyse data and generate hundreds of news reports that would be impossible for individual journalists to write – is one to keep a particular eye on.
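At its simplest, much of this is template-filling: one script, many rows of data, many short reports. A minimal sketch in Python (the fixtures and scores are invented):

```python
results = [
    {"home": "Rovers", "away": "United", "home_goals": 3, "away_goals": 1},
    {"home": "Athletic", "away": "City", "home_goals": 0, "away_goals": 0},
]

def write_report(r):
    """Turn one row of results data into one short match report."""
    if r["home_goals"] == r["away_goals"]:
        return f"{r['home']} and {r['away']} drew {r['home_goals']}-{r['away_goals']}."
    if r["home_goals"] > r["away_goals"]:
        winner, loser = r["home"], r["away"]
        score = f"{r['home_goals']}-{r['away_goals']}"
    else:
        winner, loser = r["away"], r["home"]
        score = f"{r['away_goals']}-{r['home_goals']}"
    return f"{winner} beat {loser} {score}."

stories = [write_report(r) for r in results]
```

Pointed at a feed of hundreds of fixtures, the same function yields hundreds of reports that no individual reporter would have time to write.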

Aside from the more everyday opportunities that automation offers for reporting on amateur sports or geological events, automation also offers an opportunity to better “serve as an independent monitor of power”.

Lainna Fader, Engagement Editor at New York Magazine, for example, highlights the way that bots are useful “for making value systems apparent, revealing obfuscated information, and amplifying the visibility of marginalized topics or communities.”

By tweeting every time anonymous sources are used in the New York Times the Twitter bot @NYTanon serves as a watchdog on the watchdogs (Lokot and Diakopoulos 2015).

But is robot journalism a “discipline of verification” (another of Kovach and Rosenstiel’s principles)? Well, that all boils down to the programming: in 2015 Matt Carlson talked about the rise of new roles of “meta-writer” or “metajournalist” to “facilitate” automated stories using methods from data entry to narrative construction and volunteer management. And by the end of 2016 observers were talking about ‘augmented journalism’: the idea of using computational techniques to assist in your news reporting.

The concept of ‘augmented journalism’ is perhaps a defensive response to the initial examples of robot journalism: with journalists feeling under threat, the implied assurance is that robots would free up time for reporters to do the more interesting work.

What has remained unspoken, however, is that in order for this to happen, journalists need to be willing — and able — to shift their focus from routine, low-skilled processes to workflows involving high levels of technical skill, critical abilities — and computational thinking.

But more than a decade on from Adrian Holovaty’s seminal post “A fundamental way newspaper sites need to change”, there is very little evidence of this being seriously addressed in journalism training or newsroom design. Instead, computational thinking is being taught earlier, to teenagers and younger children at school.

Some of those may, in decades to come, get a chance to reshape the newsroom themselves. In the next part of this series, then, I look at how computational thinking is likely to play a role in the next wave of data journalism — and the need to problematise and challenge it at the same time.

Filed under: online journalism Tagged: data journalism, extract, Kovach and Rosenstiel, Lainna Fader, Matt Carlson, philip meyer

LiveStories raises $10 million to help you access public health and census data


LiveStories, which provides software that simplifies access to civic data on poverty, health, economics, and more, today announced that it has raised $10 million in funding. Ignition Partners led the round, with participation from returning investors True Ventures and Founders Co-Op.

The Seattle-based startup sources data from federal, state, and local governments, including the Bureau of Labor Statistics, the U.S. Census, and the Centers for Disease Control and Prevention.

“The civic data workflow is fragmented across multiple tools and vendors,” wrote LiveStories founder and CEO Adnan Mahmud, in an email to VentureBeat. “For example, you might use Google to find the data, Excel to clean it up, Tableau to explore it, and Word to create a static report.”

According to Mahmud, LiveStories’ software allows customers to find and communicate civic data in a more interactive way — across charts, videos, and images. “Our platform automatically visualizes the data, down to city and county localities,” wrote Mahmud. The data can then be shared on social media networks like Facebook and Twitter.

LiveStories claims to have more than 120 customers, which include LA County, CDPH, San Diego County, UCLA, and the Gates Foundation.

Today’s funding will be used to further develop the product and increase sales and marketing. Founded in 2015, LiveStories has raised a total of $14 million and currently has 20 employees.


Data journalism on radio, audio and podcasts

In a previous post I talked about how data journalism stories are told in different ways on TV and in online video. I promised I’d do the same for audio and radio — so here it is: examples from my MA in Data Journalism to give you ideas for telling data stories using audio.


As with any audio post, This American Life features heavily: not only is the programme one of the best examples of audio journalism around — it also often involves data too.

Right To Remain Silent is one particularly good example, because it’s about bad data: specifically, police who manipulated official statistics.

You might also listen to Choosing Wrong, which includes a section about polling.

Another favourite of mine is an audio story by The Economist about the prostitution industry, based on data scraped from sex trade websites: More bang for your buck (there are even worse puns in the charts).

David Rhodes, a BBC data journalist, has a range of stories on his Audioboom account, including pieces on Radio 4, Radio 5 Live, and this piece from the excellent factchecking radio programme, More or Less.

In podcasting this episode of The Allusionist tells a story about an experiment with data and dating.

Finally, I have to include an episode of Radiolab, one of my favourite podcasts. Shots Fired — which is split into two episodes — employs the common approach of interviewing the journalist who undertook a data-driven investigation (in other words, hooking the story on the journalist’s ‘quest’). It’s embedded below. For a geekier trip, try their podcast about Benford’s Law.

Podcasts about data journalism

There are also many great podcasts about data itself — one of my former students compiled a list for GIJN.

If you’ve heard any other examples of data stories being told through audio, please let me know — I’m always on the lookout for more!

Filed under: online journalism Tagged: audio, BBC, data journalism, David Rhodes, podcasting, radiolab, The Allusionist, The Economist, This American Life

Here’s the thinking behind my new MA in Data Journalism


Cogs image by Stuart Madeley

A few weeks ago I announced that I was launching a new MA in Data Journalism, and promised that I would write more about the thinking behind it. Here, then, are some of the key ideas underpinning the new course — from coding and storytelling to security and relationships with industry — and how they have informed its development.

1. Not just data journalism, but data storytelling: video, audio, text, visuals, interactivity — and UX

In designing the course I wanted to ensure that students thought about how to tell their data stories across all media — not just text and datavis.

I created a central Narrative module which gives students the technical and editorial skills to report a story across multiple platforms and media. That includes video and audio, techniques of longform immersive storytelling, social media-native data journalism, and visual journalism techniques (“Overview, zoom and filter, then details on demand”).

The module also looks at how to employ narrative techniques in interactivity — after all, what is the “user journey” in UX but another narrative?

2. Coding in a journalistic context, not a computing class

It’s no surprise that I’ve decided to make coding-as-journalism a central part of the MA in Data Journalism.

We do not send journalism students to the Law faculty to learn Media Law, or to the English faculty to learn about subbing and style, so I wanted to ensure students learned coding in a journalistic context too.

Doing so means students get editorial, ethical and legal guidance in class alongside technical support. Hackdays, Hacks/Hackers meetups and other collaborations with computing and other faculties provide opportunities for cross-disciplinary innovation around shared objectives.

Equally importantly, teaching this way makes for a more efficient and pedagogically effective experience: learning how to write a for loop or generate a range of numbers in the context of building a scraper or an interactive story is much more rewarding than learning the same skill in an unrelated context (indeed, it’s why I wrote my books on scraping and spreadsheets).
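That scraper-context example can be as small as this (a toy sketch — the site and page structure are invented):

```python
# Teaching a for loop and range() in a scraping context:
# generating the sequence of URLs a scraper would visit, page by page.
# The base URL is made up for the example.
base_url = "https://example.com/inspections?page={}"

urls = []
for page in range(1, 6):  # pages 1 to 5
    urls.append(base_url.format(page))

print(urls[0])    # https://example.com/inspections?page=1
print(len(urls))  # 5
```

The loop and the range are the same constructs a computing class would teach — but here the student immediately sees why they matter for a story.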

A final reason for keeping coding teaching in-house is that journalists are ultimately judged on their reporting over their coding, and I felt data journalism teaching and assessment should reflect this: the point of the modules is not merely to demonstrate technical excellence, but to show how technical skills can be used to facilitate journalistic excellence.

Striking, original stories made possible through creative application of coding and other data skills will impress potential employers much more than something technically impressive but journalistically basic.

3. Three languages — and computational thinking

programming - the IT crowd

Teaching coding this way means I can introduce students to at least three different programming languages in at least three different modules across the course: R and JavaScript in the first semester, then Python for advanced scraping and other investigative techniques such as network analysis in Specialist Reporting, Investigations and Coding.

SQL, regex, command line and Git are all covered too.
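Regular expressions in particular crop up constantly when cleaning the messy text that FOI responses and scraped pages produce. A minimal sketch (the response text is invented):

```python
import re

# Hypothetical FOI response: the figures we want are buried in prose
text = "Restraint was used 1,204 times last year, affecting 987 patients."

# Match digit runs (allowing thousands commas), then strip the commas
numbers = [int(m.replace(",", "")) for m in re.findall(r"\d[\d,]*", text)]
print(numbers)  # [1204, 987]
```

A few lines like these, repeated over dozens of responses, turn a pile of prose into a sortable dataset.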

Teaching three languages allows students to learn the underlying, language-neutral techniques of coding: identifying an editorial challenge using computational thinking, then finding the relevant libraries and examples that will help them solve it (and understanding the documentation).

It also means they can adapt to a newsroom which prefers one or more of those particular languages, or communicate with developers who use different languages.

4. (Re)inventing the data journalism workflow

telegraph newsroom

Newsroom image by Rob Enslin

Most newsrooms have gone through some form of reorganisation in the last decade, and are likely to do so again in the next.

The introduction of data journalists or interactive project teams has often been part of that — but we still don’t know the best way to fit those data journalists into the wider organisation, or organise those teams.

We also need to be thinking about how the integration of data journalism and its workflows affects the journalism itself. To list just five questions facing us:

  • To what extent are data journalists choosing to report on certain subjects over others because the data is more readily available?
  • How does a journalist in a broadcast organisation work differently from one in a print or online-only publisher?
  • When developer time is expensive and a bottleneck to innovation, how does that shape what can be done editorially?
  • What automation can we build into our workflows — and what issues does that raise?
  • And of course, how does the CMS limit us editorially — and how do journalists get around those limitations?

I wanted to make sure that students had an opportunity to explore these questions in practice as they organise their own newsrooms alongside students on the MA in Multiplatform and Mobile Journalism.

When in their later career they begin to form their own data units in media organisations, or are invited to contribute to yet another reorganisation, it’s important that they are able to make informed decisions.

5. Media law and ethics — and technical defence

I wanted the law element on the MA in Data Journalism to address not only regulatory frameworks, but also specific considerations when dealing with data. That includes information security; the ethics around issues such as personalisation and mass data gathering; legal considerations such as data protection; and the use of laws such as Freedom of Information.

It is one of the peculiarities of our age that it is no longer enough for a journalist to be able to mount a legal defence to protect their information and their sources; they must now be able to mount a technical defence as well.

6. Specialist and investigative skills alongside technical skills

MA student Carla Pedret’s final project was shortlisted for the Data Journalism Awards

Most data journalists operate much as specialist or investigative reporters do: focusing on a particular field and trying to understand how information is collected and stored within that.

I wanted students to have an opportunity to develop that specialist knowledge, and exercise data journalism skills alongside other important techniques such as analysing company accounts, interviewing, and understanding how a system works.

After all, a data journalist can only work with the data they have access to. And having good data often means knowing where to look, who to ask, and how to understand the context in which it has been gathered. This part of the course also provides an opportunity to create striking original reporting which builds the student’s reputation.

7. Working with industry — and communities of practice

Many current data journalists arrived in their roles through internal routes: having worked freelance on particular projects that needed data skills, they saw those roles either extended or made permanent.

I regularly field calls from media organisations asking for students with data journalism skills to help with a story, and so for the MA in Data Journalism I worked to formalise those relationships in a range of ways.

These relationships cover local and national newspapers, magazines, broadcasters and online-only publishers both in the UK and internationally.

It means that students have access to a range of opportunities to work on industry projects, and can easily seek out potential clients whose problems they can take on for the module addressing enterprise and entrepreneurship. The idea is not just to create opportunities for students, but also, hopefully, to build capacity in the industry itself.

Just as important as industry are the wider communities engaged in data journalism (more here): a lot of research has been done into the intersections between journalism culture and hacker culture, and I believe that it is important that data journalism students engage with the various online networks surrounding data and coding.

Those are the communities which will support the student long after graduation, as new tools and techniques come along — and new stories.

I’d welcome any thoughts on the course and other elements which should be included.

Filed under: online journalism Tagged: automation, communities of practice, ethics, investigative journalism, javascript, MA Data Journalism, narrative, newsrooms, Python, R, scraping, security, ux

Helping journalists tell stories with data

The Data Journalism Handbook, published in 2011, is considered the guidebook for telling stories with data. To ensure that journalists are up to speed on the latest data journalism practices, the Google News Lab is partnering with the European Journalism Centre to launch a new version of the Data Journalism Handbook, which will be published in four languages next year.

The original handbook was born at a 48-hour workshop at MozFest 2011 in London, and became an international, collaborative effort involving dozens of data journalism’s leading advocates and best practitioners.

Over the past three years, the handbook has been digitally downloaded 150,000 times, and almost a million people have accessed the online version. But the world is changing, and so are the ways we use data to tell news stories. So this project is one of a series of initiatives by the data team at the Google News Lab to support data journalists and help them understand how to best incorporate technology into their work—you can find out more on our site. We’re also proud to partner with the European Journalism Centre on their mission to connect journalists with new ideas through initiatives like the News Impact Summits and the News Impact Academy.

On July 31, we will open a call for contributions. Later this year, around 50 authors and experts will join a Handbook Hack to create and edit content for the new edition. And you won’t have to wait long to start reading the new chapters: we’ll make them available online as they are completed. Check out the official site for the latest updates.

Data journalism in broadcast news and video: 27 examples to inspire and educate

channel 4 network diagram

This network diagram comes from a Channel 4 News story

The best-known examples of data journalism tend to be based around text and visuals — but it’s harder to find data journalism in video and audio. Ahead of the launch of my new MA in Data Journalism I thought I would share my list of the examples of video data journalism that I use with students in exploring data storytelling across multiple platforms. If you have others, I’d love to hear about them.

FOI stories in broadcast journalism

victoria derbyshire gif

Freedom of Information stories are one of the most common situations in which broadcasters have to deal with more in-depth data. These are often brought to life through case studies and interviews with experts.

In 2015, for example, a former and then-current MA student worked with the BBC’s Victoria Derbyshire programme on FOI responses from 42 police forces relating to violence in schools. The online version of the story included an interview with a former teacher affected by the issue (captured in the gif above).

Other British examples include this ITV story on mental health trusts cutting beds, and this Channel 4 Dispatches piece on benefit sanctions. And I keep a list of other FOI-based stories by the BBC here.

In Canada, the Fifth Estate’s Rate My Hospital investigation in 2013 featured a number of case study and expert video clips online, this time more presenter-led, while another Canadian example, this Global News story on pit bull attacks, uses charts and tables online but vox pops and archive clips in the (embedded) broadcast treatment. You can also watch broadcast treatments of stories by their data journalist Patrick Cain into car testing and problem gambling.

Another data journalist in a broadcast organisation is Tisha Thompson at NBCUniversal. Her examples include “collecting rape statistics when the military refuses to hand them over” (more here); government employees accused of stealing the beer they’re supposed to be delivering (more here; Tisha says this is “Why you should make your own database, especially when the government doesn’t do it”); water quality in Virginia and Maryland; high-end luxury and fashion brands on a list of government seizures; and potholes.

Striking statistics

Hans Rosling, who died earlier this year, did much to popularise the use of statistics and data visualisation. His engaging presentation style led BBC4 to commission a series on “The Joy of Stats”. Here’s one of the highlights:

Broadcast data journalism by students

Karl Idsvoog at Kent State University shared a number of examples of his students producing video reports on their data journalism projects, including pieces on university marketing budgets, free cars for coaches, high school concussions, and athletes missing class (shown above). They’re all good examples of data stories that can be found on your doorstep.

Network analysis in video

Network analysis — analysing relationships between actors in a story — is becoming more and more widely used. Here are a couple of examples where a broadcaster has used it: first, the BBC’s Newsnight leans on a galactic metaphor…

…And second, Channel 4 News uses a network to illustrate the complex story of Rangers Football Club’s troublesome finances:
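The kind of network analysis behind stories like these can start very simply. A sketch in Python (the directors and companies here are invented), counting how many connections each node in an edge list has:

```python
from collections import Counter

# Hypothetical edge list: pairs of connected entities,
# e.g. directors and the companies they sit on
edges = [
    ("Director A", "Company 1"),
    ("Director A", "Company 2"),
    ("Director B", "Company 2"),
    ("Director C", "Company 2"),
    ("Director C", "Company 3"),
]

# Degree = number of connections; the most-connected nodes
# are often where the story is
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

for node, connections in degree.most_common(3):
    print(node, connections)
```

Dedicated tools such as Gephi do far more, but even a degree count like this can surface the central actor in a web of companies.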

The data isn’t on screen — but it’s behind the story

One of the reasons it’s not always easy to think of good examples of data journalism in video and audio is that the data itself is hidden. Channel 4’s investigative programme Dispatches often features investigations where data analysis is involved, but it’s not always obvious in the programme itself.

Britain’s Hidden Child Abuse – shown below – involved compiling spreadsheets to demonstrate the scale of the problem, which also helped one reporter to identify recurring reasons why people did not involve the police authorities.

Those spreadsheets were also crucial in convincing the lawyers that they could defend any legal action.

Web-native video

Data video journalism doesn’t have to be made for broadcast. Many of the stories that I’ve worked on in the BBC England data unit have included a video clip. This investigation we did into library cuts includes a caption-led video on how one prominent library has been affected by the cuts.

Across social media the BBC also used a short clip to illustrate some of the key statistics from the story:

As an aside, many radio stations reported on the story by interviewing librarian Lauren Smith and well-known authors.

This story on the impact of a government scheme leads on a video clip which includes interviews with people who used the scheme, and this investigation into midwife-led units also led on a video with someone who, like one in four patients, had to be transferred to a consultant-led unit. This music festival data story’s lead video goes from a gif-style stop motion to expert interviews.

And if you’re doing a data story involving animals, there really has to be video too.

Germany’s public service broadcaster Bayerischer Rundfunk produces data journalism including the example below…

…and Swiss broadcaster SRF has an impressive data operation too.

Can you add any?

These are just some of the examples I’ve come across in video and broadcast media (I’ll look at audio in a separate post). I’m always on the look-out for new examples, so please let me know if you’ve seen others.

Filed under: online journalism Tagged: Bayerischen Rundfunk, BBC, broadcast, data journalism, dispatches, Fifth Estate, MA Data Journalism, NBC Universal, Patrick Cain, SRF, Tisha Thompson

The 2nd edition of Scraping for Journalists is now live

Scraping for Journalists

When I began publishing Scraping for Journalists in 2012, one of the reasons for choosing to publish online was the ability to publish chapters as I wrote them, and update the book in response to readers’ feedback. The book was finally ‘finished’ in 2013 — but earlier this year I decided to go through it from cover to cover and update everything.

The result — a ‘second edition’ of Scraping for Journalists — is now live. Those who bought the first edition on Leanpub will already have access to this version.

The second edition includes new scrapers for different websites, and a new chapter on scraping APIs and handling JSON.
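To give a flavour of what handling JSON involves — this is a made-up sample, not an excerpt from the book — here is how Python turns an API response into rows a journalist can work with:

```python
import json

# A response body like those many open-data APIs return (invented sample data)
response_body = '''
{
  "results": [
    {"area": "Birmingham", "libraries_closed": 4},
    {"area": "Manchester", "libraries_closed": 2}
  ]
}
'''

# Parse the JSON text into Python dictionaries and lists
data = json.loads(response_body)

# Flatten the nested structure into rows that can be sorted or charted
rows = [(r["area"], r["libraries_closed"]) for r in data["results"]]
print(rows)  # [('Birmingham', 4), ('Manchester', 2)]
```

Once the data is in that flat shape, the analysis is the same as with any spreadsheet.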

As always, I’ll be continuing to update the book, including any examples from readers (if you’ve used the techniques in the book for a story, I’d love to know about it).

Filed under: online journalism Tagged: ebook, Scraping for Journalists
