Spotlight on data journalism

Making our data journalism stand out on social media

Here’s some we prepared earlier

The stories produced by The Economist’s data team attract a lot of readers. Some of the team’s most popular pieces include our own glass ceiling index and a daily chart about the most dangerous cities in the world. It didn’t come as a surprise that, when we asked our readers what content they wanted to see more of, they said data journalism. So we decided to take two main steps to meet this demand. Continue reading “Spotlight on data journalism”

Text-as-data journalism? Highlights from a decade of SOTU speech coverage

January 2012: The National Post’s graphics team analyzes keywords used in State of the Union addresses by presidents Bush and Obama / Image: © Richard Johnson/The National Post

In a guest post for OJB, Barbara Maseda looks at how the media has used text-as-data to cover State of the Union addresses over the last decade. Continue reading “Text-as-data journalism? Highlights from a decade of SOTU speech coverage”

All my data journalism ebooks are $5 or less this Christmas


The prices of my 3 data journalism ebooks — Data Journalism Heist, Finding Stories in Spreadsheets and Scraping for Journalists — have been cut to $5 each on Leanpub in the lead-up to Christmas. And if you want all 3, you can also get the data journalism books bundle on Leanpub at more than half off over the same period, at $13. Get them while the sale lasts!

Filed under: online journalism Tagged: books, data journalism, Data Journalism Heist, Finding Stories In Spreadsheets, sale, Scraping for Journalists  

Announcing a part time PGCert in Data Journalism


Earlier this year I announced a new MA in Data Journalism. Now I am announcing a shorter, part-time version of the course.

The PGCert in Data Journalism takes place over 8 months and includes 3 modules from the full MA:

  • Data Journalism;
  • Law, Regulation and Institutions (including security); and
  • Specialist Journalism, Investigations and Coding

Continue reading “Announcing a part time PGCert in Data Journalism”

How one Norwegian data team keeps track of their data journalism projects

In a special guest post Anders Eriksen from the #bord4 editorial development and data journalism team at Norwegian news website Bergens Tidende talks about how they manage large data projects.

Do you really know how you ended up with those results after analyzing the data from Public Source?

Well, often we did not. This is what we knew:

  • We had downloaded some data in Excel format.
  • We did some magic cleaning of the data in Excel.
  • We did some manual alterations of wrong or wrongly formatted data.
  • We sorted, grouped, pivoted, and eureka! We had a story!
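The frustration those steps describe is exactly what a scripted workflow addresses: if every cleaning and grouping step lives in code rather than in undocumented Excel clicks, the whole analysis can be re-run on an updated batch of data. Here is a minimal, hypothetical sketch in Python (the dataset and column names are invented for illustration; this is not the #bord4 team's actual code):

```python
# Hypothetical sketch: the manual Excel steps rewritten as a script,
# so each transformation is recorded and repeatable.
import csv
from collections import defaultdict
from io import StringIO

RAW = """\
region,amount
North,10
North,20
South,x
South,5
"""

def clean(rows):
    """Drop rows whose 'amount' is not a number (the 'manual fixes' step)."""
    cleaned = []
    for row in rows:
        try:
            row["amount"] = float(row["amount"])
        except ValueError:
            continue  # a wrongly formatted value we would otherwise fix by hand
        cleaned.append(row)
    return cleaned

def totals_by_region(rows):
    """The 'sorted, grouped, pivoted' step, now reproducible."""
    sums = defaultdict(float)
    for row in rows:
        sums[row["region"]] += row["amount"]
    # sort regions by total, largest first
    return dict(sorted(sums.items(), key=lambda kv: kv[1], reverse=True))

rows = clean(list(csv.DictReader(StringIO(RAW))))
print(totals_by_region(rows))  # {'North': 30.0, 'South': 5.0}
```

When the editor asks how you ended up with those numbers, the script is the answer; when a new batch of the same data arrives, you simply run it again.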

Then we got a new and updated batch of the same data. Or the editor wanted to check how we ended up with those numbers, that story. Continue reading “How one Norwegian data team keeps track of their data journalism projects”

Here are 9 email newsletters about data… I think you’ll like at least 4 of them


Sophie Warnes doesn’t just round up data journalism in her emails, she *does* data journalism *about* her emails

As the first group of MA Data Journalism students prepare to start their course this month, I’ve been recommending a number of email newsletters in the field that they should be following — and I thought I should share it here too.

Here, then, are 9 email newsletters about data — if I’ve missed any please let me know. Continue reading “Here are 9 email newsletters about data… I think you’ll like at least 4 of them”

Computational thinking and the next wave of data journalism

In this second extract from a forthcoming book chapter I look at the role that computational thinking is likely to play in the next wave of data journalism — and the need to problematise that. You can read the first part of this series here.

Computational thinking is the process of logical problem solving that allows us to break down challenges into manageable chunks. It is ‘computational’ not only because it is logical in the same way that a computer is, but also because this allows us to turn to computing power to solve those problems.

As Jeannette M. Wing puts it:

“To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.”

This process is at the heart of a data journalist’s work: it is what allows the data journalist to solve the problems that make up so much of modern journalism, and to be able to do so with the speed and accuracy that news processes demand.

It is, in Wing’s words, “conceptualizing, not programming” and “a way that humans, not computers, think.”

“Computers are dull and boring; humans are clever and imaginative. We humans make computers exciting. Equipped with computing devices, we use our cleverness to tackle problems we would not dare take on before the age of computing and build systems with functionality limited only by our imaginations” (Wing 2006)

And it is this – not coding, or spreadsheets, or visualisation – that I believe will distinguish the next wave of journalists: the skills of decomposition (breaking problems down into parts), pattern recognition, abstraction and algorithm building that schoolchildren are being taught right now. Imagine what mass computational literacy will do to the news industry.

Nicholas Diakopoulos’s work on the investigation of algorithms is just one example of computational thinking in practice. In his Tow Center report on algorithmic accountability he outlines an approach to reverse-engineer the ‘black boxes’ that shape how we experience an increasingly digitised world:

“Algorithms must always have an input and output; the black box actually has two little openings. We can take advantage of those inputs and outputs to reverse engineer what’s going on inside. If you vary the inputs in enough ways and pay close attention to the outputs, you can start piecing together a theory, or at least a story, of how the algorithm works, including how it transforms each input into an output, and what kinds of inputs it’s using. We don’t necessarily need to understand the code of the algorithm to start surmising something about how the algorithm works in practice.”
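That probing method can be made concrete with a toy example. The sketch below invents a pricing ‘algorithm’ to stand in for the black box; the point is the technique of varying one input at a time and comparing outputs, not the specific function:

```python
# Illustrative sketch of input/output probing as Diakopoulos describes it:
# treat an algorithm as a black box, vary its inputs systematically,
# and record how the outputs change. The pricing function is invented.
def pricing_black_box(distance_km, surge_area):
    # stands in for an opaque third-party algorithm we cannot read
    return round(2.5 + 1.2 * distance_km + (4.0 if surge_area else 0.0), 2)

def probe(black_box, inputs):
    """Run the black box over systematically varied inputs."""
    return {args: black_box(*args) for args in inputs}

# vary distance and surge flag independently
trials = [(d, s) for d in (1, 2, 5) for s in (False, True)]
for args, price in probe(pricing_black_box, trials).items():
    print(args, "->", price)
# Comparing pairs of trials that differ only in surge_area suggests a
# flat surcharge of about 4.0 -- the start of a 'theory' of the algorithm.
```

Comparing outputs for inputs that differ in only one respect is how you begin “piecing together a theory, or at least a story” of what the algorithm rewards or penalises.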

Problematising computationality


Infoamazonia is one of a number of projects seeking to make environmental crime ‘visible’. Image from Geojournalism

But the next wave of data journalism cannot just solve the new technical problems that the industry faces: it must also “problematise computationality”, to use the words of David M. Berry:

“So that we are able to think critically about how knowledge in the 21st century is transformed into information through computational techniques, particularly within software.”

His argument relates to the role of the modern university in a digital society, but the same arguments can be made about journalism’s role too:

“The digital assemblages that are now being built … provide destabilising amounts of knowledge and information that lack the regulating force of philosophy — which, Kant argued, ensures that institutions remain rational.

“… There no longer seems to be the professor who tells you what you should be looking up and the ‘three arguments in favour of it’ and the ‘three arguments against it’.”

This is not to argue for the reintroduction of gatekeepers, but to highlight instead that information is not neutral, and it is the role of the journalist – just as it is the role of the educator – to put that information into context.

Crime mapping is one particularly good example of this. What can be more straightforward than placing crimes on a map? As Theo Kindynis writes of crime mapping, however:

“It is increasingly the case that it simply does not make sense to think about certain types of crime in terms of our conventional notions of space. Cybercrime, white-collar financial crime, transnational terrorism, fraud and identity theft all have very real local (and global) consequences, yet ‘take place’ within, through or across the ‘space of flows’ (Castells 1996). Such a-spatial or inter-spatial crime is invariably omitted from conventional crime maps.” (Kindynis 2014)

All this serves to provide some shape to the landscape that we are approaching. To navigate it we perhaps need some more specific principles of our own to help.

In the third and final part of this series, then, I want to attempt to build on Kovach and Rosenstiel’s work with principles which might form a basis for data journalism as it enters its second and third decades.

You can read all three extracts under the tag ‘Next Wave of Data Journalism’ here.

Filed under: online journalism Tagged: algorithms, computational thinking, data journalism, David M. Berry, Jeannette M. Wing, next wave of data journalism, Nicholas Diakopoulos, Theo Kindynis

The next wave of data journalism?

In the first of three expanded extracts from a forthcoming book chapter on ‘The next wave of data journalism’ I outline some of the ways that data journalism is reinventing itself, and adapting for a world which is rapidly changing again. Where networked communications and processing power were key in the 2000s, automation and AI are becoming key in the decade to come. And just as data journalism raised the bar for journalism as a whole, the bar is about to be raised for data journalism itself.

Data journalism isn’t doing enough. Now into its second decade, the noughties-era technologies it was built on – networked access to information and vastly improving visualisation capabilities – are taken for granted, just as the ‘computer assisted’ part of its antecedent Computer Assisted Reporting was.

In just ten years data journalism has settled down into familiar practices and genres, from the interactive map and giant infographics to the quick turnaround “Who comes bottom in the latest dataset” write-up. It’s a sure sign of maturity when press officers are sending you data journalism-based media releases.

Now we need to move forward. And the good news is: there are plenty of places to go.

Looking back to look forward


Philip Meyer’s reporting laid the foundation for CAR

In order to look forward it is often useful to look back: in any history of data journalism you will read that it came partly out of the Computer Assisted Reporting (CAR) tradition that emerged in the US in the late 1960s.

CAR saw journalists using spreadsheet and database software to analyse datasets, but it also had important political and cultural dimensions too: firstly, the introduction of a Freedom of Information Act in the US which made it possible to access more data than before; and secondly, the spread of social science methods into politics and journalism, pioneered by seminal CAR figure Philip Meyer.

Data journalism, like CAR, had technological, political and cultural dimensions too. Where CAR had spreadsheets and databases, data journalism had APIs and datavis tools; where CAR had Freedom of Information, data journalism had a global open data movement; and where CAR acted as a Trojan horse that brought social science methods into the newsroom, data journalism has brought ‘hacker’ culture into publishing.

Much of the credit for the birth of data journalism lies outside the news industry: often overlooked in histories of the form is the work of civic coders and information activists (in particular MySociety, which was opening up political data and working with news organisations well before the term data journalism was coined), and technology companies (the APIs and tools of Yahoo!, for example, formed the basis of many of data journalism’s early experiments).

The early data journalists were self-created, but as news organisations formalised data journalism roles and teams, data journalism newswork has been formalised and routinised too.

So where do we look for data journalism’s next wave of change?

Look outside news organisations once again and you see change in two areas in particular: on the technical side, an increasing use of automation, from algorithms and artificial intelligence (AI) to bots and the internet of things.

On the political side, a retreat from open data and transparency while non-governmental organisations take on an increasingly state-like role in policing citizens’ behaviour.

What is data journalism for?


Datavis can be seen as “Striving to keep the significant interesting and relevant”

Data journalists will often tell you that the key part of data journalism is the journalism bit: we are not just analysing data but finding and telling important stories in it. But journalism isn’t just about stories, either. Kovach and Rosenstiel, in their excellent book The Elements of Journalism, outline 10 principles which are always important to return to:

  • Journalism’s first obligation is to the truth
  • Its first loyalty is to citizens
  • Its essence is a discipline of verification
  • Its practitioners must maintain an independence from those they cover
  • It must serve as an independent monitor of power
  • It must provide a forum for public criticism and compromise
  • It must strive to keep the significant interesting and relevant
  • It must keep the news comprehensive and proportional
  • Its practitioners must be allowed to exercise their personal conscience
  • Citizens, too, have rights and responsibilities when it comes to the news

Some of these can be related to data journalism relatively easily: journalism’s first obligation to the truth, for example, appears to be particularly well served by an ability to access and analyse data.

Striving to keep the significant interesting and relevant? Visualisation and interactivity are great examples of how data journalism has been able to do just that for even the driest subjects.

But an attraction to those more obvious benefits of data journalism can distract us from the demands of the other principles.

Is data journalism “a discipline of verification”, or do we attribute too much credibility to data? Cleaning data and seeking further sources that can independently confirm what the data appears to tell us are just two processes that should be just as central as being able to generate a bar chart.

Some of the other principles become more interesting when you begin to look at developments that are set to impact our practice in the coming decades…

Rise of the robots

The rise of ‘robot journalism’ – the use of automated scripts to analyse data and generate hundreds of news reports that would be impossible for individual journalists to write – is one to keep a particular eye on.

Aside from the more everyday opportunities that automation offers for reporting on amateur sports or geological events, automation also offers an opportunity to better “serve as an independent monitor of power”.

Lainna Fader, Engagement Editor at New York Magazine, for example, highlights the way that bots are useful “for making value systems apparent, revealing obfuscated information, and amplifying the visibility of marginalized topics or communities.”

By tweeting every time anonymous sources are used in the New York Times the Twitter bot @NYTanon serves as a watchdog on the watchdogs (Lokot and Diakopoulos 2015).
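A bot like that rests on a simple computational idea: matching article text against phrases that signal unnamed sourcing. This hypothetical sketch (not @NYTanon’s actual code, and the phrase list is invented for illustration) shows the kind of test involved:

```python
# Minimal sketch of the pattern-matching an anonymous-source watchdog
# bot might use: flag sentences that attribute claims to unnamed sources.
import re

PATTERNS = re.compile(
    r"(anonymous source|spoke on condition of anonymity|"
    r"officials? who (was|were) not authorized|person familiar with)",
    re.IGNORECASE,
)

def flag_anonymous_sourcing(text):
    """Return the sentences that appear to cite unnamed sources."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if PATTERNS.search(s)]

article = ("The mayor denied the claim. "
           "An official who was not authorized to speak said talks had stalled.")
print(flag_anonymous_sourcing(article))
```

A real bot would add a feed of new articles and a posting step, but the watchdog logic itself is little more than this.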

But is robot journalism a “discipline of verification” (another of Kovach and Rosenstiel’s principles)? Well, that all boils down to the programming: in 2015 Matt Carlson talked about the rise of new roles of “meta-writer” or “metajournalist” to “facilitate” automated stories using methods from data entry to narrative construction and volunteer management. And by the end of 2016 observers were talking about ‘augmented journalism’: the idea of using computational techniques to assist in your news reporting.

The concept of ‘augmented journalism’ is perhaps a defensive response to the initial examples of robot journalism: with journalists feeling under threat, the implied assurance is that robots would free up time for reporters to do the more interesting work.

What has remained unspoken, however, is that in order for this to happen, journalists need to be willing — and able — to shift their focus from routine, low-skilled processes to workflows involving high levels of technical skill, critical abilities — and computational thinking.

But more than a decade on from Adrian Holovaty’s seminal post “A fundamental way newspaper sites need to change”, there is very little evidence of this being seriously addressed in journalism training or newsroom design. Instead, computational thinking is being taught earlier, to teenagers and younger children at school.

Some of those may, in decades to come, get a chance to reshape the newsroom themselves. In the next part of this series, then, I look at how computational thinking is likely to play a role in the next wave of data journalism — and the need to problematise and challenge it at the same time.

Filed under: online journalism Tagged: data journalism, extract, Kovach and Rosenstiel, Lainna Fader, Matt Carlson, philip meyer

Data journalism on radio, audio and podcasts

In a previous post I talked about how data journalism stories are told in different ways on TV and in online video. I promised I’d do the same for audio and radio — so here it is: examples from my MA in Data Journalism to give you ideas for telling data stories using audio.


As with any audio post, This American Life features heavily: not only is the programme one of the best examples of audio journalism around — it also often involves data too.

Right To Remain Silent is one particularly good example, because it’s about bad data: specifically, police who manipulated official statistics.

You might also listen to Choosing Wrong, which includes a section about polling.

Another favourite of mine is an audio story by The Economist about the prostitution industry, based on data scraped from sex trade websites: More bang for your buck (there are even worse puns in the charts).

David Rhodes, a BBC data journalist, has a range of stories on his Audioboom account, including pieces on Radio 4, Radio 5 Live, and this piece from the excellent factchecking radio programme, More or Less.

In podcasting this episode of The Allusionist tells a story about an experiment with data and dating.

Finally, I have to include an episode of Radiolab, one of my favourite podcasts. Shots Fired — which is split into two episodes — employs the common approach of interviewing the journalist who undertook a data-driven investigation (in other words, hooking the story on the journalist’s ‘quest’). It’s embedded below. For a geekier trip, try their podcast about Benford’s Law.
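Since Benford’s Law comes up: it is also a practical screening tool for data journalists. In many naturally occurring datasets the leading digit 1 appears about 30% of the time, and large deviations from that pattern can flag manipulated figures (though never prove manipulation). A small illustrative sketch, with made-up numbers:

```python
# Illustrative first-digit (Benford) test on an invented sample.
# Compare the observed share of each leading digit with Benford's
# expected share, log10(1 + 1/d).
import math
from collections import Counter

def first_digit_distribution(numbers):
    # leading digit of each non-zero value (lstrip handles e.g. 0.5 -> 5)
    digits = [int(str(abs(n)).lstrip("0.")[0]) for n in numbers if n]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_expected(d):
    return math.log10(1 + 1 / d)

sample = [1, 12, 19, 110, 2, 23, 3, 31, 4, 5, 6, 7, 8, 9, 14, 17]
observed = first_digit_distribution(sample)
for d in range(1, 10):
    print(d, f"observed={observed[d]:.2f}", f"benford={benford_expected(d):.2f}")
```

On real financial or geographic data a large gap between the two columns is a prompt for more reporting, not a conclusion in itself.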

Podcasts about data journalism

There are also many great podcasts about data itself — one of my former students compiled a list for GIJN.

If you’ve heard any other examples of data stories being told through audio, please let me know — I’m always on the lookout for more!

Filed under: online journalism Tagged: audio, BBC, data journalism, David Rhodes, podcasting, radiolab, The Allusionist, The Economist, This American Life

Data journalism in broadcast news and video: 27 examples to inspire and educate


This network diagram comes from a Channel 4 News story

The best-known examples of data journalism tend to be based around text and visuals — but it’s harder to find data journalism in video and audio. Ahead of the launch of my new MA in Data Journalism I thought I would share my list of the examples of video data journalism that I use with students in exploring data storytelling across multiple platforms. If you have others, I’d love to hear about them.

FOI stories in broadcast journalism


Freedom of Information stories are one of the most common situations in which broadcasters have to deal with more in-depth data. These are often brought to life through case studies and interviews with experts.

In 2015, for example, a former and then-current MA student worked with the BBC’s Victoria Derbyshire programme on FOI responses from 42 police forces relating to violence in schools. The online version of the story included an interview with a former teacher affected by the issue (captured in the gif above).

Other British examples include this ITV story on mental health trusts cutting beds, and this Channel 4 Dispatches piece on benefit sanctions. And I keep a list of other FOI-based stories by the BBC here.

In Canada, the Fifth Estate’s Rate My Hospital investigation in 2013 featured a number of case study and expert video clips online, this time more presenter-led. Also in Canada, this Global News story on pit bull attacks uses charts and tables online, but vox pops and archive clips in the (embedded) broadcast treatment. You can also watch broadcast treatments of stories by their data journalist Patrick Cain on car testing and problem gambling.

Another data journalist in a broadcast organisation is Tisha Thompson at NBCUniversal. Her examples include “collecting rape statistics when the military refuses to hand them over” (more here); government employees accused of stealing the beer they’re supposed to be delivering (more here). (Tisha says this is “Why you should make your own database, especially when the government doesn’t do it”); water quality in Virginia and Maryland; high-end luxury and fashion brands on a list of government seizures; and potholes.

Striking statistics

Hans Rosling, who died earlier this year, did much to popularise the use of statistics and data visualisation. His engaging presentation style led BBC4 to commission a series, “The Joy of Stats”. Here’s one of the highlights:

Broadcast data journalism by students

Karl Idsvoog at Kent State University shared a number of examples of his students producing video reports on their data journalism projects, including pieces on university marketing budgets, free cars for coaches, high school concussions, and athletes missing class (shown above). They’re all good examples of data stories that can be found on your doorstep.

Network analysis in video

Network analysis — analysing relationships between actors in a story — is becoming more and more widely used. Here are a couple of examples where a broadcaster has used it: first, the BBC’s Newsnight leans on a galactic metaphor…

…And second, Channel 4 News uses a network to illustrate the complex story of Rangers Football Club’s troublesome finances:

The data isn’t on screen — but it’s behind the story

One of the reasons it’s not always easy to think of good examples of data journalism in video and audio is because the data itself is hidden. Channel 4’s investigative programme Dispatches often features investigations where data analysis is involved, but it’s not always obvious in the programme itself.

Britain’s Hidden Child Abuse – shown below – involved compiling spreadsheets to demonstrate the scale of the problem, which also helped one reporter to identify recurring reasons why people did not involve the police authorities.

Those spreadsheets were also crucial in convincing the lawyers that they could defend any legal action.

Web-native video

Data video journalism doesn’t have to be made for broadcast. Many of the stories that I’ve worked on in the BBC England data unit have included a video clip. This investigation we did into library cuts includes a caption-led video on how one prominent library has been affected by the cuts.

Across social media the BBC also used a short clip to illustrate some of the key statistics from the story:

As an aside, many radio stations reported on the story by interviewing librarian Lauren Smith and well-known authors.

This story on the impact of a government scheme leads on a video clip which includes interviews with people who used the scheme, and this investigation into midwife-led units also led on a video with someone who, like one in four patients, had to be transferred to a consultant-led unit. This music festival data story’s lead video goes from a gif-style stop motion to expert interviews.

And if you’re doing a data story involving animals, there really has to be video too.

Germany’s public service broadcaster Bayerischer Rundfunk produces data journalism, including the example below…

…and Swiss broadcaster SRF has an impressive data operation too.

Can you add any?

These are just some of the examples I’ve come across in video and broadcast media (I’ll look at audio in a separate post). I’m always on the look-out for new examples, so please let me know if you’ve seen others.

Filed under: online journalism Tagged: Bayerischen Rundfunk, BBC, broadcast, data journalism, dispatches, Fifth Estate, MA Data Journalism, NBC Universal, Patrick Cain, SRF, Tisha Thompson

This site publishes high-touch, time-intensive data visualizations (and has a business that sustains it)

Over 7,000 artists played in the New York City area in 2013. Only 21 of those later made it, really made it, headlining at a venue with a capacity of over 3,000 people — among them, bigger names like Chance the Rapper, X Ambassadors, Sam Smith, and Sylvan Esso.

I learned this sort of random but fascinating tidbit from a data visualization titled “The Unlikely Odds of Making it Big,” from the site The Pudding.

The Pudding is home to high-touch, painstakingly crafted data visualizations — what the site calls “visual essays” — distinguished by the obsessive depth they bring to points of cultural curiosity. Most pieces stand wholly apart from the U.S. news cycle; no anxiety-inducing interactives around the budget, taxes, or health care. Want to see everywhere jazz legend Miles Davis is mentioned across Wikipedia, and how he’s connected to other people, recordings, and places? Here you go.

(Other things I’ve discovered browsing The Pudding’s interactives: that the town where I live is probably not the microbrew capital of the U.S., that there’s pretty strong evidence that NBA refs favor the home team, that the song “No Diggity” by Blackstreet is irrefutably timeless, at least based on Spotify play counts, compared to its 1990s peers.)

The Pudding is the newly partitioned-off editorial arm of Polygraph (!), a three-person data visualization company started two years ago by Matt Daniels, a consultant with a digital marketing background. Daniels and his partners Russell Goldenberg and Ilia Blinderman publish sumptuous visualizations that scratch personal itches. The Pudding also works closely with freelancers on pretty much whatever questions they’re interested in exploring visually, as long as it’s based on data. Freelancers are paid a flat rate of $5,000 for each piece.

“We’re all over the map. But basically, every individual picks their idea, we vet it ourselves and make sure the data’s there, that it’s interesting, and we just go off and do it,” Goldenberg told me. (The ideas backlog for The Pudding is listed out in this public Google Doc.) “Our goal is for The Pudding to be a weekly journal. We specifically seek out stories that aren’t news related, because we don’t want to compete in that space. The Washington Post, The New York Times, FiveThirtyEight, lots of places are doing interactive graphics well, doing multiple data journalism pieces per day. That doesn’t jive with what we want to be.”

Goldenberg previously worked at The Boston Globe as an interactive news developer and Blinderman’s a science and culture writer who studied data journalism at Columbia. Despite journalistic credentials, The Pudding (and Polygraph) isn’t aiming to be a journalistic enterprise. The team might in the course of developing a visualization call up a few people to run questions by them, or have to create their own data source (this freelancer’s exploration of the Hamilton musical libretto, for instance), but most of the data it builds interactives on is already available (no FOIAing needed).

Work gets promoted on The Pudding site, and through the Polygraph and Pudding newsletters, which will eventually merge into one. Polygraph’s newsletter sharing the latest visualizations has about 10,000 subscribers; The Pudding’s has about 1,000 after launching this year. Otherwise, promotion is largely word of mouth — and some pieces have been able to spread widely that way. They’re definitely open to collaborating with “more visible partners,” Goldenberg told me, though “we’re not being aggressive about our outreach.”

(A similar project popped up last year called The Thrust, which wanted to serve as a home for data visualization projects that didn’t fit with traditional news organizations or into their news cycles. The creators left for full-time jobs at ProPublica and The New York Times and the site has stopped updating.)

The moneymaking side of Polygraph functions like a digital agency, with Daniels, Goldenberg, and Blinderman pushing out projects for large clients like YouTube, Google News Lab, and Kickstarter. Goldenberg wouldn’t disclose how much they charge for these sponsored pieces, but revenue generated from a handful of client projects funds the entire editorial side, including paying for freelancers’ pieces and the three current full-time staffers’ salaries.

“We try to take on client work to just support our staff and basically to sustain The Pudding, with about three to six freelancers each quarter — what we’re doing is maybe kind of backwards,” Goldenberg said. “The thing about our editorial work is that it also essentially serves as marketing for us. Generally, when we publish a new project on The Pudding, we get a few business inquiries. It’s a nice symbiotic relationship.”

Polygraph is also hiring for two more full-time positions — a “maker” and an editor — both at competitive salaries, which suggests that its client-side business is going quite well. Its ambitions looking forward, though, are straightforward: publish more interesting data-driven visualizations.

“We want to push forward the craft of visual storytelling, and these are not things you do on a daily basis,” Goldenberg said. “We still want to take our time and spend a couple of weeks, maybe a month or more, on a project. Unless we have dozens of people working with us, we wouldn’t really be able to publish more than once a week or so. We’re mostly just trying to establish that rhythm, and keep pushing out good pieces.”
