Text-as-data journalism? Highlights from a decade of SOTU speech coverage

January 2012: The National Post’s graphics team analyzes keywords used in State of the Union addresses by presidents Bush and Obama / Image: © Richard Johnson/The National Post

In a guest post for OJB, Barbara Maseda looks at how the media has used text-as-data to cover State of the Union addresses over the last decade.
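
As a rough illustration of the simplest kind of text-as-data analysis, counting how often chosen keywords appear in each speech, here is a minimal Python sketch. The two short snippets, the keyword list and the function name `keyword_counts` are all invented for illustration; they are not taken from the National Post's analysis or from any real address.

```python
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Counter returns 0 for words that never appear.
    return {kw: counts[kw.lower()] for kw in keywords}


# Hypothetical snippets standing in for full speech transcripts.
speeches = {
    "speech_a": "Jobs and the economy. We will create jobs and protect freedom.",
    "speech_b": "Freedom is our cause. Terror will not defeat freedom.",
}
keywords = ["jobs", "economy", "freedom", "terror"]

for name, text in speeches.items():
    print(name, keyword_counts(text, keywords))
```

Raw counts like these are only a starting point: longer speeches will naturally contain more of every word, so a real comparison would probably need to normalise by speech length.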

What do journalists do with large amounts of text?

Barbara Maseda is on a John S. Knight Journalism Fellowship project at Stanford University, where she is working on designing text processing solutions for journalists. In a special guest post she explains what she’s found so far — and why she needs your help.

Over the last few months, I have been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:

  • how and in what format they obtain the text,
  • how they find newsworthy information in the documents,
  • using what tools,
  • for what kinds of stories,

…among other details.

What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options, with many shades of approaches in between.

All my data journalism ebooks are $5 or less this Christmas


The prices of my 3 data journalism ebooks (Data Journalism Heist, Finding Stories in Spreadsheets and Scraping for Journalists) have been cut to $5 on Leanpub in the lead-up to Christmas. And if you want to get all 3, you can also get the data journalism books bundle on Leanpub for more than half price over the same period, at $13. Get them while the offer lasts!


Data journalism’s AI opportunity: the 3 different types of machine learning & how they have already been used

This week I’m rounding off the first semester of classes on the new MA in Data Journalism with a session on artificial intelligence (AI) and machine learning. Machine learning is a subset of AI — and an area which holds enormous potential for journalism, both as a tool and as a subject for journalistic scrutiny.

So I thought I would share part of the class here, showing some examples of how the 3 types of machine learning (supervised, unsupervised, and reinforcement) have already been used for journalistic purposes, and using those examples to explain what each type is along the way.

Here are all the presentations from Data Journalism UK 2017

Last week I had the pleasure of hosting the second annual Data Journalism UK conference in Birmingham.

The event featured speakers from the regional press, hyperlocal publishers, web startups, nonprofits, and national broadcasters in the UK and Ireland, with talks covering investigative journalism, automated factchecking, robot journalism, the Internet of Things, and networked, collaborative data journalism. You can read a report on the conference at Journalism.co.uk.

Announcing a part time PGCert in Data Journalism


Earlier this year I announced a new MA in Data Journalism. Now I am announcing a shorter, part-time version of the course.

The PGCert in Data Journalism takes place over 8 months and includes 3 modules from the full MA:

  • Data Journalism;
  • Law, Regulation and Institutions (including security); and
  • Specialist Journalism, Investigations and Coding


How to: get started with SQL in Carto and create filtered maps

Today I will be introducing my MA Data Journalism students to SQL (Structured Query Language), a language used widely in data journalism to query databases, datasets and APIs.

I’ll be partly using the mapping tool Carto as a way to get started with SQL, and thought I would share my tutorial here (especially as the SQL tool has been hard to find since its recent redesign).

So, here’s how you can get started using SQL in Carto, and where to find that pesky SQL option.
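
To show what a filtering query looks like before you open Carto, here is a small sketch that runs the same kind of `SELECT ... WHERE` statement from Python. It makes several assumptions: it uses SQLite from Python's standard library as a stand-in rather than Carto's own database, and the `incidents` table, its columns and its three rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (area TEXT, category TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?, ?)",
    [
        ("Birmingham", "burglary", 120),
        ("Birmingham", "robbery", 45),
        ("Coventry", "burglary", 80),
    ],
)

# WHERE keeps only the rows that match both conditions;
# ORDER BY sorts whatever is left, largest total first.
query = """
    SELECT area, total
    FROM incidents
    WHERE category = 'burglary' AND total > 100
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The query text is the part that carries over: in a mapping tool, a `WHERE` clause like this is what would restrict which rows end up drawn on the map, although exact table names and syntax details will differ.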

Information is Beautiful Awards 2017: “Visualisation without story is nothing”

David McCandless, founder of the IiB awards, hosted the ceremony

MA Data Journalism students Carmen Aguilar Garcia and Victoria Oliveres attended the Information is Beautiful awards this week and spoke to some of the nominees and winners. In a guest post for OJB they give a rundown of the highlights, plus insights from data visualisation pioneers Nadieh Bremer, Duncan Clark and Alessandro Zotta.

Nadieh Bremer was one of the major winners at this year’s Information is Beautiful Awards, winning in both the Science & Technology and Unusual categories for Why Are so Many Babies Born around 8:00 A.M.? (with Zan Armstrong and Jennifer Christiansen) and Data Sketches in Twelve Installments (with Shirley Wu).

Silver, Science & technology category – Why Are so Many Babies Born around 8:00 A.M.? by Nadieh Bremer, Zan Armstrong & Jennifer Christiansen. The prize was shared with Zan Armstrong, Scientific American.


How one Norwegian data team keeps track of their data journalism projects

In a special guest post Anders Eriksen from the #bord4 editorial development and data journalism team at Norwegian news website Bergens Tidende talks about how they manage large data projects.

Do you really know how you ended up with those results after analyzing the data from Public Source?

Well, often we did not. This is what we knew:

  • We had downloaded some data in Excel format.
  • We did some magic cleaning of the data in Excel.
  • We did some manual alterations of wrong or wrongly formatted data.
  • We sorted, grouped, pivoted, and eureka! We had a story!

Then we got a new and updated batch of the same data. Or the editor wanted to check how we ended up with those numbers, that story.
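
One common way to avoid that problem is to move every cleaning and grouping step out of the spreadsheet and into a script that can be rerun on a new batch and read by an editor. The sketch below illustrates that general idea only; it is not Bergens Tidende's actual workflow. The inline `RAW` data, the column names and the function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export; in practice this would be the downloaded file.
RAW = """municipality,year,amount
Bergen ,2016,"1 200"
bergen,2016,300
Oslo,2016,n/a
Oslo,2016,500
"""


def clean_row(row):
    """Apply every cleaning rule in code so it can be rerun and reviewed."""
    name = row["municipality"].strip().title()    # fix spacing and casing
    amount_text = row["amount"].replace(" ", "")  # "1 200" becomes "1200"
    amount = int(amount_text) if amount_text.isdigit() else None
    return name, amount


def totals_by_municipality(raw_text):
    """Group cleaned rows by municipality, logging rows that had to be dropped."""
    totals = defaultdict(int)
    skipped = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        name, amount = clean_row(row)
        if amount is None:
            skipped.append(row)  # keep a record instead of silently deleting
            continue
        totals[name] += amount
    return dict(totals), skipped


totals, skipped = totals_by_municipality(RAW)
print(totals)
print(len(skipped), "row(s) skipped")
```

Because the raw data is never edited by hand, an updated batch only needs the script to be run again, and the list of skipped rows gives the editor something concrete to check.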

A potted history of the last 6 years? How the Online Journalism Handbook changed between 2011 and 2017


9 *more* newsletters about data and vis? Yes!

A few weeks ago I posted a list of 9 great newsletters about data. The post generated so many suggestions of other newsletters that I thought I’d gather them together in a follow-up post. So, here are 9 more newsletters about data journalism, data science, and data visualisation.

1. Graphic Content

Graphic Content is a regular email newsletter — and Tumblr blog — from the head of data and transparency at the Institute for Government, Gavin Freeguard.

The format is simple: a collection of links to some of the most interesting data visualisation, data journalism and ‘meta data’ (other links about data) that day. You can subscribe to the newsletter here.

2. Hacks/Hackers


Hacks/Hackers is a global network of meetups for journalists (hacks) and developers (hackers) interested in the potential of data for newsgathering and storytelling.

The network also has a weekly email which recently reached its 100th issue. It also rounds up events around the world in the week ahead, jobs, funding and useful links. You can subscribe to it on their blog.

3. Best in Visual Storytelling

Rachel Schallom emailed to let me know about her weekly visual journalism newsletter Best in Visual Storytelling, “which isn’t 100% about data, but includes a ton of data-driven projects.” It arrives on Mondays. The sign-up form is here.

4. Data Elixir

The first of four newsletters suggested by Jeremy Singer-Vine, whose newsletter Data Is Plural featured in the original post, Data Elixir is “a weekly newsletter of curated data science news and resources from around the web” on Tuesdays, from Lon Riesberg. It’s already passed 150 issues.

5. Data Science Weekly

Surpassing that, Data Science Weekly recently hit its 200th issue. It focuses on data science, with news, articles and jobs. The archive covers everything from predicting NFL plays to tutorials on creating a bar chart.

6. Data & Society

Data & Society is a research institute “focused on the social and cultural issues arising from data-centric technological development.”

If you’re interested in the more critical/academic side of data journalism, their newsletter provides updates on their research, events, and other useful links.

7. The Data Science Community newsletter

NYU’s Center for Data Science publishes its own newsletter focused on the data science community and “featuring data science news delivered with humor & snark plus an always popular Tweet of the Week”. The emphasis here is on breadth with lots of detail on each link.

8. data.world Data Digest

Gabriela Swider from data.world – a new platform for sharing and analysing data – got in touch to recommend their Data Digest, which highlights a few of the most interesting datasets on the platform every Friday. Subscribe here.

9. Naked Data

And rounding off the list on a high is Jason Norwood-Young’s newsletter Naked Data — recommended by Anastasia Valeeva. “Sign up for a weekly roundup of the best data journalism projects, news, tech and happenings from around the world,” promises the sign up page. There’s a lot here beyond the usual suspects, and it’s well curated.

If you know of any newsletters not mentioned here or in the previous post, please let me know!


Announcing the line up for Data Journalism UK 2017

The Bureau Local’s Megan Lucero

We’ve confirmed the line up for this year’s Data Journalism UK conference on December 5 — and I’m pretty excited about it.

We’ve managed to pack in networked data journalism and investigations, automation and the internet of things, and some practical sessions too, with my new MA Data Journalism students pitching in to help.

Tickets are available here including early bird and afternoon-only options, but you’ll need to be quick — the event sold out last year.

Here’s more detail on the running order…

Networked data journalism

Kicking off the day is Megan Lucero who has been leading the Bureau of Investigative Journalism’s project Bureau Local.

The former Times data journalist will talk about what they’ve learned one year into the project, which was established with £500,000 from Google’s Digital News Innovation Fund.

Also aiming to stimulate data journalism at a local level is the BBC’s new Shared Data Unit, based here in Birmingham.

Peter Sherlock, who heads up the team, will be talking about the first few months of that project as the unit takes on its first secondees from partners in local media.

Data investigations

On the day that we held the last Data Journalism UK conference, Johnston Press announced that they were forming a new investigations unit. Project lead Aasma Day will be here this year to talk about what has happened since.

There’s a terrific first panel of investigative journalists, including Emma Youle, winner of this year’s Paul Foot award, and The Ferret’s Peter Geoghegan.

And Karrie Kehoe will be speaking about how she works on computational investigations at the Irish broadcaster RTÉ.

Automation and factchecking

Two more recipients of funding from the Google Digital News Initiative are speaking in the afternoon. Urbs Media CEO Alan Renwick has worked with publishers such as Thomson Regional Newspapers, Mirror Group, TES and DMGT, and was Strategy Director at regional group Local World.

Now he’s leading The Press Association’s robot journalism project RADAR (‘Reporters And Data And Robots’).

And Mevan Babakar from FullFact will be speaking about their project to automate factchecking.

Joining them will be CW Anderson, the editor of the book Remaking The News, currently working on a forthcoming book about data journalism, and former Guardian media and technology reporter Mercedes Bunz, co-author of ‘The Internet of Things’.

Hands-on sessions

We’ll have practical sessions at different points in the day, with attendees invited to nominate skills they would like covered.

Trinity Mirror data journalist Rob Grant will be doing a session on R for journalists and I’ll be doing a session on handling big data, based on a story that involved analysing 37 million rows of crime data.

You can book tickets on the Eventbrite page.


Recipients’ Assessment of Journalistic Quality

Digital Journalism, Vol. 0, Iss. 0, 0
Many studies and their findings are available concerning most aspects of online user comments and their effects. Two experimental studies were conducted to examine the quality of a journalistic product and the valence of user comments. The results indicate that, overall, high-quality journalistic products were evaluated as (slightly) better than low-quality versions. Furthermore, users evaluated the quality of journalistic products with positive user comments as better than the quality of identical products with negative comments. Considering that, in reality, user comments are predominantly critical, it is worrying that these comments have rather negative effects on the evaluation of journalistic work.
