All my data journalism ebooks are $5 or less this Christmas

 

The prices of my 3 data journalism ebooks (Data Journalism Heist, Finding Stories in Spreadsheets and Scraping for Journalists) have been cut to $5 on Leanpub in the lead up to Christmas. And if you want to get all 3, you can also get the data journalism books bundle on Leanpub for more than half price over the same period, at $13. Get them while the offer lasts!

Filed under: online journalism Tagged: books, data journalism, Data Journalism Heist, Finding Stories In Spreadsheets, sale, Scraping for Journalists  

Data journalism’s AI opportunity: the 3 different types of machine learning & how they have already been used

This week I’m rounding off the first semester of classes on the new MA in Data Journalism with a session on artificial intelligence (AI) and machine learning. Machine learning is a subset of AI — and an area which holds enormous potential for journalism, both as a tool and as a subject for journalistic scrutiny.

So I thought I would share part of the class here, showing some examples of how the 3 types of machine learning (supervised, unsupervised, and reinforcement) have already been used for journalistic purposes, and using them to explain what each type is along the way.
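To give a rough sense of the difference between the first two types, here is a minimal pure-Python sketch. It is not taken from the class, and the data (word counts of hypothetical articles) is invented for illustration. The supervised function learns from examples that already carry labels; the unsupervised one is given no labels and simply splits the numbers into two groups. Reinforcement learning, which learns from rewards over repeated trials, is not shown.

```python
def nearest_label(labelled, value):
    """Supervised: predict a label by copying the closest labelled example."""
    closest = min(labelled, key=lambda pair: abs(pair[0] - value))
    return closest[1]


def two_groups(values, rounds=10):
    """Unsupervised: split values into two clusters (1-D k-means with k=2)."""
    low, high = min(values), max(values)  # starting centres
    group_low, group_high = list(values), []
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_low or not group_high:
            break  # everything fell into one group
        # Move each centre to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Hypothetical word counts, labelled by hand for the supervised case.
LABELLED = [(120, "brief"), (150, "brief"), (900, "feature"), (1100, "feature")]
# The same kind of numbers, but with no labels at all.
UNLABELLED = [120, 150, 900, 1100, 130, 950]

if __name__ == "__main__":
    print(nearest_label(LABELLED, 800))
    print(two_groups(UNLABELLED))
```

In the first case a person has already decided what the categories are; in the second the groups emerge from the data and still need a journalist to interpret them.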

Here are all the presentations from Data Journalism UK 2017

Last week I had the pleasure of hosting the second annual Data Journalism UK conference in Birmingham.

The event featured speakers from the regional press, hyperlocal publishers, web startups, nonprofits, and national broadcasters in the UK and Ireland, with talks covering investigative journalism, automated factchecking, robot journalism, the Internet of Things, and networked, collaborative data journalism. You can read a report on the conference at Journalism.co.uk.

Announcing a part time PGCert in Data Journalism

 

Earlier this year I announced a new MA in Data Journalism. Now I am announcing a shorter, part time version of the course.

The PGCert in Data Journalism takes place over 8 months and includes 3 modules from the full MA:

  • Data Journalism;
  • Law, Regulation and Institutions (including security); and
  • Specialist Journalism, Investigations and Coding

How to: get started with SQL in Carto and create filtered maps

Today I will be introducing my MA Data Journalism students to SQL (Structured Query Language), a language used widely in data journalism to query databases, datasets and APIs.

I’ll be partly using the mapping tool Carto as a way to get started with SQL, and thought I would share my tutorial here (especially as, since its recent redesign, the SQL tool is no longer easy to find).

So, here’s how you can get started using SQL in Carto, and where to find that pesky SQL option.
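As a flavour of what a filtering query looks like, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Carto itself. The table name `crimes`, its columns and the rows are hypothetical, invented for illustration. Carto runs on PostgreSQL, but a basic SELECT ... WHERE filter of this kind reads the same there.

```python
import sqlite3


def filter_rows(rows, min_total):
    """Load hypothetical rows into an in-memory table and filter them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crimes (area TEXT, category TEXT, total INTEGER)")
    conn.executemany("INSERT INTO crimes VALUES (?, ?, ?)", rows)
    # WHERE does the filtering; ORDER BY sorts what is left.
    query = """
        SELECT area, total
        FROM crimes
        WHERE category = 'burglary' AND total >= ?
        ORDER BY total DESC
    """
    result = conn.execute(query, (min_total,)).fetchall()
    conn.close()
    return result


# Made-up example data, not real figures.
SAMPLE_ROWS = [
    ("Northfield", "burglary", 42),
    ("Moseley", "burglary", 17),
    ("Northfield", "robbery", 9),
    ("Erdington", "burglary", 30),
]

if __name__ == "__main__":
    for area, total in filter_rows(SAMPLE_ROWS, 20):
        print(area, total)
```

On a map, the same WHERE clause would decide which points or shapes remain visible.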

How one Norwegian data team keeps track of their data journalism projects

In a special guest post Anders Eriksen from the #bord4 editorial development and data journalism team at Norwegian news website Bergens Tidende talks about how they manage large data projects.

Do you really know how you ended up with those results after analyzing the data from Public Source?

Well, often we did not. This is what we knew:

  • We had downloaded some data in Excel format.
  • We did some magic cleaning of the data in Excel.
  • We did some manual alterations of wrong or wrongly formatted data.
  • We sorted, grouped, pivoted, and eureka! We had a story!

Then we got a new and updated batch of the same data. Or the editor wanted to check how we ended up with those numbers, that story.
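One way to avoid that situation is to write each step down as a script, so it can be rerun when a new batch arrives or when an editor asks where a number came from. The following is only a minimal sketch of that idea, not the #bord4 team's actual workflow: the column names, the cleaning rule and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def clean_row(row):
    """Cleaning step, recorded in code instead of done by hand in a spreadsheet."""
    return {
        "municipality": row["municipality"].strip().title(),
        # Hypothetical rule: amounts arrive with spaces as thousands separators.
        "amount": int(row["amount"].replace(" ", "")),
    }


def total_by_municipality(csv_text):
    """Group the cleaned rows and sum the amounts, like a simple pivot table."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = clean_row(row)
        totals[cleaned["municipality"]] += cleaned["amount"]
    # Largest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up data with the kind of inconsistencies that usually get fixed manually.
SAMPLE_CSV = """municipality,amount
bergen ,1 200
Bergen,300
 voss,450
"""

if __name__ == "__main__":
    for name, total in total_by_municipality(SAMPLE_CSV):
        print(name, total)
```

Because the cleaning and grouping live in functions, running the script on an updated file repeats exactly the same steps.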

A potted history of the last 6 years? How the Online Journalism Handbook changed between 2011 and 2017

9 *more* newsletters about data and vis? Yes!

A few weeks ago I posted a list of 9 great newsletters about data. The post generated so many suggestions of other newsletters that I thought I’d gather them together in a follow-up post. So, here are 9 more newsletters about data journalism, data science, and data visualisation.

1. Graphic Content

Graphic Content is a regular email newsletter — and Tumblr blog — from the head of data and transparency at the Institute for Government, Gavin Freeguard.

The format is simple: a collection of links to some of the most interesting data visualisation, data journalism and ‘meta data’ (other links about data) that day. You can subscribe to the newsletter here.

2. Hacks/Hackers

 

Hacks/Hackers is a global network of meetups for journalists (hacks) and developers (hackers) interested in the potential of data for newsgathering and storytelling.

The network also has a weekly email which recently reached its 100th issue. It also rounds up events around the world in the week ahead, jobs, funding and useful links. You can subscribe to it on their blog.

3. Best in Visual Storytelling

Rachel Schallom emailed to let me know about her weekly visual journalism newsletter Best in Visual Storytelling, “which isn’t 100% about data, but includes a ton of data-driven projects.” It arrives on Mondays. The sign-up form is here.

4. Data Elixir

The first of four newsletters suggested by Jeremy Singer-Vine, whose newsletter Data Is Plural featured in the original post, Data Elixir is “a weekly newsletter of curated data science news and resources from around the web” on Tuesdays, from Lon Riesberg. It’s already passed 150 issues.

5. Data Science Weekly

Surpassing that, Data Science Weekly recently hit its 200th issue. It focuses on data science, with news, articles and jobs. The archive covers everything from predicting NFL plays to tutorials on creating a bar chart.

6. Data & Society

Data & Society is a research institute “focused on the social and cultural issues arising from data-centric technological development.”

If you’re interested in the more critical/academic side of data journalism, their newsletter provides updates on their research, events, and other useful links.

7. The Data Science Community newsletter

NYU’s Center for Data Science publishes its own newsletter focused on the data science community and “featuring data science news delivered with humor & snark plus an always popular Tweet of the Week”. The emphasis here is on breadth with lots of detail on each link.

8. data.world Data Digest

Gabriela Swider from data.world – a new platform for sharing and analysing data – got in touch to recommend their Data Digest, which highlights a few of the most interesting datasets on the platform every Friday. Subscribe here.

9. Naked Data

And rounding off the list on a high is Jason Norwood-Young’s newsletter Naked Data — recommended by Anastasia Valeeva. “Sign up for a weekly roundup of the best data journalism projects, news, tech and happenings from around the world,” promises the sign up page. There’s a lot here beyond the usual suspects, and it’s well curated.

If you know of any newsletters not mentioned here or in the previous post, please let me know!

Filed under: online journalism Tagged: Anastasia Valeeva, Best in Visual Storytelling, Data & Society, Data Elixir, Data Science Weekly, data.world, email, Gabriela Swider, Gavin Freeguard, Graphic Content, hacks/hackers, Jason Norwood-Young, Lon Riesberg, Naked Data, newsletters, NYU Center for Data Science, Rachel Schallom

Announcing the line up for Data Journalism UK 2017

The Bureau Local’s Megan Lucero

We’ve confirmed the line up for this year’s Data Journalism UK conference on December 5 — and I’m pretty excited about it.

We’ve managed to pack in networked data journalism and investigations, automation and the internet of things, and some practical sessions too, with my new MA Data Journalism students pitching in to help.

Tickets are available here including early bird and afternoon-only options, but you’ll need to be quick — the event sold out last year.

Here’s more detail on the running order…

Networked data journalism

Kicking off the day is Megan Lucero who has been leading the Bureau of Investigative Journalism’s project Bureau Local.

The former Times data journalist will talk about what they’ve learned one year into the project, which was established with £500,000 from Google’s Digital News Innovation Fund.

Also aiming to stimulate data journalism at a local level is the BBC’s new Shared Data Unit, based here in Birmingham.

Peter Sherlock, who heads up the team, will be talking about the first few months of that project as the unit takes on its first secondees from partners in local media.

Data investigations

On the day that we held the last Data Journalism UK conference, Johnston Press announced that they were forming a new investigations unit. Project lead Aasma Day will be here this year to talk about what has happened since.

There’s a terrific first panel of investigative journalists including the winner of this year’s Paul Foot award, Emma Youle, and The Ferret’s Peter Geoghegan.

And Karrie Kehoe will be speaking about how she works on computational investigations at the Irish broadcaster RTÉ.

Automation and factchecking

Two more recipients of funding from the Google Digital News Initiative are speaking in the afternoon. Urbs Media CEO Alan Renwick has worked with publishers such as Thomson Regional Newspapers, Mirror Group, TES and DMGT, and was Strategy Director at regional group Local World.

Now he’s leading The Press Association’s robot journalism project RADAR (‘Reporters And Data And Robots’).

And Mevan Babakar from FullFact will be speaking about their project to automate factchecking.

Joining them will be CW Anderson, the editor of the book Remaking The News, currently working on a forthcoming book about data journalism, and former Guardian media and technology reporter Mercedes Bunz, co-author of ‘The Internet of Things’.

Hands-on sessions

We’ll have practical sessions at different points in the day, with attendees invited to nominate skills they would like covered.

Trinity Mirror data journalist Rob Grant will be doing a session on R for journalists and I’ll be doing a session on handling big data, based on a story that involved analysing 37 million rows of crime data.

You can book tickets on the Eventbrite page.

Filed under: online journalism Tagged: Aasma Day, Alan Renwick, Bureau Local, CW Anderson, data journalism UK, Emma Youle, Ferret, FullFact, Google Digital News Initiative, investigative journalism, Johnston Press, Karrie Kehoe, Megan Lucero, Mercedes Bunz, Mevan Babakar, Peter Geoghegan, Peter Sherlock, RADAR, robot journalism, RTE, Urbs Media

Wanted: MA Data Journalism applicants to partner with The Telegraph

The Telegraph was behind one of the biggest data journalism stories of the last decade

As part of the new MA in Data Journalism we have partnered with a number of organisations who are keen to bring data journalism expertise into their newsroom.

I am now inviting applications from people who want to work with The Telegraph during their MA in Data Journalism at Birmingham City University.

The Telegraph has a long history of data journalism, most famously breaking a series of stories around MPs’ expenses in 2009. Examples of its data journalism – ranging from sport and politics to text analysis and data video – can be found in its TeleGraphs section.

The news organisation is looking for applicants who are interested in developing the ability to clean and analyse data to find interesting stories; an awareness of tools that you can use to source and scrape data; and a knowledge of data visualisation in order to communicate your stories. Successful applicants will learn these skills on the MA course and have the opportunity to apply them in collaboration with The Telegraph.

They say:

“Data journalism at The Telegraph is about uncovering stories in data that people wouldn’t have otherwise known. Whether this is through scrutinising the day’s news to see what relevant data we can add to the story, or through longer investigations and analysis, data-driven reporting involves sourcing, cleaning, analysing and communicating data to tell interesting, innovative and important stories.

“We are here to provide exclusive analysis of complex, structured data with a view to finding the news stories within it and presenting it in compelling visual – as well as textual – ways. We want to see the same in data journalism students. They should be confident in figuring out solid news lines in data and knowing the best ways to visually communicate them.”

If you are interested, please apply through the course webpage on the Birmingham City University website, specifying in your supporting statement that you are specifically interested in working with The Telegraph.

Please also indicate why you would be interested in working with the team, and what kind of stories you’d be interested in working on.

Filed under: online journalism Tagged: MA Data Journalism, Telegraph

Here are 9 email newsletters about data… I think you’ll like at least 4 of them

Sophie Warnes doesn’t just round up data journalism in her emails, she *does* data journalism *about* her emails

As the first group of MA Data Journalism students prepare to start their course this month, I’ve been recommending a number of email newsletters in the field that they should be following — and I thought I should share it here too.

Here, then, are 9 email newsletters about data. If I’ve missed any, please let me know.

‘Storytelling in the Digital Age’: a free ebook

A free short ebook on Storytelling in the Digital Age has been published by Gurpreet Mann (disclosure: Gurpreet is a former student of mine).

I’m clearly going to be biased, but I really like it, particularly because it doesn’t just address the technical challenges of new platforms, but also looks at cultural, commercial and narrative contexts. (The chapter on Tumblr and GIFs is a particular highlight.)

10 principles for data journalism in its second decade

In 2007 Bill Kovach and Tom Rosenstiel published The Elements of Journalism. With the concept of ‘journalism’ increasingly challenged by the fact that anyone could now publish to mass audiences, their principles represented a welcome platform-neutral attempt to articulate exactly how journalism could be untangled from the vehicles that carried it and the audiences it commanded.

In this extract from a forthcoming book chapter* I attempt to use Kovach and Rosenstiel’s principles (outlined in part 1 here) as the starting point for a set that might form a basis for data journalism as it enters its second and third decades.

Principle 1: Data journalists should strive to interrogate data as a power in its own right

When data journalist Jean-Marc Manach set out to find out how many people had died while migrating to Europe he discovered that no EU member state held any data on migrants’ deaths. As one public official put it, dead migrants “aren’t migrating anymore, so why care?”

Similarly, when the BBC sent Freedom of Information requests to mental health trusts about their use of face-down restraint, six replied saying they could not say how often any form of restraint was used — despite being statutorily obliged to “document and review every episode of physical restraint which should include a detailed account of the restraint” under the Mental Health Act 1983.

The collection of data, the definitions used, and the ways that data informs decision making, are all exercises of power in their own right. The availability, accuracy and employment of data should all be particular focuses for data journalism as we see the expansion of smart cities and wearable technology.

Principle 2: Editorial independence includes technological independence

I wrote in 2013 about the role of coding in ensuring editorial independence, quoting Lawrence Lessig’s point, made over a decade ago, that code is law:

“Ours is the age of cyberspace. It, too, has a regulator. This regulator, too, threatens liberty. But so obsessed are we with the idea that liberty means “freedom from government” that we don’t even see the regulation in this new space. We therefore don’t see the threat to liberty that this regulation presents.

“This regulator is code—the software and hardware that make cyberspace as it is. This code, or architecture, sets the terms on which life in cyberspace is experienced. It determines how easy it is to protect privacy, or how easy it is to censor speech. It determines whether access to information is general or whether information is zoned. It affects who sees what, or what is monitored. In a host of ways that one cannot begin to see unless one begins to understand the nature of this code, the code of cyberspace regulates.” (Lessig 2006)

The independence of the journalist is traditionally portrayed as possessing the power to resist pressure from our sources, our bosses and business models, and the government and law enforcement. But in a networked age it will also mean independence from the biases inherent in the tools that we use.

From the content management systems that we use, to the mobile devices that record our every move, independence in the 21st century will be increasingly facilitated by being able to ‘hack’ our tools or build our own.

Code affects what information you can access, your ability to verify it, your ability to protect sources — and your ability to empower them. Finally, code affects your ability to engage users.

Code is a key infrastructure that we work in as journalists: if we understand it, we can move across it much more effectively. If it is invisible to us, we cannot adapt it, we cannot scrutinise it. We are, in short, subject to it.

Principle 3: We should strive for objectivity not just in the sources and language that we use, but also the way that we design our tools

Mapping tools assume a default point of view. Image: Time Magazine via Newberry Library via Jake Ptacek

In data journalism right now we are at a crucial stage: the era during which we move from making stories and tools for other people, to making our own tools.

As John Culkin, in writing about Marshall McLuhan, said:

“We shape our tools, and thereafter they shape us”.

The values which we embed in those tools, the truths we take for granted, will have implications beyond our own generation.

The work of Lawrence Lessig and Nicholas Diakopoulos highlights the role that code plays in shaping the public lives that we can lead; we need to apply the same scrutiny to our own processes.

When we build tools on maps do we embed the prejudices that have been identified by critical cartographers?

Do we seek objectivity in the visual language we use as well as the words that we choose?

But it is not just the tools which will shape our practice: the reorganisation of newsrooms and the creation of data desks and the data journalist’s routine will also begin to draw the borders of what is considered normal in – and what is considered outside of – the role of the data journalist.

Uskali and Kuutti, for example, already identify at least three different models for organising data journalism work practices: data desks, flexible data projects, and the entrepreneur or sub-contractor model. To what extent these models circumscribe or provide opportunities for new ways of doing journalism bears some reflection.

If we are to augment our journalism, we must do so critically.

Principle 4: Impartiality means not relying only on stories where data exists and is easy to obtain

The increasing abundance of data brings with it a new danger: that we do not look beyond what is already accessible, or that we give up too easily if a story does not seem practical.

Just as the expansion of the PR industry in the 20th century led to accusations of ‘churnalism’ in the media, the expansion of data in the 21st century risks leading to ‘data churnalism’ instead of data journalism, including the use of automation and dashboards as a way of dealing with those accessible sources.

Principle 5: We should strive to give a voice to those who are voiceless in data by seeking to create or open up data which would do so

When The Guardian’s ‘The Counted’ project sought to report on people killed by police in the US, it was literally seeking to ‘give a voice to the voiceless’ — because those people were dead; they could not speak.

The Bureau of Investigative Journalism’s Naming the Dead project had a similar objective: tracking and investigating US covert drone strikes since 2011 and seeking to identify those killed.

Neither is an example of data journalism that uses coding: the skills are as basic as keeping a record of every report you can find. And yet this basic process has an important role at the heart of modern journalism: digitising that which did not exist in digital form before, moving from zeroes to ones. You can find more examples in this post about the approach in 2015:

“Too often data journalism is described in ways that focus on the technical act of working with existing data. But to be voiceless often means that no data exists about your experience.”

Principle 6: We retain editorial responsibility for context and breadth of coverage where we provide personalisation

If journalism must provide a forum for public criticism and compromise, what role does personalisation — which gives each person a different experience of the story — play in that?

Some would argue that it contributes to ‘filter bubbles’ whereby people are unaware of the experiences and opinions of people outside of their social circle. But it can also bring people in to stories that they would otherwise not read at all, because those stories would otherwise have no relevance to their lives.

As data journalists, then, we have a responsibility to consider the impact of personalisation and interactivity both in making news relevant to readers, and providing insights into other dimensions of the same story which might not be so directly relevant.

This, of course, has always been journalism’s skill: after all, human interest stories are the ‘universal’ hook that often draws people in to the significant and important.

Principle 7: We should strive to keep the significant interesting and relevant by seeking to find and tell the human story that the data shines a spotlight on

For the same reason, we should ensure that our stories are not merely about numbers, but people. I always tell my MA Data Journalism students that a good story should do two things: tell us why we should care, and tell us why it matters.

Data helps us to establish why a story matters: it connects one person’s story to 100 others like it; without data, a bad experience is merely an anecdote. But without a human story, data becomes just another statistic.

Principle 8: The algorithms in our work – both human and computational – should be open to scrutiny and iteration

The more that journalism becomes augmented by automation, or facilitated by scripts, the more that we should consider being open to public scrutiny.

If we are unable to explain how we arrived at a particular result, that undermines the credibility of the conclusion.

Diakopoulos and Koliska, who have explored algorithmic transparency in the news media, conclude that it is an area much in need of research, development and experimentation:

“There are aspects of transparency information that are irrelevant to an immediate individual user context, but which are nonetheless of importance in media accountability for a broad public such as fair and uncensored access to information, bias in attention patterns, and other aggregate metrics of, for instance, error rates. In other words, some factors may have bearing on an individual whereas others have import for a larger public. Different disclosure mechanisms, such as periodic reports by ombudspeople may be more appropriate for factors like benchmarks, error analysis, or the methodology of data collection and processing, since they may not be of interest to or even comprehensible for many users yet demanded by those who value an expert account and explanation.”

Principle 9: Sharing our code also allows us to work more efficiently and raise standards

It has often been said that transparency is the new objectivity in this new world of limitless publishing. This recognises that, while true objectivity does not exist, transparency can help establish what steps we have taken towards coming as close to it as we can.

The AP Stylebook’s section on data journalism has formally recognised this with its reference to reproducible analysis:

“Providing readers with a roadmap to replicate the analysis we’ve done is an essential element of transparency in our reporting. We can accomplish this transparency in many ways, depending on the data set and the story”

But what is transparency for data journalists? Jennifer Stark and Nicholas Diakopoulos outline principles from scientific research that can be adapted – specifically reproducibility and replicability.

Reproducibility involves making code and data available so a user can rerun the original analysis on the original data. “This is the most basic requirement for checking code and verifying results”

Replicability, on the other hand, “requires achieving the same outcome with independent data collection, code and analysis. If the same outcome can be achieved with a different sample, experimenters and analysis software, then it is more likely to be true.”
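To make the first of those concrete, here is a minimal sketch of a reproducibility check, on the assumption that the analysis is a function that can be rerun on the published data. The function names and figures are hypothetical. Publishing a fingerprint of the data alongside the code lets a reader confirm that they are rerunning the original analysis on the original data.

```python
import hashlib
import json
import statistics


def fingerprint(records):
    """Return a SHA-256 hash of the data so readers can confirm they hold the same data."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def analyse(records):
    """Hypothetical analysis: the median of a 'value' field."""
    return statistics.median(r["value"] for r in records)


def reproduce(records, published_hash, published_result):
    """Rerun the analysis on the data and compare with what was published."""
    if fingerprint(records) != published_hash:
        return False  # different data, so this would not be a reproduction
    return analyse(records) == published_result


# Made-up records for illustration.
DATA = [{"id": 1, "value": 10}, {"id": 2, "value": 30}, {"id": 3, "value": 20}]

if __name__ == "__main__":
    print(reproduce(DATA, fingerprint(DATA), 20))
```

Replication is a different exercise: it would mean collecting fresh data and writing new code independently, then seeing whether the same conclusion holds.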

Currently the code-sharing site GitHub is the place where many data teams share their code so that others can reproduce their analysis. It is incredible to look across the GitHub repositories of FiveThirtyEight or BuzzFeed and understand how the journalism was done. It also acts as a way to train and attract future talent into the industry, either formally as employees, or informally as contributors.

Principle 10: We should seek to empower citizens to exercise their rights and responsibilities

The New York Times You Draw It challenges users to take an active role

The final principle mirrors Kovach and Rosenstiel’s: the obligation on the public to take some responsibility for journalism too. And it is here, perhaps, where data journalism has the most significant role to play.

Because where Kovach and Rosenstiel put the onus on the public, I believe that data journalism is well positioned to do more, and to actively empower that public to exercise those rights and responsibilities.

A New York Times interactive which invites the user to draw a line chart before revealing how close they were to the true trend is precisely the sort of journalism which helps users engage with their own role in negotiating information.
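The core of such an interactive can be sketched in a few lines: compare the values the reader drew with the real series and report the gap. This is a hypothetical illustration, not the New York Times's implementation, and the numbers and the tolerance are invented.

```python
def mean_absolute_error(guess, actual):
    """Average distance between the reader's line and the true line."""
    if len(guess) != len(actual):
        raise ValueError("guess and actual must cover the same points")
    return sum(abs(g - a) for g, a in zip(guess, actual)) / len(actual)


def feedback(guess, actual, tolerance=5.0):
    """Turn the error into a short message shown after the reveal."""
    error = mean_absolute_error(guess, actual)
    verdict = "close" if error <= tolerance else "some way off"
    return f"Your line was {verdict}: off by {error:.1f} points on average."


# Invented values: the true trend and one reader's drawn guess.
ACTUAL = [40, 45, 50, 60]
GUESS = [40, 50, 65, 80]

if __name__ == "__main__":
    print(feedback(GUESS, ACTUAL))
```

The design choice that matters editorially is the order: the reader commits to a guess first, and only then sees the data.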

A tool which allows you to write to your local representative, or to submit a Freedom of Information request, is one which positions the reader not as a passive consumer of news, but as an active participant in the world that they are reading about.

In print and on air we could only arm our audiences with information, and hope that they use it wisely. Online we can do much more — and we’ve only just begun.

*You can read all three extended extracts from the book chapter under the tag ‘Next Wave of Data Journalism’ here.

Filed under: online journalism Tagged: algorithms, AP Stylebook, data churnalism, Jean-Marc Manach, Jennifer Stark, John Culkin, lawrence lessig, Michael Koliska, next wave of data journalism, Nicholas Diakopoulos, objectivity
