Tracking (and analyzing) what we post on The Atlantic’s homepage

In the newspaper business, every reporter holds their breath during the “A1 meeting,” when top editors haggle over what should appear on the next day’s front page. The first page of the “A” section is a newspaper’s most important piece of real estate (besides the crossword). And given that picking what goes “on front” is probably the biggest editorial decision a publication makes on any given day, the rulings that come out of that meeting can either make a reporter’s afternoon (hooray, my story is going above the fold!), or ruin it (c’mon, Page B5 again?).

Our new homepage design has eight main curated slots, varying in size and purpose. But that flexibility can be paralyzing. Multiple times a day, our editors ask themselves the same questions:

  • What the heck should we put on the homepage?
  • Wait, didn’t we put that story there earlier?

Unlike a newspaper’s front page, a homepage has no printed record. That means coming up with the answers to these questions often relied on our homepage editors having good memories, or laboriously tracking articles. What’s more, we couldn’t say much about long-term trends — which types of stories tended to get more homepage love, or how often we switched things up.

A tentative solution

While our homepage is impermanent, that doesn’t mean it can’t be recorded. For example, we could hire an intern to write down the position of every story, every hour, all week long. That would be a pretty miserable internship!

Fortunately, it’s easy enough to code a web script to do essentially the same thing. Enter HomepageCreeper, a name I made up right now:

Every 10 minutes, HomepageCreeper puts itself in the reader’s seat and scrapes all the headlines on The Atlantic’s homepage, logging their position and URL. It drops these stories into a database, marking the times they appeared in a particular slot.

If the headline changes the next time HomepageCreeper comes around, that means a new story has taken over, and the previous story can be closed out.
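For the curious, here is a minimal sketch of that bookkeeping in Python. It is not HomepageCreeper's actual code: the `Placement` record, the `record_snapshot` function, and the in-memory dictionaries standing in for the database are all hypothetical, and the scrape itself is assumed to have already produced a mapping of slot number to headline and URL.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Placement:
    """One continuous run of a story in a homepage slot (hypothetical schema)."""
    slot: int
    url: str
    headline: str
    first_seen: datetime
    last_seen: datetime
    closed_at: Optional[datetime] = None


def record_snapshot(
    open_runs: Dict[int, Placement],
    history: List[Placement],
    snapshot: Dict[int, Tuple[str, str]],
    now: datetime,
) -> None:
    """Apply one scrape (slot -> (headline, url)) to the running record."""
    for slot, (headline, url) in snapshot.items():
        current = open_runs.get(slot)
        if current is not None and current.headline == headline:
            # Same story is still in the slot: extend its run.
            current.last_seen = now
            continue
        if current is not None:
            # Headline changed: close out the previous story.
            current.closed_at = now
            history.append(current)
        # Open a new run for the story that took over the slot.
        open_runs[slot] = Placement(slot, url, headline, now, now)
```

Calling `record_snapshot` once per scrape (every 10 minutes, in our case) leaves `history` holding every finished run with its start and end times, and `open_runs` holding whatever is on the homepage right now.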

Pretty simple! But having this scraper helps us in two ways.

First, our homepage staff has perspective about which stories we’ve highlighted before. That makes placement decisions easier, and provides a record to show editors.

Second, we’re collecting data that we could later parlay into a deeper analysis of traffic patterns on our site. The homepage remains an important source of promotion for any given story; at some point, I’d love to pursue an analysis of how homepage placement affects an article’s total reach.

And on my end, it’s been a neat reminder of how hard our reporters and editors work. Here are all the stories that cycled through our lead slot on Monday, July 10, as the news regarding Donald Trump Jr.’s communications with a Kremlin-affiliated lawyer heated up:

That would be quite the A1 meeting.

Tracking (and analyzing) what we post on The Atlantic’s homepage was originally published in Building The Atlantic on Medium, where people are continuing the conversation by highlighting and responding to this story.

2017 Scholar Metrics Released

Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence of recent articles in scholarly publications. Today, we are releasing the 2017 version of Scholar Metrics. This release covers articles published in 2012–2016 and includes citations from all articles that were indexed in Google Scholar as of June 2017.

Scholar Metrics include journal articles from websites that follow our inclusion guidelines, selected conference articles in Computer Science & Electrical Engineering and preprints from arXiv and NBER. Publications with fewer than 100 articles in 2012–2016, or publications that received no citations over these years, are not included.

You can browse publications in specific categories such as Ceramic Engineering, High Energy & Nuclear Physics, or Film as well as broad areas like Engineering & Computer Science or Humanities, Literature & Arts. You will see the top 20 publications ordered by their five-year h-index and h-median metrics. You can also browse the top 100 publications in several languages – for example, Portuguese and Spanish. For each publication, you can view the top papers by clicking on the h5-index.
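As an illustration of the two ranking metrics, the sketch below computes them for a list of per-article citation counts. It assumes the usual definitions: the h-index is the largest number h such that h articles have at least h citations each, and the h-median is the median citation count among those h articles. The function names and the sample counts are made up for the example, and restricting the input to articles from the five-year window is left to the caller.

```python
from statistics import median
from typing import List


def h_index(citations: List[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_median(citations: List[int]) -> float:
    """Median citation count among the articles that make up the h-index."""
    h = h_index(citations)
    if h == 0:
        return 0
    core = sorted(citations, reverse=True)[:h]
    return median(core)


# Made-up citation counts for five articles from one publication.
counts = [17, 9, 6, 3, 2]
print(h_index(counts), h_median(counts))
```

With these counts, three articles have at least three citations each, so the h-index is 3, and the median of 17, 9 and 6 gives an h-median of 9.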

Scholar Metrics include a large number of publications beyond those listed on the per-category and per-language pages. You can find these by typing words from the title in the search box, e.g., [allergy], [cardiología], [biomarkers].

For more details, see the Scholar Metrics help page.

Posted by: Anurag Acharya, Distinguished Engineer

New partnership with Clarivate to help oaDOI find even more Open Access

We’re excited to announce a new partnership with Clarivate Analytics!

This partnership between Impactstory and Clarivate will help fund better coverage of Open Access in the oaDOI database. The improvements will grow our index of free-to-read fulltext copies, bringing the total number to more than 18 million, along with 86 million article records altogether. All this data will continue to be freely accessible to everyone via our open API.

The partnership with Clarivate Analytics will put oaDOI data in front of users at thousands of new institutions, by integrating our index into the popular Web of Science system. The oaDOI API is already in use by more than 700 libraries via SFX, and delivers more than 500,000 fulltext articles to users worldwide every day. It also powers the free Unpaywall browser extension, used by over seventy thousand people in 145 countries.

You can read more about the partnership in Clarivate’s press release. We’ll be sharing more details about improvements in the coming months. Exciting!

The post New partnership with Clarivate to help oaDOI find even more Open Access appeared first on Impactstory blog.

The Top 25 articles for May 2017 ranked by attention

At today’s GEN Summit 2017 in Vienna, media research company Kaleida released its first Attention Score ranking of English-language news sources. The data is based on the open source Attention Index algorithm from Kaleida, previewed for the first time this week.

Source: Kaleida’s Attention Index, June 2017

CNN topped the list with the article “The House just passed a bill that affects overtime pay,” published on May 4 by Julia Horowitz. The article was promoted for 33 hours on CNN’s home page, which has an Alexa Rank of 105, and it was pushed via CNN’s brand page on Facebook, which has 27,386,540 followers. Facebook reported 85,276 engagements for the article.

Other major news stories included articles about Macron and the French Election, James Comey, and US healthcare.

The rankings were derived from data about 128,000 articles published by major US and UK publishers in May 2017. The data, methodology and algorithm are all available and open for reuse under a Creative Commons license at:

About The Attention Index. The Attention Index is a proposed open standard for measuring premium media. It was developed by Kaleida, a media research and data company based in the UK. The data, algorithm and methodology are publicly available with a Creative Commons license.

About GEN Summit 2017. The Global Editors Network is a cross-platform community of editors-in-chief and media innovators committed to sustainable, high-quality journalism, empowering newsrooms through a variety of programmes designed to inspire, connect and share. The GEN Summit 2017 is gathering over 750 editors-in-chief and media innovators from over 70 countries.

The Top 25 articles for May 2017 ranked by attention was originally published in Global Editors Network on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Swedish startup Kit is rethinking analytics for a broader view of what makes a story successful

Every time a staffer at the Swedish news startup Kit produces a story — no matter if it’s a Facebook video recipe for avocado hummus or a text story on Kit’s own website about coal-fired powerplants — they have to fill out 17 categories of metadata that the company developed to classify stories.

Those data points include 145 different classifications (for a total of 43 billion combinations) covering things such as the tone of the story (is it funny? Is it dry?) and the story’s intent (was it created to surprise the user? Is it supposed to explain something to them?)

Kit also collects more than 200 different output data points on every story, including time spent on the page, scroll depth, reach, engagement, and more, depending on the story’s format and the platform where it was posted.

The goal of collecting all the information is to create Kit Core, a taxonomy for editorial content and a more holistic overview of what makes a story successful, said Fredrik Strömberg, Kit’s cofounder and VP of product.

“We are trying to structure the editor part of this whole process,” said Strömberg. “A lot of data-driven editorial teams are looking at the subject: What do we want to write about? And there’s a lot of data mining, data analysis, and text analysis engines that look at the content itself and say, well, shorter works better, or you should have seven images in there. We’re trying to fit in the space between the ‘what’ and ‘what came out of it’…Can I create the editorial assignment in such a structured way that somebody can receive this assignment and know what they are supposed to do?”

That’s the heart of Story Engine, the CMS that powers every aspect of Kit’s editorial processes — from story ideation through creation, publication, and distribution. Kit is primarily a distributed publisher, and everything it produces, no matter the platform where it is published, is created and distributed within Story Engine, which allows Kit to categorize stories in more nuanced ways and also optimize the content for each platform.

An example Strömberg often shares compares two hypothetical boat accidents in the Mediterranean: One involves 12 Syrian refugees trying to reach Italy and the other involves 12 British tourists off the coast of Gibraltar. Even though both could be defined as accidents, they are dramatically different stories, he said.

“That would make a world of difference in how we cover that story, even though in a machine analytical way it’s the same thing,” Strömberg said. “At the same time, how to cover the Olympics and the Nobel Peace Prize awards could be exactly the same…We’re trying to figure out, which is a bold claim, how to tell any story in the best way possible.”

The company uses a combination of the metadata staffers input and the analytics of how the stories perform to better understand how users consume different types of stories and content types on various platforms.

With video, for instance, virtually everything Kit creates is vertical or square ratio, and 97 percent of videos are created without sound, Peder Bonnier, Kit’s CEO and cofounder, said in an interview this spring in Copenhagen during WAN-IFRA’s Digital Media Europe conference. Kit measures viewer retention and uses that information to offer guidelines to producers on how they should create a particular video, including tips on how the storyboard and script could be optimized and data showing when viewers tend to stop watching.

“Usually, you have the story — a video, article, or image — in your CMS and you measure performance on that story with some Javascript or Google Analytics,” Bonnier said. “Then you have a bunch of distribution items tied to that story: a Facebook post, a tweet, an Instagram post, or whatever. It’s difficult to tie those systems together, and the most insight you can get is: Maybe it’s better for us to publish this to Facebook on Thursdays. But we do all of this in the same system. We produce a job, we attach a bunch of categories to it, and then we produce a bunch of distribution items to the job and categorize those as well. It enables us to say things like: If you want [a story] to be read through, it should be distributed this way. If you want it to generate massive engagement, it should be generated this way.”

Kit was founded in late 2014 by Bonnier, Strömberg, and editor-in-chief and cofounder Robert Brännström. It began publishing in spring 2015, and now has about 30 employees, half of whom work on the editorial team.

The site received 50 million Swedish krona ($5.7 million USD) in funding from Bonnier Growth Media, the venture capital arm of the media giant Bonnier. (The three co-founders all previously worked for the company, and Bonnier is on its board.) Bonnier Growth Media owns 67 percent, with the co-founders and employees retaining the rest of the ownership.

Bonnier wouldn’t disclose how the company performed last year, though it lost 22 million krona ($2.53 million USD) in 2015 before it began generating any revenue, according to its annual report. Kit has been collecting editorial data since it launched, and last year it began offering access to its content insights to brands and advertisers.

“Our business is totally based on how to tell a specific story in the best way, and selling that insight back to advertisers,” Bonnier said.

Editorial staffers are encouraged to experiment so Kit can see how different types of stories perform and continue to build out content forms.

For example, staffers recently changed the process of how they produce recipe videos. Traditionally, a video had started with an overview of all the ingredients before going into the step-by-step instructions for how to actually make the recipe, ending with a scene showing the final dish. But they decided to try removing the ingredients overview and jumping straight into the recipe.

The move improved retention rates by 25 percent, Brännström said.

“We don’t want to mainstream the content that we produce,” he said. “We don’t want to just keep repeating what we know works. In every vertical that we’re in, we try to put 20 percent of the effort into developing new stuff or testing out stuff so that we don’t get stuck doing the same thing over and over again.”

In Sweden, Kit is now focused on growing its commercial operations. Bonnier said he could imagine Kit starting its own site or partnering with an existing news organization in another country, but for now the company remains primarily focused on Sweden.

As Kit continues to build its business, it’s focused on ensuring that its staffers and clients have the language to discuss and understand the data and understand what it’s telling them.

“You have to choose what you want to do, but you also have to have this language for it. If you just want to drive volume, then you have your set of tools or insights to do that. If you want to drive retention, content completion, more quality aspects of content, then that’s different from what drives reach,” Strömberg said. “We’re trying to get people to understand that this is not about doing something 1,000 percent better tomorrow. It’s about doing it better every day, having a structure for doing stuff better instead of just lucking out.”

Designing a Faster, Simpler Workflow to Build and Share Analytical Insights


Data is critical to decision-making at The New York Times. Every day, teams of analysts pore over fine-grained details of user behavior to understand how our readers are interacting with The Times online.

Digging into that data hasn’t always been simple. Our data and insights team has created a new set of tools that allows analysts to query, share and communicate findings from their data faster and more easily than ever before.

One is a home-grown query scheduling tool that we call BQQS — short for BigQuery Query Scheduler. The other is the adoption of Chartio, which our analysts use to visualize and share their results.

As a result, more analysts from more teams can more easily derive insights from our user data. At least 30 analysts across three teams now have almost 600 queries running on a regular cadence on BQQS, anywhere from once a month to every five minutes. These queries support more than 200 custom dashboards in Chartio. Both represent substantial improvements over our previous model.

What problems were we trying to solve?

This effort began when we migrated our data warehousing system from Hadoop to Google’s BigQuery. Before we built new tools, we worked with analysts to come up with several core questions we wanted to answer:

  • What patterns and processes did the analysts use to do their work?
  • Which of those processes could we automate, in order to make the process more hands-off?
  • How could we make it easier for our growing list of data-hungry stakeholders to access data directly, without having to go through an analyst?
  • How could we ensure ease of moving between business intelligence products to avoid attachment to eventual legacy software?

Until the migration to BigQuery, analysts primarily queried data using Hive. Although this allowed them to work in a familiar SQL-like language, it also required them to confront uncomfortable distractions like resource usage and Java errors.

We also realized that much of their work was very ad hoc. Regular monitoring of experiments and analyses was often discarded to make way for new analyses. It was also hard for them to share queries and results. Most queries were stored as .sql files on Google Drive. Attempts to solve this using GitHub never took off because it didn’t fit with analysts’ habits.

The act of automating queries was also unfamiliar to the analysts. Although the switch to BigQuery made queries much faster, analysts still manually initiated queries each morning. We wanted to see if there were ways to help them automate their work.

Query Scheduling with BQQS

Before we considered building a scheduling system in-house, we considered two existing tools: Rundeck and Airflow. Although both of these systems were good for engineers, neither really provided the ideal UI for analysts who, at the end of the day, just wanted to run the same query every night.

Out of this came BQQS: our BigQuery Query Scheduler. BQQS is built on top of a Python Flask stack. The application stores queries, along with their metadata, in a Postgres database. It then uses Redis to enqueue queries appropriately. It started with the ability to run data pulls moving forward, but we eventually added backfilling capabilities to make it easier to build larger, historical datasets.
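To make the two scheduling modes concrete, here is a rough sketch of the logic in plain Python. It is not BQQS code: the `ScheduledQuery` class, the `{run_date}` placeholder convention, and both helper functions are assumptions made for illustration, and plain in-memory objects stand in for the Postgres records and the Redis queue described above.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import List, Optional


@dataclass
class ScheduledQuery:
    """A stored query plus scheduling metadata (hypothetical schema)."""
    name: str
    sql: str  # may contain a {run_date} placeholder
    interval: timedelta
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # Never run before, or at least one full interval since the last run.
        return self.last_run is None or now - self.last_run >= self.interval


def due_queries(queries: List[ScheduledQuery], now: datetime) -> List[ScheduledQuery]:
    """Forward-looking pulls: the queries that should be enqueued right now."""
    return [q for q in queries if q.is_due(now)]


def backfill_jobs(query: ScheduledQuery, start: date, end: date) -> List[str]:
    """Backfill: render the query once per day from start through end, inclusive."""
    jobs = []
    day = start
    while day <= end:
        jobs.append(query.sql.format(run_date=day.isoformat()))
        day += timedelta(days=1)
    return jobs
```

In a setup like this, a periodic worker would call `due_queries` and push the results onto the queue, while `backfill_jobs` would be invoked once to rebuild a historical dataset for a newly written query.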

A testing dashboard in BQQS

This solution addressed many of our pain points:

  • Analysts could now “set it and forget it,” barring errors that came up, effectively removing the middleman.
  • The system stored actual analytics work without version control being a barrier. The app stores all query changes so it’s easy to find how and when something changed.
  • Queries would no longer be written directly into other business intelligence tools or accidentally deleted on individual analysts’ computers.

Dashboards with Chartio

Under our old analytics system, “living” dashboards were uncommon. Many required the analyst to update data by hand, were prone to breaking, or required tools like Excel and Tableau to read. They took time to build, and many required workarounds to access the variety of data sources we use.

BigQuery changed a lot of that by allowing us to centralize data into one place. And while we explored several business intelligence tools, Chartio provided the most straightforward way to connect with BigQuery. It also provided a clean, interactive way to build and take down charts and dashboards as necessary.

One example of a dashboard generated by Chartio

Chartio also supported team structures, which meant security could be handled effectively. To some degree, we could make sure that users had access to the right data in BigQuery and dashboards in Chartio.

Developing new processes

Along with new tools, we also developed a new set of processes and guidelines for how analysts should use them.

For instance, we established a process to condense each day’s collection of user events — which could be between 10 and 40 gigabytes in size — into smaller sets of aggregations that analysts can use to build dashboards and reports.

Building aggregations represents a significant progression in our analytical data environment, which previously relied too heavily on querying raw data. It allows us to speed queries up and keep costs down.
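The idea can be sketched in a few lines of Python. This is only an illustration of the technique, not our pipeline: in practice the work would run as queries in BigQuery, and the event fields used here (`date`, `section`, `event_type`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate_daily(events: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Condense raw user events into one count per (date, section, event type)."""
    counts: Counter = Counter()
    for event in events:
        key = (event["date"], event["section"], event["event_type"])
        counts[key] += 1
    # One small row per group replaces many raw event rows.
    return [
        {"date": d, "section": s, "event_type": t, "events": n}
        for (d, s, t), n in sorted(counts.items())
    ]
```

Dashboards then read the small aggregated rows instead of scanning raw events, which is where the speed and cost savings come from.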

In addition, being able to see our analysts’ queries in one place has allowed our developers to spot opportunities to reduce redundancies and create new features to make their lives easier.

Moving forward

There’s much more work to do. Looking ahead, we’d like to explore:

  • How to make it easier to group work together. Many queries end up being the same with slightly different variables and thus a slightly different result. Are there ways to centralize aggregations further so that there are more common data sets and ensure data quality?
  • Where it makes sense to design custom dashboard solutions, for specific use cases and audiences. Although Chartio has worked well as a solution for us with a smaller set of end-users, we’ve identified constraints with dashboards that could have 100+ users. This would be an excellent opportunity to identify new data tools and products that require the hands of an engineer.

Shane Murray is the VP of the Data Insights Group. Within that group, Josh Arak is the Director of Optimization and Ed Podojil is Senior Manager of Data Products.

Designing a Faster, Simpler Workflow to Build and Share Analytical Insights was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

Who’s really driving traffic to articles? Depends on the subject: Facebook (lifestyle, entertainment) or Google (tech, business, sports)

When you’re publishing to Facebook, or tweaking a headline to align with some carefully honed SEO strategy, how closely do you take note of story topic?

New research suggests that news organizations trying to make the most of Facebook referrals and Google search traffic need to be extra discerning about story topic. Some topics, like lifestyle or entertainment, see the majority of their referral traffic coming from Facebook, while others, like tech, sports, and business, see the lion’s share of their traffic coming through Google search. (The findings were based on an analysis of more than 10 million articles published last year by outlets within the report publisher’s network.)

Lifestyle articles, for instance, get more than 87 percent of their external traffic from Facebook, and just 7 percent from Google search. (Sixty-three percent of that traffic also came from a mobile device.) On the extremely Google-reliant end are job postings, which get 84 percent of their traffic through Google search versus 12 percent from Facebook. (There were significantly fewer job-related posts among the 10 million stories analyzed: 2,700, compared to 110,000 lifestyle articles or 210,000 sports articles.)

Across the millions of articles analyzed, Facebook referrals accounted for 39 percent of external traffic, Google 35 percent. Other sources, such as Bing or Pinterest or Reddit, often made up less than a percentage point of referral traffic.

The report also breaks down what words often appear in stories from each topic (word cloud alert). Here, for instance, is U.S. presidential politics (hello, Drudge Report):

Here’s sports (where Twitter, at 10 percent, is actually a not insignificant source of referral traffic):

You can read the full report here.

Lifestyle Audiences Live on Facebook, Technology Readers Still Want Google Search

In 2016, on average, 40% of external referrers to the network of sites found the content via Facebook, 35% came from Google search, and the duopoly left the rest of the internet (including Google News) with a mere 25% of traffic referrals.

The quick and dominant rise of Facebook to media distribution powerhouse has been the focus of urgent and in-depth research from The Tow Center, Reuters Institute for the Study of Journalism, and Pew Research Center.

Our most recent data analysis shows, however, that if you use Facebook news feeds alone to judge what types of news people consume, you’ll end up with a distorted picture. When on Facebook, you’ll see readers especially engaged with articles on entertainment, lifestyle, local events, and politics. Articles on business, world economics, and sports also attract readers, but mostly through Google and other long-tail referrers.

What the data says about how audiences find different topics online

For our latest Authority Report, we wanted to move past aggregate platform trends. How does the audience referral network change according to article topic? To answer this question, we examined 10 million articles published in the network during 2016, categorized by topic.

In the 14 topics we examined, external traffic makeup varied significantly. Articles included in the “lifestyle” topic receive 87 percent of their external traffic from Facebook, whereas Google search generates 60 percent for articles in “technology.” Traffic from Twitter can make up from below 1 percent to 10 percent depending on the topic.
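The per-topic percentages are shares of external referrals within each topic. A minimal sketch of that arithmetic follows; the function name, the input shape, and the visit counts are invented for illustration and are not taken from the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def referral_shares(
    visits: Iterable[Tuple[str, str, int]]
) -> Dict[str, Dict[str, float]]:
    """Turn (topic, source, visit count) rows into percent of external traffic per topic."""
    totals: Dict[str, int] = defaultdict(int)
    by_source: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for topic, source, count in visits:
        totals[topic] += count
        by_source[topic][source] += count
    return {
        topic: {src: round(100 * n / totals[topic], 1) for src, n in sources.items()}
        for topic, sources in by_source.items()
        if totals[topic] > 0  # skip topics with no recorded visits
    }


# Invented counts, chosen only to show the shape of the output.
sample = [
    ("lifestyle", "facebook", 870),
    ("lifestyle", "google", 70),
    ("lifestyle", "other", 60),
]
print(referral_shares(sample))
```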


Download the full report, with a breakdown of each topic.

Topics More Popular from Google Search

Job Postings is the leading category for Google Search referrals, with 84.4% of external traffic referrals, though this category also accounted for the fewest total posts.

Technology articles, Sports, and Business and Finance articles all performed above the Google Search average, at 61 percent, 50 percent and 47 percent respectively. Posts on the World Economy brought in 43 percent of the referrals from Google.

Sports articles also had the highest volume of any post topic category we examined.

Outside the Duopoly: Where Are Audiences Coming from?

The full report breaks down each topic, but shows a wide variety in sources outside the two monoliths.
For sports articles, Twitter makes up over 10 percent of the outside traffic referrals, while lifestyle articles receive less than 2 percent of their traffic from Twitter. Bleacher Report drives 5 percent of sports articles, while LinkedIn sends almost 5 percent of Business & Finance audiences. Drudge Report sends over 5 percent of traffic to articles about National Security, but less than 2 percent for articles about U.S. Presidential Politics.

Why does this data matter?

Finding the best audience for your work makes the most of the effort you and your team put into creating it. Understanding the nuances in audiences ensures that you consider the potential distribution while creating strategies and content, instead of waiting to find out how it does later.

See all of the data in the full Authority Report, and tweet #AuthorityReport to let us know how your audience compares!

The post Lifestyle Audiences Live on Facebook, Technology Readers Still Want Google Search appeared first on

Data-driven subscription audience growth

INMA 2017 Finalist

The Herald Sun – a daily metropolitan tabloid newspaper based in Melbourne, Australia – is a digital innovator. Over the span of a decade we have worked furiously to build a loyal digital audience on multiple platforms spanning mobile, desktop, apps and social channels.

Leading edge results have included:

  • Growing our overall paid audience through digital and print innovations;

  • Becoming one of the first metropolitan sites in the world to implement a digital subscription strategy in 2012;

  • Launching the biggest digital fantasy sports competition in Australia;

  • Winning a host of major national editorial awards for digital news coverage.

We have maintained a significant readership despite an aggressive paywall strategy, with our overall digital audience the second biggest in Australia for a site operating such a model, and still among the 10 biggest news audiences in the country among mostly free news services.

Our newspaper also remains the single best selling daily in Australia.

How have we done this?

By knowing our audience and tailoring content to them.

In 2015, after two years as a metered paid content site, the Herald Sun instituted a freemium model with a 50/50 split between free content and stories requiring a subscription to read.

An issue we soon encountered was churn. Around half the subscribers that signed up decided they weren’t seeing value in the product.

In addition to our marketing efforts, we tasked editorial with engaging subscribers

Web of Science partners with Digital Measures to drive real-time faculty activity reporting

Content integration through regular API calls saves subscribers data entry time with seamless updates and improved accuracy.

For many academic institutions, keeping faculty activity and profiles up to date is a time-consuming data entry exercise. With the newly announced partnership between Clarivate Analytics and Digital Measures, Web of Science and InCites content is now integrated into Activity Insight™, Digital Measures’ faculty activity reporting software, enabling faculty profiles to be updated in real time. The integration uses API calls into the Web of Science content to keep faculty activity up-to-date and relevant.

Over a century of expansive coverage of the world’s most impactful research has brought the Web of Science cited references to over 1 billion, and still counting. This volume of references can be difficult for academic administrators to manage.

“We’ve made it a top priority to make data entry easier for our Activity Insight clients,” said Matt Bartel, CEO and Founder, Digital Measures. “Enabling faculty to integrate automatically with Web of Science, one of the largest and most respected citation databases, is evidence of our commitment to drive efficiency and save our clients considerable time. Faculty members are doing remarkable work, and this is a critical step in helping them share the stories of their accomplishments.”

With this partnership, current Web of Science subscribers have access to this full list of integrated content:

  • Arts & Humanities Citation Index
  • Science Citation Index Expanded
  • Social Sciences Citation Index
  • Book Citation Index
  • Emerging Sources Citation Index
  • Conference Proceedings Citation Index – Science/Social Sciences and Humanities Edition
  • InCites

“This partnership allows us to continue expanding into the addressable footprint of research workflow tools so existing and future customers can maximize their use of Web of Science content,” said Emmanuel Thiveaud, Head of Research Analytics at Clarivate. “We’re excited to continue forging valuable partnerships that enable mutual customers to effectively and efficiently meet their faculty management and assessment needs with trusted data and indicators.”

To find out more about how to directly import content and analytics from the Web of Science into Activity Insight, please contact either Jim Samuel at Clarivate or Stacy Becker at Digital Measures.