How the Daily Dot Uses Data Journalism to Understand Tumblr, Reddit and the Role of the Influencer


As in the past few years at SXSW, the buzz over the past week came down to which apps are hot and which are not. But the deeper questions will go unanswered for quite some time. We do not know what the data says about the direction of new apps such as Highlight or Forecast. We can only speculate about how the community will shape the identity of the apps that got the buzz. Until then, the only ones who may have a clue are the technologists themselves. But that depends on how well they are studying their own data: the relative activity of their users, the churn rate and a host of other factors.

But an emerging community is building new methods for reaching deep to find insights that are often buried under terabytes of data. They are data journalists, and their work is ever more relevant in an age when scientific discovery is meeting the hundred-year-old practices of traditional journalism.

The traditional journalist is alive and well. They are bloggers, newspaper reporters and broadcast journalists. They write, edit and produce media in all forms. They develop sources. They talk to experts. Their stories are handcrafted.

The data journalists are a different breed. They’re about as numerous as data scientists, meaning they are few and far between. They play a different role than the traditional journalist. At The Guardian, for example, data journalists are tasked with discovering patterns and trends in data from thousands of sources. During the riots last year in the United Kingdom, The Guardian team collected court records stemming from the arrests that were made. They scanned and imported the data from thousands of sheets of paper. They placed the data into a spreadsheet. They used their developer skills to normalize it. They then had to find a way to visualize it.

There are a few media sources that practice this new craft. The New York Times and The Guardian are leaders in this new form of journalism. But the people they need are not easily found. It’s one of the few positions that the Times has to recruit for.

Nick White is editor-in-chief and CEO of the Daily Dot. I caught up with him at SXSW this past week. The Daily Dot is a data-driven blog that depends on both traditional and data journalists. Its staff consider themselves in the “alpha stage” of data journalism. Its “Dot Leaderboard” scrapes data from Reddit and Tumblr. The data is then crunched to discover each community’s influencers.

The goal is to explore how communities evolve beyond their most prolific contributors. The team is looking at the patterns and cycles in how Web communities migrate. They want to find out what happens when people leave a community and how Web communities influence each other.

White says the Daily Dot and The Guardian have similar pains.

“You know what?” White said. “The sick part of that is we have the same problem crawling Web pages. Getting Web pages in a data set is the biggest single X factor in cleaning the data.”

The data is a mess. Everyone is creating it themselves without any uniformity. But once mapped, the data gives the Daily Dot insights into the inner dynamics of how the Tumblr and Reddit communities are developing.

The Daily Dot team uses its data mining practices to review data from the top 100 Tumblr and Reddit accounts.

You can drill down into each individual account. For example, RobottBuddha has a Dot score of 95 on the Reddit Dot Leaderboard. Metrics are broken down by comments and links.

“We discovered on Tumblr when we did this that reblogs are more powerful than likes,” White said. “They have more of an effect on the community.”

Services Angle

The Dot Leaderboard is in beta. How often have you heard a media company describe a service as being in beta? But that’s a good way to look at what the Daily Dot is doing. It is seeking out new insights by exploring the data. And that aligns with what we hear from people like EMC Consulting’s Bill Schmarzo and the data scientist community. The best results come from taking a scientific approach. The service providers that view the world more as scientists do will better understand the ways the community is moving.