Journalists love stories. Give us a good anecdote and we know what our lead is going to be. We’re not as comfortable with data. We know a good story is hiding in there somewhere, but most of us don’t know how to find it. And too many of us — reporters and executives alike — are refusing to learn.
My first exposure to the use of data for journalism was when I was at the Kansas City Star (or possibly the Kansas City Times; I worked for both) nearly 20 years ago. The late Greg Reeves, a somewhat geeky reporter I didn’t know well but came to admire, wrote a terrific story about the driving records of Kansas City police. I don’t recall the details, but I was shocked at how many officers had offenses such as reckless driving (I think drunk driving, too, but I can’t vouch for my memory over that many years). What I do recall is that I started to understand the power of data analysis.
In the mid-1990s, I attended two training programs of the National Institute of Computer-Assisted Reporting and learned a great deal from David Milliron, Shawn McIntosh and Jennifer LaFleur. I worked side-by-side on a project at the Omaha World-Herald with a reporter who was a data whiz, Carol Napolitano. I later bugged another World-Herald colleague who learned faster than I did, Paul Goodsell, for some advice as I struggled on data stories of my own.
Data work never came naturally to me (it doesn’t to many journalists), but I learned that the rewards were worth the struggle. I love a good story, and spreadsheets and other data tools helped me deliver some front-page stories that used public agencies’ own records to contradict the public statements of those agencies’ leaders. Every reporter loves that kind of gotcha story.
I learned that you could find great stories by asking good questions of data, just as you find some great stories by asking good questions of people (except that data didn’t lie as often). I began to argue that every reporter should learn to analyze data. I argued that the term computer-assisted reporting was as ridiculous as notebook-assisted reporting. Computers are a tool every reporter should learn to use, and I lamented that our profession was turning data analysis into a specialty reserved for a few geeks.
I saw more possibilities in using data to tell stories the first time I visited chicagocrime.org, a web site that journalist Adrian Holovaty built on his own time. He wrote programs to scrape the Chicago Police Department’s crime reports every day and store them in his database. He wrote other programs to let visitors ask questions of the data and get helpful answers.
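For readers curious what that scrape-and-store work looks like, here is a minimal sketch in Python. The HTML structure, field names and sample rows are invented for illustration; chicagocrime.org’s actual code and the police department’s actual page format are not public in this form.

```python
# Hypothetical sketch of a scrape-and-store pipeline like chicagocrime.org's:
# parse rows of police reports out of a fetched page, then store them in a
# local database for later querying. The markup below is made up.
import re
import sqlite3

SAMPLE_HTML = """
<tr><td>2006-04-01</td><td>THEFT</td><td>100 N State St</td></tr>
<tr><td>2006-04-02</td><td>BATTERY</td><td>200 W Madison St</td></tr>
"""

def parse_reports(html):
    """Pull (date, crime_type, address) tuples out of a report listing."""
    row = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>")
    return row.findall(html)

def store_reports(conn, reports):
    """Save parsed reports so a web site can run queries against them."""
    conn.execute("""CREATE TABLE IF NOT EXISTS reports
                    (date TEXT, crime_type TEXT, address TEXT)""")
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", reports)

conn = sqlite3.connect(":memory:")
store_reports(conn, parse_reports(SAMPLE_HTML))
count = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]  # 2 rows
```

A real scraper would fetch the page over HTTP on a daily schedule and handle messier markup, but the shape of the job — fetch, parse, store — is the same.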
Let’s say you were considering buying a home in Chicago. You could enter an address and Holovaty’s database would show all the recent police reports in that neighborhood. Maybe you were visiting Chicago and wanted to know whether you should walk from your hotel to your business meeting: Enter both addresses, look at the police reports along your route, and decide whether to take a cab or hoof it. An advocate for domestic-violence victims could search by a particular type of crime.
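The “reports near this address” query boils down to geography: once each report is geocoded to a latitude and longitude, you filter to the ones within some radius of the user’s point. Here is a sketch using the standard haversine distance formula; the coordinates and records are hypothetical, and a real site would geocode addresses with an external service and push the filtering into the database.

```python
# Minimal sketch of a proximity query over geocoded crime reports.
# Records are (crime_type, lat, lon); the data here is invented.
import math

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(a))  # Earth radius ~3,959 miles

def reports_near(reports, lat, lon, radius_miles):
    """Keep only the reports within radius_miles of the given point."""
    return [r for r in reports
            if distance_miles(lat, lon, r[1], r[2]) <= radius_miles]

reports = [("THEFT", 41.8837, -87.6278),    # near the Loop
           ("BATTERY", 41.9484, -87.6553)]  # several miles north
nearby = reports_near(reports, 41.8842, -87.6324, 1.0)  # only the theft
```

Filtering by crime type, as the domestic-violence advocate would, is an even simpler query over the same stored records.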
I began to see the possibilities for building interactive databases that let users look for answers to their own questions.
Not long after my first look at chicagocrime.org, I got a chance to hear Holovaty talk about it at the American Society of Newspaper Editors convention in Seattle in April 2006. He explained the need for journalism to have more people with programming skills to develop similar databases. In my Training Tracks blog for the American Press Institute, I wrote about the parallels Holovaty saw between a reporter pursuing a story and a journalist/programmer developing a database:
Just as a reporter gathers information for a story, Holovaty gathers information for his databases. Like a reporter distills information by deciding what’s worth writing about, Holovaty distills information by deciding which queries to offer users. As a reporter presents information by writing a story, Holovaty presents information by designing the web site.
Holovaty was innovation editor at the Washington Post then, developer of the Post’s excellent Congressional Votes and Faces of the Fallen databases. Any editor at the convention who didn’t see the importance and urgency of developing interactive databases wasn’t paying attention.
The industry has certainly improved and expanded its use of interactive databases since that convention:
- API’s Newspaper Next report later that year called for using databases to help answer users’ everyday questions.
- That fall, Gannett made data one of the emphases of the “information centers” that replaced Gannett newsrooms.
- This spring, PolitiFact, a fact-checking political database developed by Bill Adair and Matt Waite (one of many journalists who are learning programming skills) of the St. Petersburg Times, won the Pulitzer Prize for national reporting.
- I documented the increasing use of databases, first for the N2 Blog and later in a Newspaper Next report. (I suggested calling them answerbases, because the people who use them are really looking for answers and many of them feel unfamiliar with databases.)
- A Knight News Challenge grant provides scholarships for programmers to study journalism at the Medill School at Northwestern University.
- Some newspapers have hired programmers for newsroom positions, teaching them journalism skills and principles. I spent a week in 2007 working with and learning from such a programmer/journalist, Aaron Ritchey of thenewstribune.com in Tacoma, Wash. (I wrote about Aaron in my N2 report.)
- Here at Gazette Communications, Zack Kucharski has developed a strong set of answerbases for the Local Knowledge section of gazetteonline.com.
That’s the glass half-full. But we have been moving too slowly to embrace and explore the possibilities of databases. Data too often is an afterthought, still a specialty, handled by those people in the geek ghetto. As an industry, we haven’t really developed the possibilities for using databases as a tool to attract and support new revenue streams (I addressed this in my report for N2).
A couple of years ago, Holovaty left the Post when he received a Knight News Challenge grant to develop EveryBlock. This took the chicagocrime concept of microlocal mapped information and expanded it to every type of geocoded data Holovaty’s team could find: liquor licenses, building permits, restaurant inspections, property sales, street closures, photos, news stories and more. And it has expanded from Chicago to 15 cities: Atlanta, Boston, Charlotte, Dallas, Detroit, Houston, Los Angeles, Miami, New York, Philadelphia, San Francisco, San Jose, Seattle and Washington, D.C.
The Knight grant ended in June and Holovaty released the code to the public. Anyone can use his techniques now to develop that kind of local answerbase. This may be the best resource in the history of journalism for providing customized answers at the local level. EveryBlock would be a tremendous vehicle to tap into what Michael Gluckstadt of Fast Company calls the “$100 billion local-advertising pot of gold.”
The newspaper business watched the potential gold mine develop. And yesterday, the newspaper business watched EveryBlock join forces with MSNBC.
That’s probably a great match. Instead of dealing with the glacial pace of newspaper organizations, Holovaty has hooked up with a national news force that’s focused on the web. Perhaps local newspapers will be able to affiliate with EveryBlock, gathering the local data and selling local ads. Or perhaps local NBC affiliate web sites will play that role, representing another lost opportunity for newspapers.
Newsosaur blogger Alan Mutter wrote that MSNBC “scooped” the newspaper industry, asking, “How did newspapers lose EveryBlock?”
We lost EveryBlock because we’ve been slow at understanding the importance of data for years.