I have added three updates, marked in bold, since posting this originally.
Aggregation has become a dirty word in much of journalism today.
Actually, aggregation has a long, proud and ethical history in journalism. If you’re an old-school journalist, don’t think Huffington Post or Drudge when you think about aggregation; think AP. The Associated Press is largely an aggregation service*, except that its members pay huge fees for the privilege of being aggregated (and for receiving content aggregated from other members).
The New York Times and Washington Post also have long histories of aggregation. In my years at various Midwestern newspapers, we reported big local and regional stories that attracted the attention of the Times, Post and other national news organizations. Facts we had reported first invariably turned up in the Times and Post stories without attribution or with vague attribution such as “local media reports.” I don’t say that critically. When I was a reporter and editor at various Midwestern newspapers, we did the same thing with facts we aggregated from smaller newspapers as we did regional versions of their local stories.
My point isn’t to criticize these traditional newspapers, just to note that aggregation isn’t a new practice just because it’s a fairly new journalism term. It’s one of many areas where journalism practices and standards are evolving, and I believe standards are actually improving in most cases.
After the Washington Post case, Elana Zak asked me and others if journalists needed to develop guidelines for aggregation.
It’s time for newsrooms to start creating guidelines for aggregation. If we have them for social media, why not for aggregating news?
— Elana Zak (@elanazak) April 21, 2012
— Elana Zak (@elanazak) April 22, 2012
I’m happy to contribute to that conversation with some thoughts about aggregation. I’ll start with discussing what I mean by aggregation (and its cousin or sibling, curation):
Aggregation and curation are techniques of using content from other sources to provide content for your audience. They occupy overlapping spaces on a spectrum with original reporting at one end and mechanical aggregation at the other.
Totally original reporting, where a reporter gathers every bit of content directly, observing events and interviewing firsthand sources, is probably a minority of content, even in traditional news organizations. Most stories, I think, draw on archives or on previous reporting from other sources. At the other end of the spectrum are Google News, Trendsmap and other mechanical aggregation, providing increasingly relevant content guided by your search terms and/or by popularity.
Curation lies somewhere in between, with journalists using judgment to compile content from other sources, sometimes blended with original content. Curation nearly always adds value through context, relationships, background or impact.
If that sounds a lot like traditional journalism, I agree. As NPR’s master curator Andy Carvin told The Atlantic: “I think curation has always been a part of journalism; we just didn’t call it that.”
Other spots on the spectrum are plentiful and reflect a mix of traditional and new practices: original reporting supplemented by curation of some background content from your own archives; aggregation of data from public records, with value added by your original analysis and/or reporting; mechanical aggregation supplemented by geocoding, tagging and other efforts to target content to specific audiences; an aggregation/curation blend that adds some value to aggregated content by adding hand-curated links or supplemental content; automated curation efforts to provide content that’s related to an original or aggregated piece through semantic data mining.
So here are my aggregation suggestions:
Link. Always link to source material. Whether you are a reporter working on a story that’s mostly original or an aggregator whose job is to collect content from various sources, linking has become a fundamental principle of journalism ethics. I can’t think of a reason not to link in digital content.
Attribute. When you use content originally reported by someone else, you should attribute when you know you weren’t first. Period. Neither paraphrasing nor doing additional reporting excuses failure to attribute. This is not simply a matter of professional courtesy; it provides clarity and understanding for readers/viewers who increasingly use multiple sources of information. Attribution helps consumers evaluate the reliability of information. It helps the consumer who follows multiple sources of information determine what is new and track content from multiple sources.
Attribute and quote before pasting. Copying-and-pasting sometimes gets a bad rap as a lazy journalism practice (it’s sometimes cited as an excuse by plagiarists, and was cited in the Post situation Pexton was writing about). Copying-and-pasting is actually an excellent practice to ensure that you quote accurately. You don’t inadvertently drop a “not” in typing the quote or change a “not” to “now” or a “murder trial” to a “murder trail.” But this needs to be a three-step process: copy-attribute-paste, in that order. Before you paste a quoted passage of even a few words into your story, blog post or video script, write the attribution and place the quotation marks (or open a block quote). That’s what I did above with the Keller and Pexton quotes: I wrote the attribution and added the links and quotation marks, then pasted the quotes right into the quotation marks. When you copy-attribute-paste, you can’t forget to add the attribution later and you can’t get confused about what to attribute to whom or what came from your notes and what came from sources.
Make an attribution check. Craig Silverman recommends use of an accuracy checklist (embedded below) that includes checking attribution before you turn a story in or publish it (I developed my own checklist at Craig’s suggestion). This may be the most critical check you make before publication. An error in fact, serious as that is, results in a correction and perhaps a rebuke from your editor and some tarnish on your reputation. But a single error in fact generally won’t cost you your job or wreck your career. One instance of plagiarism can get you fired and wreck your career. No one will believe your excuse about being lazy or rushed or distracted or confused or sloppy. Plagiarism is one of journalism’s capital crimes, so you should always double check all your material to be sure that it’s attributed before hitting “publish.”
Aggregation and curation work best when they add value to other material. You can add value in a variety of ways:
Original reporting. On those occasions when the New York Times swooped into Iowa to report a story the Des Moines Register had already reported (or the Register swooped into an Iowa town on a story already reported locally), the second news organization did its own reporting. In such cases, the second organization may talk to some of the same sources as the first or it may find new sources. The story may be heavily original reporting, but use some previous stories (from other news organizations or your archives) to provide background and context.
Data analysis. Much computer-assisted reporting is a form of aggregation. It’s often an aggregation of public records rather than news reports. But it’s generally content someone other than the reporter gathered. The reporter adds value by analyzing the data and/or by presenting it in a form where people find answers to their own questions.
Commentary/analysis. An important form of aggregation is when you use the curated content as the starting point for a commentary, as I did in my reaction to the faulty (but widely quoted) 2010 study of Baltimore’s news ecosystem by the Project for Excellence in Journalism. Or you can aggregate various related pieces of content and provide your own analysis and commentary, as I did last week in explaining how journalists and newsrooms can use Pinterest.
Find and filter. When we launched TBD in 2010, we had an aggregation team whose job was to find the best content about the Washington metro area and tag it for our audience. The aggregators didn’t actually create stories on our website, summarizing or quoting their content. We would have a directory (and home-page widget) for each ZIP code in the metro area, with links to all the stories that might interest you if you care about that ZIP code. Except for featured stories on the home page, which got a blurb, the headlines and links were the only content from those stories on our site. For instance, on a particular day, my Herndon, Va., ZIP code might have links to content from the Washington Post, Herndon Connection, Herndon Patch, some blog posts relating to Herndon and Fairfax County press releases. The value we provided was to gather all the Herndon content in one place.
Supplement. The TBD aggregation approach changed several months after launch. Instead of pulling in as many headlines and links as we could, which all led away from our site, we began aggregating fewer stories but supplementing them. We might have a paragraph or two summarizing or quoting from the story we were aggregating. Then we’d add value by linking to earlier stories on the same topic or embedding related tweets, videos or maps.
Related stories. If multiple news organizations are covering the same story, you can provide aggregation value by pulling them together and noting the different or similar approaches.
Roundup. Storify has given us an excellent tool for roundups of various related content, such as tweets, Facebook updates, photos, videos, news reports, blog posts and background material. The organization adds value, pulling related things together from many sources and organizing them by topic, chronology or some other way. A roundup’s content can be loosely related, such as my post this morning on three work examples by my Digital First Media colleagues.
Update, May 19: This section appeared originally in a blog post expanding on the points I made here. I add it here for archival purposes, to put all my aggregation advice in a single post:
I don’t think an aggregator needs to verify every point from a source you aggregate from. For instance, in yesterday’s post, which aggregated several links, I did not verify that Media General sold 63 newspapers to Warren Buffett. I had seen the number in several other pieces I had read and I used it in my aggregation of Dan Conover’s blog post about the purchase without verifying the number from the Media General announcement or the Media General website. I also didn’t check Dan’s math on the average cost for each of the newspapers, though it looked right using round numbers in my head.
I do think aggregation requires some assessment of the trustworthiness of the sources you’re aggregating from. If you trust the sources, attribute to them and link to them, I think that should suffice. Taking the time to independently verify every fact from sources you attribute to would limit how much you can aggregate. Just as aggregation has value, I believe trust has value and the work of other journalists and news sources has value. If you’ve attributed to a trustworthy source, I think you can aggregate without independent verification.
For instance, most daily newspapers aggregate their daily wire report from the Associated Press and other wire services without independent verification because we trust the AP.
If you have doubts about information you are aggregating, you should verify those facts or express your doubts. Andy Carvin’s techniques of crowdsourcing verification should certainly be part of the aggregator’s toolbox.
When you are aggregating tweets from the scene of a breaking news event, you should note where appropriate what you have verified and what you haven’t.
A broader view of aggregation
I preceded my guidelines by noting the aggregation practices of traditional news organizations. But let’s take a broader view:
The Bible might be our best-known work of aggregation: a collection of books and letters from multiple sources.
Public libraries are longtime aggregators. They aggregate a wide range of books that might interest people in their communities (or universities, schools, newsrooms or law firms) and then multiple people can borrow the books without actually paying anything to the authors and publishers.
Looked at narrowly from the sense of whether a library patron is paying the author and publisher for the content she is reading, that’s a systematic ripoff of publishers and authors. But publishers and authors have benefited greatly for generations from the existence of libraries. A new author hopes desperately that thousands of libraries will be buying her book. Some people who borrowed one of an author’s books from the library will become faithful buyers of subsequent books. Someone who borrows the book from a library might suggest it for a monthly reading by a book club, spurring several sales for the author. Generally speaking, libraries boost book sales and digital news aggregators boost traffic to content-generating sites.
Society benefits from the existence of libraries, which boost reading and literacy. An author or publisher whining about the freeloaders who use libraries would appear selfish and narrow-sighted, just as journalists whining about aggregators are selfish and myopic.
The people who are critical or dismissive of aggregation and curation fail (or refuse) to understand the value that aggregation and curation have in themselves. Pointing out things that people may be interested in has value. It’s a great function of Twitter, Facebook, Pinterest, Google+ and other social media, which together drive huge traffic for news organizations.
And, as with the journalism examples I cited, social aggregation and curation have a long history, too. For decades, every letter from my mother included news clippings that she thought might interest me: commentaries or news stories on political, media or religious issues, embarrassing typos, comic strips. In a lot of ways, it was like Facebook, except that I didn’t have a “like” button.
As Alzheimer’s disease sapped Mom’s memory, she continued aggregating, even if she couldn’t remember to send things to me. When I would visit and sort through the stacks of stuff on her desk, I would find clippings with sticky notes labeled “Steve.”
In every case, the originating news outlet got no direct economic value from Mom’s persistent aggregation for me. (But the joy she got sharing with family and friends added value to Mom’s subscriptions, and perhaps her sharing had some brand-building value for the publications.) There was no economic loss to the publications, as I couldn’t run out and buy the Chicago Tribune or Kansas City Star when Mom lived in those cities and I didn’t. And I wouldn’t have been likely to pick up a Time magazine just because Mom recommended a story.
Even as her memory faded, Mom was meticulous about giving credit. If the clipping didn’t show the name and date of the publication, Mom added it in her perfect cursive script. She wasn’t a journalist, but she understood that aggregation and attribution go together.
*Yes, I know that AP produces original content. So does Huffington Post. In fact, both won Pulitzer Prizes this year. But both also aggregate heavily. If the analogy isn’t perfect, at least it’s fair.
Update, May 18: I was challenged by Erik Wemple in an email and on Twitter on the statement that AP is “primarily” an aggregation service.
— ErikWemple (@ErikWemple) May 18, 2012
Part of my reply to Erik:
I noted that the AP also produces original Pulitzer Prize-winning content. But I have been an editor and reporter using the AP’s service in five different states, and I feel very confident in saying that a huge amount of AP content (increasingly so as it has cut its state staffs) comes from its members. I know the state report is heavily aggregation and many of the national stories are state stories of national interest. Obviously the world report is heavily, if not entirely, original, but the importance of the world report to members is diminishing. I also have observed that national stories produced by AP staffers draw heavily on members’ coverage.
I believe if someone analyzed AP content, especially weighted toward the members’ use of the content, my statement that AP is “primarily” an aggregation service would stand up. Since I haven’t done and don’t plan to do that research, I decided to soften to “largely.”
I’ll end with this point: I don’t think that’s a bad thing. A main point of this post is to honor aggregation as a valuable service.
Updated again, May 18: Erik has posted on his Washington Post blog about this post and AP’s disagreement with it (no one from AP has responded to me directly, and I don’t know if the person responding read the full blog post or just the passage Erik sent).
Figuring that this assessment of the Associated Press has to be infuriating the Associated Press, I contacted the Associated Press, whose spokesman gave me this repudiation: “This claim is ridiculous. Of the 3,700 staffers of the AP posted around the world, two-thirds are gathering news for all formats.”
(In an earlier version of this blog post, I included that quote because Erik had emailed it to me. I mistakenly said I got it from the AP, rather than Erik. I removed it as soon as Erik pointed out the error to me.)
After noting that I had changed “primarily” to “largely,” Erik got a further response from the unidentified AP spokesperson (why unidentified, Erik?):
As a cooperative owned by its roughly 1,500 member papers, the AP picks up a number of member stories each day and shares them on our state news wires while noting the stories’ origins. A tiny percentage of these (less than 2 percent), typically exclusives and also credited to the originating papers, end up among an entirely separate selection of international, national, business, entertainment and other stories that AP licenses to commercial customers such as Google News and Yahoo News and thousands of other websites.
What’s more, AP offers ways for its member papers to share their stories with one another.
Erik’s conclusion underscores the point in my lead, that aggregation has become a dirty word in journalism:
If there’s any salience to the Buttry-AP thing, it’s that aggregation — at least as it’s known in Internet journalism — hasn’t yet matured into a respected enterprise. Saying that a news organization does a lot of it remains an accusation to be refuted.
I wish AP and/or Erik had acknowledged that I discussed a wide range of aggregation practices, including original reporting that builds on your own archives or earlier reporting by other organizations. I am certain that a large amount of the reporting by AP or any other news organization falls in that category. The point of my post was not to “accuse” AP of aggregating, but to note that a lot of good journalism by AP and a lot of other organizations includes aggregation. That’s a fact, not an accusation.
The AP spokesperson’s reference to licensing to other websites indicates that Erik and the spokesperson were thinking of the sinister depiction of aggregation that I was trying to debunk, rather than reacting to anything more than a single sentence of my blog post (and the elaboration on the asterisk). The service that AP performs for its members, who get those state news stories that the spokesperson acknowledges, is aggregation. That’s exactly what I was talking about.
AP’s aggregation practices have changed in recent years (member organizations didn’t use to get credit), but that doesn’t mean it doesn’t do heavy aggregation. It does, and for the large membership fees it charges, it had better keep doing so.