NiemanLab reports: When it comes to automating the process of spotting breaking news, solving one problem can create several more.
Reuters discovered this firsthand over the past two years as it built Reuters News Tracer, a custom tool designed to monitor Twitter for major breaking news events as they emerge. While reporters curate their own lists of sources to get rapid alerts on stories they’re already looking for, the Reuters tool is designed to solve a different problem: detecting breaking news events while early reports are still coming in.
The development of the tool, which Reuters is speaking about publicly today for the first time, emerged out of “an existential question for the news agency,” said Reg Chua, Reuters’ executive editor of data and innovation. “A large part of our DNA is built on the notion of being first, so we wanted to figure out how to build systems that would give us an edge on tracking this stuff at speed and at scale. You can throw a million humans at this stuff, but it wouldn’t solve the problem,” he said.
Once the tool identifies what it thinks are emerging stories, it clusters relevant tweets into events, generating information and metadata about what each story might be. Tweets that mention “explosions” and “bombs,” for example, would be clustered into a single story about a potential terrorist attack.
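To make the clustering idea concrete, here is a minimal sketch in Python. It is not Reuters’ method, which the report does not describe in detail; it simply groups tweets that share words from a hand-written keyword list. The names `EVENT_KEYWORDS` and `cluster_tweets`, and the keyword lists themselves, are assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, invented for this sketch. The article does not
# say how Reuters News Tracer actually decides which tweets belong together.
EVENT_KEYWORDS = {
    "possible_attack": {"explosion", "explosions", "bomb", "bombs"},
    "earthquake": {"earthquake", "tremor", "aftershock"},
}


def cluster_tweets(tweets):
    """Group tweet texts under every event label whose keywords they mention."""
    clusters = defaultdict(list)
    for text in tweets:
        # Lowercase and split into words so matching ignores case and punctuation.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for event, keywords in EVENT_KEYWORDS.items():
            if words & keywords:
                clusters[event].append(text)
    return dict(clusters)


sample = [
    "Loud explosions heard downtown",
    "Reports of a bomb near the station",
    "Nice weather today",
]
clusters = cluster_tweets(sample)
```

In this toy run, the first two tweets land in one `possible_attack` cluster and the third is ignored. A production system would more plausibly learn such groupings from the text rather than rely on fixed keyword lists.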
But detection is only the first, and probably easiest, problem to solve. Another challenge was figuring out how to identify which events are actually interesting, newsworthy, and not spam. Added to that is the problem of filtering out assertions of opinion (“I think it’s terrible that this event happened”) from assertions of fact (“This event happened”) and automating the process of verifying whether reports are actually true.
The verification challenge was the most interesting and most valuable problem to solve, Chua said. Pulling from academic research on the verification of social media reports, Reuters designed its algorithm to assign verification scores to tweets based on 40 factors, including whether the report is from a verified account, how many people follow those who reported the news, whether the tweets contain links and images, and, in some cases, the structure of the tweets themselves. “Amazingly enough, a tweet that is entirely in capital letters is less likely to be true,” Chua said. [Continue reading…]
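
A rough sketch of how a score built from such signals might look, in Python. This is an illustration only: the report names a handful of the 40 factors but gives no weights or formula, so the `Tweet` fields, the point values, and the 10,000-follower threshold below are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical fields standing in for a few of the factors the article names.
    text: str
    author_verified: bool
    author_followers: int
    has_link: bool
    has_image: bool


def is_all_caps(text):
    """True if the text contains letters and every letter is uppercase."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def verification_score(tweet):
    """Add up invented point values; higher means more credible in this toy model."""
    score = 0
    if tweet.author_verified:
        score += 3
    if tweet.author_followers >= 10_000:  # assumed threshold
        score += 2
    if tweet.has_link:
        score += 2
    if tweet.has_image:
        score += 2
    if is_all_caps(tweet.text):  # the article notes all-caps tweets are less likely to be true
        score -= 3
    return score


calm = Tweet("Explosion reported near the station", True, 50_000, True, False)
shouty = Tweet("EXPLOSION NEAR THE STATION!!!", True, 50_000, True, False)
```

With these made-up weights, the two tweets differ only in capitalization, and the all-caps one scores lower. The real system presumably weighs its factors using the academic research Chua mentions, not hand-picked points.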