While partisans in the nation’s capital squabbled over which news outlets’ coronavirus coverage was most irresponsible, private digital detectives were scouring local news reports, plus web chatter and other online indicators, to map the pandemic’s spread in real time.

Local news is one of the more important signals amid all that digital noise, and even before this month’s economic freeze began wiping out newsroom jobs, data scientists had been worrying about the underlying insecurity of the news business.

Unlike maps that update daily with official death-toll and diagnosis tallies, HealthMap, headquartered at Boston Children’s Hospital, uses an automated process that sifts the social web round the clock to sniff out informal clues to burgeoning disease hot spots and then maps them as they are uncovered. Here’s a link to its newest animation of the COVID-19 outbreak. If accurate, that kind of up-to-the-minute intelligence can tell disease-fighting leaders where to send medical equipment or impose travel restrictions.

Surveillance has long been a core activity of public-health systems. Human sentinels across the country still notify the U.S. Centers for Disease Control and Prevention when a worrisome diagnosis is made, but a computational epidemiology system like HealthMap doesn’t wait. It takes a big-data approach, betting that a mass of intelligently aggregated signals from social media can yield advance warnings.

Grossly simplified, here’s how it works. Combing the web, the HealthMap system looks for key words and other signals that pop up on social media, in search-engine queries and in local news reports, plus the posts of expert and official sources. Running all that scraped-up material through algorithms and other software tools that sort, sift and weigh, HealthMap revises its online maps to show what’s happening in real time, not after the bodies have been counted.
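For readers who think in code, the sort-sift-and-weigh idea can be sketched in a few lines of Python. This is only an illustration of the general approach described above, not HealthMap’s actual software: the keyword list, the source weights, the threshold and every function name here are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical trust weights per source type (assumed values, not HealthMap's).
SOURCE_WEIGHTS = {"official": 1.0, "local_news": 0.8, "search": 0.3, "social": 0.2}

# Hypothetical keyword list; a real system would use far richer text analysis.
KEYWORDS = {"fever", "outbreak", "pneumonia", "quarantine"}


def score_item(text, source):
    """Count distinct keyword hits in the text, scaled by the source's weight."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    hits = len(words & KEYWORDS)
    # Unknown source types get a small default weight (also an assumption).
    return hits * SOURCE_WEIGHTS.get(source, 0.1)


def aggregate(items):
    """Sum weighted scores per location from (location, source, text) tuples."""
    totals = defaultdict(float)
    for location, source, text in items:
        totals[location] += score_item(text, source)
    return dict(totals)


def hot_spots(items, threshold=1.0):
    """Return (location, score) pairs at or above the threshold, highest first."""
    totals = aggregate(items)
    flagged = [(loc, score) for loc, score in totals.items() if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented sample items, for illustration only.
    sample = [
        ("Springdale", "local_news", "Mumps outbreak closes school; fever reported."),
        ("Springdale", "social", "everyone has a fever"),
        ("Little Rock", "social", "traffic is bad"),
    ]
    for location, score in hot_spots(sample):
        print(location, round(score, 2))
```

In this toy version a local news story counts four times as much as a social-media post, which is one simple way to express the idea that reported, sourced information carries more weight than chatter.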

Local news plays an important role, HealthMap team members have said. News reports refine the informal signals found in the swamp of gossip, eyewitness postings and other data we leave in our online wake, sometimes called “data exhaust.”


So, the emergence of news deserts — localities that have lost or never had a local outlet — is a problem for disease surveillance. News reports, unlike most tweets or circumstantial evidence like parking demand near hospitals, tend to include authoritative and named sources and details like who, what, when and where.

Consider the Arkansas mumps outbreak of 2016. Official data-sharing moves through a slow bureaucratic process, but computational epidemiologists who saw social-web clues to an outbreak quickly unearthed solid information from local journalists.

The Northwest Arkansas Democrat-Gazette did more than just retype official data on test results. As the number of cases rose to 3,000, the paper noted that the rate of parents refusing vaccines for their children was high in Northwest Arkansas and that, oddly, members of the region’s Marshall Islander community were getting the mumps despite having been vaccinated.

Those kinds of clues are gold for disease detectives, who then knew what to follow up on. Maia Majumder, a systems engineer with a master’s in public health, was a researcher at HealthMap at that time. Local news, she told the health-media company STAT, bolsters the quality of the signals processed by the algorithm. Her HealthMap colleague, Alessandro Vespignani of Northeastern University, echoed her concerns about the loss of that kind of reporting when they spoke to STAT reporter Helen Branswell for this excellent 2018 report.

There are limits to the accuracy of mapping systems built on machine learning, though. “I would just caution that you may want to point to Google’s now-defunct ‘Flu Trends’ project,” said Jevin West, co-author of “Calling Bullshit: The Art of Skepticism in a Data-Driven World.”

“They got everyone excited, including me,” said the University of Washington data scientist. At first the project predicted flu outbreaks accurately. “The results were phenomenal,” he said. “But, within one and a half years it started to perform really poorly” and had to be shut down, he said. The problem? User behavior and word choice change rapidly online, and social-media platforms constantly change their public-facing interfaces, which scrambles the signal. Unless a predictive model evolves in parallel, he said, it is doomed.


This is likely the busiest and possibly most consequential time ever for HealthMap, which might be why my more than 15 separate efforts over the last three weeks to reach Majumder, Vespignani and several other members of the team via email, Twitter, LinkedIn, phone and Boston Children’s press office have been unsuccessful.

I’ve built this report from HealthMap’s website, from peer-reviewed journal articles by HealthMap founder Jonathan Brownstein and from the report by Branswell, STAT’s award-winning infectious-diseases and global-health writer.

If Majumder, Vespignani or Brownstein surface for air and get back to me, I’ll either update this report or write a new one. The most pressing of my questions: How early did HealthMap spot the spread of COVID-19 and how have newsroom layoffs and closures since 2018 caused HealthMap to change the local news element of its tracking system? And how will they avoid the problems that plagued Google Flu Trends?

It was a wonderfully obsessive amateur blogger, Sharon Sanders at FluTrackers, who on New Year’s Eve first sounded the alarm to the Western world about a fast-moving and fatal new flu emerging from Wuhan, China.