Study: Methodology Changes Improve Google Flu Trends Accuracy
July 7, 2014 in News
The accuracy of Google Flu Trends’ disease surveillance system can be improved through simple changes in three different methodologies used by the system, according to a new study published in the American Journal of Preventive Medicine, Health Data Management reports (Goedert, Health Data Management, 7/7).
In 2008, Google created Google Flu Trends, which uses search algorithms to track flu activity based on individuals’ searches for flu-related terms.
After comparing its data with traditional flu surveillance systems, Google found that it could estimate the spread of the flu by tracking the influx of flu-related search terms.
However, a report released in March found that flu tracking data gathered through CDC are far more reliable and accurate than information garnered through Google Flu Trends, despite the time lag present in the federal findings.
According to the March study, Google Flu Trends showed that 11% of U.S. residents had the flu during peak flu season for 2012-2013, while CDC reported that 6% of the population was affected (iHealthBeat, 3/14).
For the new study, researchers at San Diego State University, Harvard University and the Santa Fe Institute identified three existing problems with Google Flu Trends’ methodology:
- Multiple queries are combined, which ignores variations in individual query tendencies over time;
- Certain search queries are excluded based on opinion rather than evidence; and
- The model is static.
Researchers noted that their alternative approach was “inspired by data-assimilation techniques, supervised machine learning and artificial intelligence.”
By tweaking the three methodologies they identified as problematic, researchers were able to reduce the 2012-2013 prediction from 10.6% to an estimated 7.7%, significantly closer to the reported percentage of the population affected.
Researchers stated that their new methodologies all improve Google Flu Trends’ transparency, and specifically:
- Expand on current Google Flu Trends approaches to allow for multiple search queries;
- Empirically select search queries, which helps maximize the accuracy of predictions in real time; and
- Expand on Google Flu Trends’ current use of manual revisions through updates on how individual queries predict flu each week (Health Data Management, 7/7).
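The three changes listed above can be illustrated with a minimal Python sketch. It is not the researchers' model. As a stand-in, it keeps each query as its own predictor, selects queries by how well they correlated with reported flu rates over a recent window, and refits a simple least-squares line each week. The function names, the window length, the correlation threshold and the toy numbers are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean rate
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def predict_this_week(query_history, flu_history, query_now,
                      window=8, min_corr=0.8):
    """Estimate this week's flu rate from this week's query volumes.

    query_history: dict of query name -> list of past weekly volumes
    flu_history:   list of reported flu rates for the same past weeks
    query_now:     dict of query name -> this week's volume
    Calling this every week with updated histories refits the model
    each time, so it is not static.
    """
    ys = flu_history[-window:]
    predictions, selected = [], []
    for name, series in query_history.items():
        xs = series[-window:]
        # Empirical selection: keep a query only if the recent data support it.
        if correlation(xs, ys) >= min_corr:
            a, b = fit_line(xs, ys)
            predictions.append(a + b * query_now[name])
            selected.append(name)
    if not predictions:
        return ys[-1], []  # no usable query: repeat the last reported rate
    return mean(predictions), selected


# Hypothetical toy data: four past weeks of reported flu rates (percent)
# and search volumes for two made-up queries.
flu = [1.0, 2.0, 3.0, 4.0]
queries = {
    "flu symptoms": [10, 20, 30, 40],
    "basketball scores": [5, 1, 4, 2],
}
now = {"flu symptoms": 50, "basketball scores": 3}

estimate, chosen = predict_this_week(queries, flu, now, window=4)
print(chosen, estimate)
```

In this toy run only the query that tracked the reported rates over the window is used for the estimate, and the selection can change from week to week as new data arrive.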
Researchers concluded, “[Google Flu Trends] may be inaccurate, but improved methodologic underpinnings can yield accurate predictions,” adding, “Applying similar methods elsewhere can improve digital disease detection, with broader transparency, improved accuracy, and real-world public health impacts” (Santillana et al., AJPM, 7/1).