Study: Wikipedia Tracks Flu Better Than Google, Faster Than CDC
April 21, 2014 in News
Wikipedia page view data tracked influenza outbreaks better than the Google Flu Trends application, according to a new study by Boston Children’s Hospital researchers, Bloomberg Businessweek reports (Brustein, Bloomberg Businessweek, 4/17).
For the study, researchers developed an algorithm that collected data on how many times 35 flu-related Wikipedia pages were viewed.
The researchers collected page view data from Dec. 10, 2007, through Aug. 19, 2013 (Baum, MedCity News, 4/18).
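As a rough illustration of the kind of aggregation described above, the following Python sketch sums daily page views across a set of tracked articles into weekly totals. It is not the researchers' algorithm: the function names, the Monday week boundary, the page titles and the view counts are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def weekly_totals(daily_views):
    """Sum daily views across all tracked pages into per-week totals.

    daily_views: iterable of (page_title, date, view_count) tuples.
    Returns a dict mapping each week's Monday to the combined view count.
    """
    totals = defaultdict(int)
    for _page, day, views in daily_views:
        totals[week_start(day)] += views
    return dict(totals)


# Hypothetical records for illustration only; these are not the study's data.
sample = [
    ("Influenza", date(2007, 12, 10), 100),
    ("Influenza", date(2007, 12, 11), 120),
    ("Oseltamivir", date(2007, 12, 12), 30),
    ("Influenza", date(2007, 12, 17), 200),
]

totals = weekly_totals(sample)
for monday in sorted(totals):
    print(monday.isoformat(), totals[monday])
```

A real pipeline would also need to choose the week boundary to match the reference surveillance data it is compared against.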
The study found that data garnered through Wikipedia:
- Accurately estimated the peak flu week in three out of the six examined flu seasons (Blaszczak-Boxe, Huffington Post, 4/17);
- Identified the weeks with the most flu-related activity with 17% more accuracy than Google Flu Trends;
- Was more likely than the Google tracker to be correct about flu intensity in any given week; and
- Identified flu trends about two weeks sooner than CDC data.
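Comparisons like these (peak week and how far one series runs ahead of another) can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions, not the study's evaluation method: `peak_index` and `best_lead` are hypothetical helpers, the lead is estimated by a simple shifted Pearson correlation, and both series are made-up numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def peak_index(series):
    """Index of the week with the highest value."""
    return max(range(len(series)), key=series.__getitem__)


def best_lead(signal, reference, max_lead):
    """Find the lead k (in weeks) at which signal[t] best matches reference[t + k].

    Returns (k, correlation) for the best k in 0..max_lead.
    """
    best_k, best_r = 0, -2.0
    for k in range(max_lead + 1):
        xs = signal[: len(signal) - k]
        ys = reference[k:]
        r = pearson(xs, ys)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Made-up weekly series: `reference` is `signal` delayed by two weeks.
signal = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
reference = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1]

lead, r = best_lead(signal, reference, max_lead=3)
print("peak weeks:", peak_index(signal), peak_index(reference))
print("estimated lead in weeks:", lead)
```

As the article notes, a high correlation at some lead shows only that the two series move together, not that one causes the other.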
The researchers also said that the Wikipedia data — unlike the Google tracker — were open to all potential researchers.
However, the study identified several limitations to using Wikipedia to track the spread of the flu. For example:
- Wikipedia, unlike Google Flu Trends, does not record location information, so the algorithm could only identify when the flu is spreading nationally;
- The researchers used historical data in their study, and Wikipedia page view data have not yet been tested in real time during a flu season; and
- Both the Wikipedia data and the Google Flu Trends data could only identify correlation, not causation.
According to Bloomberg Businessweek, the researchers are working to address the lack of causation data by integrating their big-data findings with small-data techniques. Specifically, they are using the online polling site Flu Near You, which shares flu data about nearby people when users agree to tell the site how they are currently feeling (Bloomberg Businessweek, 4/17).
Details of Accuracy
The researchers offered a hypothesis for why the Wikipedia findings were more accurate than Google’s data.
They said Wikipedia page views were less likely to be skewed by people who were not sick themselves but were interested in finding more news on a general flu outbreak.
Although the researchers did not test this hypothesis in their study, they suggested that users on Wikipedia are more likely to seek information about symptoms or medications, rather than news information (Feltman, Quartz, 4/18).