Policy and implementation challenges to achieving big data outcomes (part 1)
April 29, 2013 in Medical Technology
“Big data” must be near the top of its hype cycle by now. As with other technologies, it may eventually deliver on a great deal of this hype, but the outcomes will probably come later than the current frenzy would suggest.
Part of the delay is that “new” technologies, such as big data, are frequently restrained by “old” policies and the “old” approaches of existing technologies. It takes time, and sometimes policy and utilization changes, to fully accommodate a new technology’s potential. This two-part series of articles will point to key places in health policy and data use where current approaches may be impeding full big data outcomes.
Knowing big data when you see it
The term “big data” is being applied to many different things now, but exactly what it includes is not always clear. One way to define big data is by the specific tools, such as the Hadoop framework, that are needed to work practically with extremely large data stores. While convenient, this definition is somewhat circular: big data becomes whatever requires big data tools. More importantly, it says little about the changes in approach, and the differing utilization considerations, involved in taking advantage of huge stores of data.
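To make the tool-based definition concrete, the core pattern behind frameworks like Hadoop is map/reduce: transform each record independently, group intermediate results by key, then aggregate each group. The sketch below runs that pattern in a single process on invented toy records; Hadoop's contribution is distributing exactly these steps across many machines.

```python
from collections import defaultdict

# In-process sketch of the map/reduce pattern that frameworks like
# Hadoop distribute across clusters. The records are invented toy
# data, not real clinical content.
records = [
    "cough fever",
    "fever headache",
    "cough cough",
]

# Map step: emit a (key, 1) pair for each term in each record.
mapped = [(term, 1) for record in records for term in record.split()]

# Shuffle step: group the emitted values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce step: aggregate each key's values (here, a simple count).
counts = {key: sum(values) for key, values in grouped.items()}
print(counts)  # {'cough': 3, 'fever': 2, 'headache': 1}
```

Nothing here requires special tools at this scale; the tools matter only when the records number in the billions, which is why defining big data by the tools alone misses the point.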
Specifically, big data tools facilitate pulling together great amounts of available data to support an objective, whether or not those data were recorded specifically and narrowly for that objective (in health, data recorded initially for clinical care but then used for something else are sometimes called “secondary” data). Sometimes the data are simply a convenient surrogate for more specific data that are much harder to collect (e.g., Google searches as a surrogate for influenza reporting). Sometimes data are recorded in much greater detail than previously because the constraints of managing such great quantities of data have been reduced (as with physiologic monitoring data).
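The surrogate idea can be made concrete with a toy calculation: if an easy-to-collect series (search volume) tracks a hard-to-collect one (reported cases) closely, the correlation between them will be near 1.0. The weekly numbers below are invented for illustration, not real surveillance data.

```python
import math

# Hypothetical weekly figures: search-query volume as a surrogate
# for officially reported influenza cases. All values are invented.
search_volume = [120, 150, 310, 480, 440, 260]
reported_cases = [14, 18, 35, 52, 49, 30]

def pearson(x, y):
    """Pearson correlation: how closely one series tracks the other."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(search_volume, reported_cases)
print(f"correlation: {r:.3f}")  # near 1.0 for a useful surrogate
```

A high correlation on historical data is what would justify using the cheap series in place of the expensive one going forward, though in practice surrogates can drift as the underlying behavior (here, search habits) changes.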
At times there may also be valuable “signals” in data we did not collect before; we simply may not have known that the “junk” data were not really junk (for example, the majority of DNA is not used directly for protein synthesis but has more recently been found to help modulate gene expression).
Some of these data may also have been recorded in less than ideal ways from a data analysis standpoint. They may be very raw, may be in unstructured form (such as narrative text), or may be in any of several electronic formats (video and audio files, document images, etc.). In health, these format considerations are critical because there are so many ways that information is recorded in clinical care (imaging devices, sensors, software systems…) and because the health industry continues to struggle to get even a fraction of its information into standardized formats.
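A small sketch of what “using unstructured data” can mean in practice: pulling a few structured values out of a free-text note with simple patterns. The note, the patterns, and the field names below are all hypothetical illustrations, not a clinical NLP standard; real systems face far messier text than this.

```python
import re

# Hypothetical narrative note; patterns and field names are
# illustrative assumptions only.
note = "Pt reports fever x3 days. Temp 38.9 C at triage. BP 128/82."

# Extract a few structured values from the free text.
temp = re.search(r"Temp\s+(\d+(?:\.\d+)?)\s*C", note)
bp = re.search(r"BP\s+(\d+)/(\d+)", note)

extracted = {
    "temperature_c": float(temp.group(1)) if temp else None,
    "bp_systolic": int(bp.group(1)) if bp else None,
    "bp_diastolic": int(bp.group(2)) if bp else None,
}
print(extracted)
# {'temperature_c': 38.9, 'bp_systolic': 128, 'bp_diastolic': 82}
```

Even this toy example hints at the standardization problem: every source system phrases such notes differently, so extraction rules that work for one format break on the next.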
All of these variations share a common theme: big data is about using the growing amounts of data you can get, rather than collecting exactly the data you think you need. For our purposes, we will call these new ways of looking at the data you can get “observational.”