Intel Blog: Why is Big Data a Big Deal for Healthcare?

Posted by Bob Rogers on Thursday, 11 October 2012 

Leading up to their Healthcare Innovation Summit webcasts, Intel asked industry leaders to share some of their thoughts on the future of healthcare technology. As part of this blog series, they asked me to contribute on the topic of Big Data. You can read my guest entry on their blog, or right here:

As a healthcare technology leader, you ask, “Isn’t my data warehouse already Big Data?”

I hear this question frequently as Chief Scientist at a startup specializing in Big Data Analytics for Healthcare.

My response: “Probably not. ‘Lots of data’ is not the same thing as Big Data.”

“What is Big Data Analytics?”

Don’t get me wrong: Healthcare is generating Big Data. A typical healthcare system with 200,000 patients has 500,000 encounters each year, submits 5 million claims and creates 3 million documents. In five years this adds up to over 1.5 billion distinct references to medical concepts, equaling 10 terabytes of data. That is bigger than the entire print collection of the U.S. Library of Congress.

However, Big Data Analytics is really about new methods to infer knowledge directly from data, which requires three components:

1. Scalable data storage with parallel computing capability

2. Analytical tools, such as machine learning and statistical natural language processing, that can make sense of both structured data and clinical narrative

3. Instantaneous access to enough data to infer something useful in real time

“So what will I learn from Big Data Analytics?”

Here’s an example. To best care for diabetics, you enroll them in a disease management program, which means you must accurately identify the patients who have diabetes. You then need to compute quality measures, such as whether each patient’s latest Hemoglobin A1c lab value was above 9 percent.
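On clean data, a measure like this is simple to compute. Here is a minimal sketch in Python, assuming a hypothetical list of lab results in which every row is already known to be a Hemoglobin A1c value for a confirmed diabetic. The record layout, patient IDs and function names are illustrative only, not taken from any real system.

```python
from datetime import date

# Hypothetical lab results: (patient_id, result_date, hba1c_value)
LAB_RESULTS = [
    ("p1", date(2012, 1, 10), 9.8),
    ("p1", date(2012, 6, 2), 8.4),
    ("p2", date(2012, 3, 15), 7.1),
    ("p2", date(2012, 9, 1), 9.6),
    ("p3", date(2012, 5, 20), 6.9),
]


def latest_a1c(results):
    """Return {patient_id: most recent HbA1c value}."""
    latest = {}
    for patient_id, result_date, value in results:
        # Keep only the newest result seen so far for each patient.
        if patient_id not in latest or result_date > latest[patient_id][0]:
            latest[patient_id] = (result_date, value)
    return {pid: value for pid, (_, value) in latest.items()}


def poorly_controlled(results, threshold=9.0):
    """Return sorted patient ids whose latest HbA1c is above the threshold.

    The default threshold of 9.0 is the cutoff used in the example above.
    """
    latest = latest_a1c(results)
    return sorted(pid for pid, value in latest.items() if value > threshold)


flagged = poorly_controlled(LAB_RESULTS)
rate = len(flagged) / len(latest_a1c(LAB_RESULTS))
print(flagged, round(rate, 2))
```

Note that only the most recent result counts: patient p1 had an earlier value above the threshold but is not flagged. The whole calculation depends on knowing who is diabetic and which lab rows are A1c results.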

Sadly, your information system only contains structured data and you quickly find that this data is a mess. First, many patients with “diabetes” entered in their problem lists are actually not diabetic. This is called “chart lore,” a phenomenon endemic to electronic healthcare data. Strike one.

Next, you discover that a significant fraction of your real diabetics do not have any coded term for diabetes in their problem lists, so they are not being tracked for disease management. Strike two.

Finally, you discover that 25 percent of your patients are part of a provider group that your organization recently acquired. Although you loaded all of their coded data into your data warehouse, their lab codes are not recognized by your reporting system. Strike three.

There is no way you can actually compute the measures you need to run your business!

Here’s where Big Data Analytics comes in. With the ability to analyze unstructured data, you infer from encounter notes and consult letters which patients are truly diabetic.

What about the unrecognizable lab codes? You apply machine learning to a Big Data-sized set of patient histories to infer which of these mystery codes correspond to Hemoglobin A1c measurements. You are back in business.
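To make the idea concrete, here is a toy sketch in Python. It is not a machine learning model. It is a simple distribution-matching heuristic standing in for one, which ranks unrecognized codes by how closely their values resemble results already known to be Hemoglobin A1c. All codes, values and function names are hypothetical, and a real system would draw on far more evidence, such as units, ordering patterns and the patient’s other conditions.

```python
from statistics import mean, stdev

# Hypothetical values from lab codes the reporting system already recognizes as HbA1c
KNOWN_A1C_VALUES = [5.6, 6.1, 7.4, 8.2, 9.5, 6.8, 7.9, 10.1]

# Hypothetical unrecognized codes from the acquired provider group
MYSTERY_CODES = {
    "LAB-0417": [6.0, 7.2, 8.8, 9.1, 6.5],
    "LAB-0932": [98.0, 142.0, 210.0, 115.0],
    "LAB-1288": [0.9, 1.1, 1.4, 0.8],
}


def a1c_distance(values, reference):
    """Gap between the two means, measured in reference standard deviations.

    Smaller means the code's values look more like the known HbA1c results.
    """
    return abs(mean(values) - mean(reference)) / stdev(reference)


def rank_candidates(mystery, reference):
    """Return mystery codes ordered from most to least HbA1c-like."""
    return sorted(mystery, key=lambda code: a1c_distance(mystery[code], reference))


ranking = rank_candidates(MYSTERY_CODES, KNOWN_A1C_VALUES)
print(ranking[0])  # the best candidate for human review
```

In this made-up data, the code whose values sit in the same range as the known A1c results ranks first. The output is a candidate for review rather than a confirmed mapping.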

Healthcare is at the intersection of two revolutionary events: The first is fueled by new Big Data technologies that extract valuable knowledge from huge amounts of data. The second was sparked by Meaningful Use and the electronic liberation of previously unavailable clinical data. The resulting explosion in Big Data for Healthcare will light the way for years to come.

What do you think?
