In the U.S., the Health Story Project has estimated that we produce 1.2 billion documents related to clinical care each year. Big data experts generally agree that 60-80% of the clinical data in those documents can be found in the unstructured annals of patient records and physician encounter notes.
I know what you are thinking: “Big data analytics, blah, blah, blah...” Well, what if I told you that most of the information about you is locked in an unstructured healthcare data dungeon and goes largely unanalyzed and unused in directing your personal health care? That the meaty information about your personal and family medical history exists in several different electronic systems, scanned images, PDF files and handwritten charts in a dozen or more locations? That despite spending more than $3 trillion a year on healthcare in the U.S., we do not have a single unifying system to enable easy access to all that data about you? Got your attention?
The fact is, accessing your unstructured health information and then putting it to use in managing your care is really, really, really hard. Really hard. Unstructured data is hard to process or analyze, even if you could find it all. Imagine you are a hiring manager, and you have asked a prospective employee to fill out a job application as part of the recruitment process. Job applications come in all shapes and sizes, but candidates work their way through them the same way, filling in the information asked for in each discrete box: name, address, telephone number, previous employer, dates employed, etc. Reviewing the job application is easy because the information can be read in any order, and you can even read only the sections that matter for the position you are filling. That’s structured data: discrete, organized and easy to analyze.
Now imagine you are the same hiring manager, and you asked the same candidate for the same information but gave them no form to fill out. What you receive back is a document with no headings, paragraph breaks, periods or capitalization to show where one bit of information ends and the next begins. Even further, imagine the candidate presented the information in the order of their preference, not yours. No spaces, just a giant block of letters and numbers, with no way of knowing whether the city you found is a place of residence, a person’s name, a place of employment or a reference’s address. That’s unstructured data: messy, disorganized and really hard to gain insights from.
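For the technically inclined, here is a minimal sketch of that contrast in Python. Everything in it is made up for illustration: the applicant, the field names, the list of known cities and the two helper functions are assumptions, not part of any real system. The free-text version also keeps its spaces so the sketch stays short, which makes it easier to read than the worst case described above.

```python
# Hypothetical applicant data, invented purely for illustration.
structured = {
    "name": "Jordan Austin",
    "city": "Dallas",
    "previous_employer": "Houston Supply Co.",
    "reference_city": "Austin",
}

# The same facts with no form: no headings, punctuation or capitalization.
unstructured = (
    "jordan austin worked at houston supply co lives in dallas "
    "reference lives in austin"
)

# A tiny, assumed lookup list of city names.
KNOWN_CITIES = {"austin", "dallas", "houston"}


def city_of_residence(record):
    """Structured data: the field name tells us what the value means."""
    return record["city"]


def city_mentions(text):
    """Unstructured data: we can spot city names, but not the role each plays."""
    return [word for word in text.split() if word in KNOWN_CITIES]


print(city_of_residence(structured))  # Dallas
print(city_mentions(unstructured))    # ['austin', 'houston', 'dallas', 'austin']
```

The structured lookup answers the question in one step. The free-text search returns four candidates, and nothing in the text itself says which is a surname, which is an employer, which is the home city and which belongs to the reference.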
Imagine how we could change the delivery and consumption of healthcare if we could tap into and analyze that messy, unstructured majority of your healthcare information, that 60-80%. Care providers could have real-time access to our history of care and a deeper understanding of who we are, which could have a dramatic impact on providing user-directed or personalized care. It could allow for earlier interventions, better matching of procedures and physicians, and improved outcomes.
As I mentioned earlier, making clinical use of that big chunk of healthcare data is really, really, really hard, but not impossible. Cognitive computing technologies such as IBM’s Watson and Apixio’s Iris platform have the power to extract, read and learn from that messy mountain of data. We are on the eve of a new data revolution in healthcare, which will improve patient care and quality outcomes.