Big Data, Little Information, and the ACO Big Picture

By Michael Planchart, Perficient, Inc.

The Journey of a thousand miles towards an ACO begins with one step.
Healthcare organizations are coming to realize that the programs stimulated by the ARRA – HITECH Act, Meaningful Use (MU) and Accountable Care Organizations (ACO), require something that they don’t have in sufficient quantity, of the desired type or in the right format: “Data”.

In this post we’re going to focus primarily on the ACO analytics side of things although some of the same principles are applicable to Meaningful Use at its various stages.

The Little Data We Do Have

Historically, hospitals have focused on managing their data from the financial perspective. They are very good at submitting claims, receiving the reimbursements or denials, and reconciling these. They are also very good at dealing with myriad payers, each of which has unique and complex processes and workflows to embrace. Government payers such as Medicare and Medicaid each come with their own complex rules; Medicaid differs from state to state; private payers also have their disparities. Most healthcare organizations have created value-based purchasing strategies that rival those of the mammoth retailers. But all this data that is generated, stored and mined is similar to that of any other industry vertical. It’s business as usual here.

Hospital organizations have been relying on claims data for most of their financial and operational needs.

The current trend in healthcare goes far beyond this type of data. Managing a patient’s health requires relevant clinical data. This data is a hundredfold more complex than what any other industry has to deal with.

Folks who are entering the Healthcare Information Technology (Health IT) domain for the first time are a little perplexed and seem to perceive that we are years behind other domains. This is far from the truth. In other verticals such as banking, investment, retail or telecommunications, most of the data is of a financial, logistic and operational nature. In healthcare we have to deal with this type of data, as indicated above, and with other types that cannot be counted on fingers alone, or with an abacus.

Where’s the BIG Data?

Laboratory information results are value and range based (e.g., normal, high, low), or binary (e.g., positive, negative), resulting from the chemical analysis and measurement of specimens (e.g., blood, urine, tissue); anatomical pathology results consist of the same in addition to complex interpretation narratives.
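
The value-and-range idea can be sketched in a few lines of Python. This is only an illustration: the function name and the reference range in the usage note are hypothetical, and real laboratory systems apply ranges that depend on the analyte, the method and the patient.

```python
def flag_result(value, low, high):
    """Classify a numeric result against an inclusive reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"
```

For instance, with a made-up reference range of 70 to 99, flag_result(112, 70, 99) is flagged as high, while a binary result (positive or negative) needs no range at all.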

Medicines are discrete units that are dispensed and administered (e.g., Metformin ER 500 mg tablet, Mupirocin Ointment USP, 2%), but also within a time frame, finite or infinite, and at precise intervals. To add to the complexity: dosages may vary during the episode of care or an encounter in response to the patient’s reactions; allergies have to be taken into account; medicines may be changed; drug-to-drug interactions are evaluated prior to administering; diet has to be tracked and recorded; follow-up procedures or treatments have to be accounted for.

Imaging results from radiology contain images, discrete data, metadata and non-discrete narratives combined and packaged as a study. The non-discrete narrative is contained in a report that the radiologist creates while “reading” the images, dictating into a transcription device or software that converts voice to text. A study can contain 1 or hundreds of images; a simple chest x-ray may contain 1-4 images (e.g., Posterior-Anterior (PA), Anterior-Posterior (AP), lateral (LAT)); a CT study may contain as many as 500 images, each representing a slice.

We have complex coding systems: ICD-9 (currently migrating to ICD-10) for the classification of diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or diseases; LOINC for the classification of laboratory and clinical observations; SNOMED as an organized categorization of clinical terms, codes, synonyms and definitions of diseases, diagnoses and procedures; RxNorm, which provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software; etc.

Hospitals also have their own reference coding systems that have evolved throughout the years.

When a patient arrives at a provider facility and the clinicians begin with the anamnesis, many events, manual or automated, may start occurring: insurance or Medicare/Medicaid eligibility is verified; laboratory, radiology and pharmacy orders are entered; laboratory and radiology results are generated; medications are ordered, dispensed and administered, sometimes with CPOE and sometimes not; scheduling is processed and resource availability is verified; registration, admission and transfer events are triggered; billing details are validated and recorded. Behind the scenes there are disparate systems “talking” to each other in several healthcare lingos: HL7, X12 and DICOM. Hundreds or thousands of messages containing data are going from here to there and vice-versa. All these messages are sending data that is being consumed by other systems or even other external organizations.
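
To give a feel for what one of these messages looks like, here is a minimal Python sketch that reads an HL7 v2 admission message. HL7 v2 segments are separated by carriage returns and fields by the pipe character. The sample message, its values and the function names are all hypothetical, and the sketch ignores escape sequences, repeating fields and non-default delimiters, which a real interface engine or HL7 library would handle.

```python
# Hypothetical ADT^A01 (admission) message; every value is made up.
RAW = "\r".join([
    r"MSH|^~\&|REG|NORTH|LAB|NORTH|201301020830||ADT^A01|MSG0001|P|2.3",
    "PID|1||12345^^^NORTH^MR||DOE^JANE||19700101|F",
    "PV1|1|I|4W^401^A",
])


def parse_segments(raw):
    """Split an HL7 v2 message into {segment_id: [field lists]}."""
    segments = {}
    for line in raw.split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


def summarize(raw):
    """Pull a few identifying fields out of the MSH and PID segments."""
    seg = parse_segments(raw)
    msh, pid = seg["MSH"][0], seg["PID"][0]
    family, given = pid[5].split("^")[:2]      # PID-5: patient name
    return {
        # In MSH the field separator itself is MSH-1, so MSH-9 is index 8.
        "message_type": msh[8],
        "patient_id": pid[3].split("^")[0],    # PID-3, first component
        "patient_name": f"{given} {family}",
    }
```

Calling summarize(RAW) yields the message type, the patient identifier and the patient name. Note how little clinical content such a message carries: it says who was admitted and where, not why.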

Then, if there is so much data why is there little data?

The answer is simple: an enormous amount of the information generated from the beginning of a patient’s anamnesis, through the evolution of the episode of care and until the end of the catamnesis, is not being collected; and if it is being collected, it’s being recorded in a format that is inadequate, difficult or impossible to mine (or extract).

But didn’t we just say in one of the above paragraphs that hundreds or thousands of messages containing data are being exchanged during an encounter?

The answer is yes, but the data that is being collected is only the tip of the iceberg of what is required for many of the use cases being envisioned, and which are needed to manage the health of the population that belongs to an ACO.

For example, from the anamnesis the clinician obtains the chief complaint and much more information, provided entirely by the patient, that may have motivated the visit or encounter. The majority of the information provided by the patient is subject to the interpretation of the physician or the nurse. Have you ever gone to two different doctors with the same ailment and received the same interpretation? I haven’t.

The physician and nursing notes are not being transcribed into the Electronic Health Record (EHR) of the patient mostly because many providers don’t have an Electronic Medical Record (EMR) system. Maybe the provider has an EMR but the EMR doesn’t capture the information in a discrete way. These documents might be scanned and stored in an image format.

You’ve mentioned it a few times; what in the world is an anamnesis? Good question. The anamnesis is the combination of the verbal narration and written information the patient provides initially during the first encounters, and it may continue throughout the entire episode of care. And since the care of a patient can depend on people other than him/herself, abundant data or information may come from a heteroanamnesis, which is where relatives or caregivers narrate and provide written information about the chief complaint, family history, present illness, etc.

Thinking from the End

An ACO requires the following capabilities among many others:

  • Population Health Management (PHM)
  • Chronic Disease Management (CDM)
  • Disease Registries
  • Health Information Exchanges

These capabilities require tons of data or BIG data that should be collected by clinicians and other trained healthcare professionals and not by mere source systems communicating messages between themselves.

Most healthcare organizations have a very difficult time knowing what the Average Length of Stay (ALOS) is for their patients at each one of their facilities. Yet they believe that a re-admissions management system is something required to operate effectively. Do you have to manage re-admissions or do you just have to count them? You don’t manage re-admissions; you avoid them!

How much data do you need to obtain results for these two trivial indicators? All you need is the patient identifying information and the admission and discharge dates for each episode of care. Of course, you could also get fancier and try to obtain the ALOS that corresponds to a particular physician or department. But still, this data is easily obtainable.
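
A rough Python sketch shows how little is needed for both indicators. The episode records, facility names and function names below are hypothetical, and the 30-day re-admission window is an assumption chosen for illustration; use whatever definition your organization reports against.

```python
from collections import defaultdict
from datetime import date

# Hypothetical episodes: (patient_id, facility, admission, discharge).
EPISODES = [
    ("P1", "North", date(2013, 1, 2), date(2013, 1, 6)),
    ("P1", "North", date(2013, 1, 20), date(2013, 1, 22)),
    ("P2", "North", date(2013, 2, 1), date(2013, 2, 4)),
    ("P3", "South", date(2013, 3, 10), date(2013, 3, 15)),
]


def average_length_of_stay(episodes):
    """Return {facility: mean days between admission and discharge}."""
    days = defaultdict(list)
    for _, facility, admitted, discharged in episodes:
        days[facility].append((discharged - admitted).days)
    return {facility: sum(d) / len(d) for facility, d in days.items()}


def count_readmissions(episodes, window_days=30):
    """Count admissions within window_days of the same patient's prior discharge."""
    stays_by_patient = defaultdict(list)
    for patient, _, admitted, discharged in episodes:
        stays_by_patient[patient].append((admitted, discharged))
    count = 0
    for stays in stays_by_patient.values():
        stays.sort()  # chronological order by admission date
        for (_, prev_discharge), (next_admit, _) in zip(stays, stays[1:]):
            if 0 <= (next_admit - prev_discharge).days <= window_days:
                count += 1
    return count
```

Grouping by physician or department instead of facility only means changing the key used in average_length_of_stay, provided that attribute is recorded with each episode.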

On the other hand, the capabilities listed above require data that is not easily obtainable since many times it’s not even collected. In order to succeed you would have to determine what data elements would be required for each of the capabilities and then try to map these to the origins or source systems. Not too long ago I performed a mapping for Coronary Artery Disease (CAD) and it was a daunting task. My team and I discovered that 80% of the data elements had to be manually abstracted since they were contained almost entirely in scanned notes or even paper notes that had never been scanned.

Yet, thinking from the end and mapping to the source will help you discover the gaps in the data that is required for each use case.
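
A mapping exercise like this can start as something as simple as the Python sketch below. Everything in it is hypothetical: the data element names, the source systems and the use case label are placeholders for illustration and are not the actual CAD mapping described above.

```python
# Hypothetical data elements required by a use case.
REQUIRED = {
    "CAD registry": [
        "ldl_cholesterol",
        "blood_pressure",
        "smoking_status",
        "ejection_fraction",
    ],
}

# Hypothetical source of each element today; None means it is not collected.
SOURCES = {
    "ldl_cholesterol": "LIS",
    "blood_pressure": "scanned nursing notes",
    "smoking_status": None,
    "ejection_fraction": "scanned cardiology report",
}

# Systems assumed to hold discrete, minable data.
DISCRETE_SYSTEMS = {"LIS", "RIS", "Pharmacy", "ADT"}


def find_gaps(use_case):
    """Sort a use case's required elements by how obtainable they are."""
    gaps = {"available": [], "manual_abstraction": [], "not_collected": []}
    for element in REQUIRED[use_case]:
        source = SOURCES.get(element)
        if source is None:
            gaps["not_collected"].append(element)
        elif source in DISCRETE_SYSTEMS:
            gaps["available"].append(element)
        else:
            gaps["manual_abstraction"].append(element)
    return gaps
```

Running find_gaps("CAD registry") on this made-up inventory puts one element in the available bucket, two under manual abstraction and one under not collected, which is the kind of picture that tells you where to invest first.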

The Heterogeneous Curse

Most healthcare organizations choose the “Best of Breed” model for their various systems. What this means is that each application has its own database and typically they don’t share information with each other.

Even those healthcare organizations that have chosen a single vendor for most of their needs face a similar dilemma in that the vendors generally grow their offerings by acquisition of other smaller software companies. The end result is that although the systems are under one vendor’s umbrella they generally implement different technologies and interoperability among them is as challenging as in the “Best of Breed” model.

HL7 messaging, as explained above, has been able to get most of these applications to “talk” to each other. But “talking” alone doesn’t solve the problem of “actionable” data, and “actionable” data is a prerequisite for many of an ACO’s capabilities.

The BIG Challenge Ahead

Getting to “actionable” data is key to overcoming the heterogeneous curse. This is the BIG challenge ahead.

Taking on this challenge one step at a time can help overcome the paralysis.

The most crucial step is creating an Operational Data Store (ODS) and an Atomic Data Store (ADS) from all the available historic data, whether archived or extracted from the source systems databases. Those organizations that have taken this step have been the ones that succeeded with Business Intelligence (BI), Clinical Intelligence (CI) and near real-time use cases.

The ODS/ADS combo will help aggregate the patients’ data. They will also be the precursors for the Extract, Transform and Load (ETL) layer.

Unfortunately, most hospitals treat the messages that are exchanged by the myriad systems in a “consume and discard” fashion. Most of the messages travel through the healthcare system by way of a broker or interface engine. These messages get transformed or mapped and are pushed to the consuming systems, which ingest the information they need. The messages may stay in the interface engine’s data store for a short period of time, typically between 15 and 30 days, before they are deleted.
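
The alternative to “consume and discard” is to keep a copy of every message before it is transformed and routed. The Python sketch below uses the standard sqlite3 module purely for illustration; the table name, columns and function names are hypothetical, and a production ODS would sit on an enterprise database with proper security, retention and indexing.

```python
import sqlite3
from datetime import datetime, timezone


def open_archive(path=":memory:"):
    """Open (or create) a message archive; in-memory by default for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message_archive ("
        " id INTEGER PRIMARY KEY,"
        " received_at TEXT NOT NULL,"
        " source_system TEXT NOT NULL,"
        " raw_message TEXT NOT NULL)"
    )
    return conn


def archive_message(conn, source_system, raw_message):
    """Persist the untouched message before it is transformed and routed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO message_archive"
            " (received_at, source_system, raw_message) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), source_system, raw_message),
        )


def messages_from(conn, source_system):
    """Return the archived raw messages of one source system, oldest first."""
    rows = conn.execute(
        "SELECT raw_message FROM message_archive"
        " WHERE source_system = ? ORDER BY id",
        (source_system,),
    )
    return [row[0] for row in rows]
```

The point of the design is that the raw message is stored as received, so that it can be re-parsed later when a new use case needs a field that nobody thought to extract at the time.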

The next step is fomenting a cultural shift among the clinical staff. Clinicians have been reluctant to be data clerks and many have valid reasons. Fomenting the cultural shift is not about changing the mindsets of the clinicians. Enabling them with novel technologies to capture a patient’s health status at all critical points of the workflows will be the real game changer. Mobile technology, natural language processing (NLP) and voice recognition should become ubiquitous in healthcare settings.

Leverage the CCD and other CDA based documents at each point of transfer of care. This requirement alone will be the major force to put in place all the necessary gear to get to an interoperable state.

Indirect requirements will start popping up: data governance will be mandatory, and so will coming up with well-defined terminologies and coding systems. Don’t let these dissuade you since they are all good.


To succeed in the future healthcare paradigm you must start immediately. Take one step at a time, have a BIG strategic picture of the future but act tactically now. You will get there, eventually.

Michael Planchart, aka @theEHRguy, is a Health IT Interoperability Consultant, Enterprise Architect for Healthcare IT, Standards Specialist (HL7, DICOM, IHE), and Android and iOS Mobile Health Apps designer.

