HEAT Security Blog

By the Numbers: US Healthcare Data Breaches

The last time we looked at healthcare data breaches was a couple of years ago, so I thought we should take another look. As mandated by the HITECH Act, the US Department of Health and Human Services (HHS) maintains a listing of breaches of unsecured protected health information (PHI) affecting 500 or more individuals.

Things have changed since our last look: the HHS database has grown from 36 to 435 PHI breaches which, while still not as large as the healthcare-related breach information collected by datalossdb.org, does give us an opportunity to better understand the data protection challenges being faced in the healthcare space. Overall, a few other things we learn from these data are:

  • It’s early days yet, but 2012 looks to be on a better pace than previous years.
  • Most states saw more than one breach – indeed, the median number of breaches reported per state was six – but some states seem to have been particularly hard hit.
  • Business Associates have an outsized impact on the reported data, contributing 60% of the PHI records lost or stolen.
  • Use of data encryption would have allowed healthcare organizations to avoid a substantial $1.48B or more in breach-related costs over the past 32 months.

So, let’s dig into the HHS database a bit:

Basic Stats

The 435 PHI breaches documented by HHS impacted 20,066,249 individual records. In fig. 1 below, we see how these were spread by size. While the average breach size was 46,129 records, the median comes in at 2,184 – which matches closely with the 2,575 average reported by the Ponemon Institute in their Second Annual Benchmark Study on Patient Privacy & Data Security (Dec-2011). Further, we see only five breaches came in at the minimum requirement (500 records) and “only” six came in at over 1M records – but those six were doozies, impacting a total of 11.8M records and included a single breach of 4.9M records.
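To see why the average sits so far above the median, here is a minimal sketch using only the totals quoted above. The 11.8M figure for the six largest breaches is the rounded number from the text, so the second result is approximate.

```python
# Figures quoted in the post; MEGA_RECORDS is the rounded "11.8M" value.
TOTAL_RECORDS = 20_066_249
TOTAL_BREACHES = 435
MEGA_BREACHES = 6
MEGA_RECORDS = 11_800_000


def mean_size(records: int, breaches: int) -> float:
    """Average number of records per breach."""
    return records / breaches


# Mean over all 435 breaches.
overall_mean = mean_size(TOTAL_RECORDS, TOTAL_BREACHES)

# Mean over the remaining 429 breaches once the six 1M+ breaches are set aside.
mean_without_mega = mean_size(
    TOTAL_RECORDS - MEGA_RECORDS, TOTAL_BREACHES - MEGA_BREACHES
)

print(f"Overall mean:            {overall_mean:,.0f} records")
print(f"Mean without six largest: {mean_without_mega:,.0f} records")
```

Setting aside just six breaches cuts the average by more than half, which is the kind of skew that makes the median the more representative figure here.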

HHS Data Breach Analysis
No. of HC Organizations and Records Impacted vs. Breach Size (range)

fig. 1


We’re now 32 months into the HHS breach database, since they started keeping records back in Sep-2009. Thus far in 2012 we seem to be doing OK, with less than 1M records lost. But looking at figs. 2 and 3 we see something interesting: the majority of breaches seem to occur in the last third of the year. In fact, the median number of records breached in the January – August period is about 80% lower than in the September – December period. I don’t have any explanations for this – it’s still early days for any conclusive analysis in this regard – but I’d be interested in hearing any theories!

HHS Data Breach Analysis
Records Impacted (by Year)

fig. 2

HHS Data Breach Analysis
Records Impacted (by Month)

fig. 3


Another interesting way of looking at these data is state-by-state. The last time we did this, only 15 states were represented in the HHS database, and California had the largest share of the breach incidents at 27.8% but only 4.5% of records lost or stolen. It’s a very different picture now, as one might expect (see fig. 4). Only four states have not reported any healthcare data breaches: Hawaii, Maine, Rhode Island, and Vermont. On the other end of the spectrum, we have New York (28 breaches reported), Texas (36) and California (49). But the largest numbers of lost or stolen records were in Florida (2,512,845 records impacted), California (3,652,373) and Virginia (4,919,466).

HHS Data Breach Analysis
Incident Count and Records Impacted vs. Breach Size (on State per capita basis)

fig. 4

However, on a per capita basis, we see Virginia (62.41%) and South Dakota (52.84%) occupying the top (bottom?) two spots, followed by Utah and Puerto Rico in the 28% range, and then dropping off to New Hampshire and Tennessee (17% range), Florida and Massachusetts (13.5% range), New York (12% range) and California (10% range). The long tail is, obviously, in the sub-1% range with 24 states (not including our four no-breach states). It’s not quite Pareto’s 80 / 20 rule, but close: the states with the top-10 most breaches on a per capita basis accounted for 90% of the records lost or stolen.
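For clarity on what the per capita percentages mean, here is a small sketch, assuming the rate is simply records impacted divided by state population. The post does not say which population figures were used, so the population below is back-computed from the Virginia numbers above rather than taken from any census source.

```python
def per_capita_rate(records: int, population: float) -> float:
    """Records impacted as a fraction of state population."""
    return records / population


# Virginia figures quoted in the post.
VA_RECORDS = 4_919_466
VA_RATE = 0.6241  # 62.41%

# Population implied by those two numbers (an inference, not a sourced figure).
implied_population = VA_RECORDS / VA_RATE

print(f"Implied Virginia population: {implied_population:,.0f}")
print(f"Round trip rate: {per_capita_rate(VA_RECORDS, implied_population):.2%}")
```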

HHS Data Breach Analysis
By State Per Capita Distribution

table 1

This is not meant to pick on poor Virginia – after all, the incredibly high per capita number is due to the Tricare breach involving 4.9M military personnel, retirees and dependents, most of whom probably do not reside in Virginia. In fact, the theft of an unencrypted back-up tape from the car belonging to an employee of the covered entity’s business associate, Science Applications International Corp. (SAIC), occurred in San Antonio, Texas. So obviously there’s something hinky (that’s the technical term) with the reporting structure. But that said, this geographic impact analysis might warrant some looking into: Why are some states (seemingly) hit more often? Why are some states (seemingly) impacted disproportionately harder? Is this changing over time and, if so, why? [Sounds like a good master’s thesis topic to me.]

Business Associates

Another nugget of interest which we can glean from these data is the breakdown between breaches attributed to covered entities (CEs) themselves and to their business associates (BAs). As shown in fig. 5, only 96 of the 435 breaches included in the HHS database were attributed to BAs, but they accounted for the lion’s share of the records lost or stolen. Now, one of those was the aforementioned SAIC breach, accounting for 4.9M of those records (by the way, this is the largest breach in the HHS database) – but even if we ignore this outlier, we find that BAs accounted for only 22% of the breaches but almost half of the records breached (47%). Of course, here again we see that long tail: the majority of BAs contributed only a single breach to the rolls – in fact, 61 of the 96 breaches are tied to individual BAs while only 12 BAs had multiple breaches recorded in the HHS database (fig. 6).
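The outlier-adjusted shares can be roughly reproduced from the totals quoted in this post. This is a sketch only: the 60% BA record share and the 4.9M outlier are rounded figures from the text, so the results are approximate.

```python
# Figures quoted in the post (the share and outlier size are rounded).
TOTAL_BREACHES = 435
TOTAL_RECORDS = 20_066_249
BA_BREACHES = 96
BA_RECORD_SHARE = 0.60        # BAs' share of all records lost or stolen
OUTLIER_RECORDS = 4_900_000   # the single largest BA breach, "4.9M"

# Estimated BA record count from the rounded 60% share.
ba_records = BA_RECORD_SHARE * TOTAL_RECORDS

# Drop the one outlier breach from both the BA figures and the totals.
breach_share = (BA_BREACHES - 1) / (TOTAL_BREACHES - 1)
record_share = (ba_records - OUTLIER_RECORDS) / (TOTAL_RECORDS - OUTLIER_RECORDS)

print(f"BA share of breaches (ex-outlier): {breach_share:.0%}")
print(f"BA share of records (ex-outlier):  {record_share:.0%}")
```

Even with the largest breach removed, BAs account for about one breach in five but nearly half of the records.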

HHS Data Breach Analysis
Impacted Organization (Incidents| Records)

fig. 5

HHS Data Breach Analysis
Business Associate (BA) Breaches (count | records)

fig. 6

Data Location

Looking at where sensitive PHI was stored when breached is always interesting. Here again we have interpreted some of the definitions used by the HHS (e.g., how are “computers” different from “desktops” and “laptops”?), but lumping their 19 categories together into seven groups (fig. 7), we see that “computers” and “other (physical)” constituted the majority of the breaches but not of the records lost or stolen – it turns out that the “other (electronic)” category accounted for the largest number of records lost or stolen. [The “other (physical)” category consists of paper and film records; the “other (electronic)” category consists of an unspecified “other” plus CDs, backup tapes, external hard drives and the like.] Of course, this “other (electronic)” category includes the single largest breach of 4.9M records – but even without it, this category still contributed the largest chunk of records breached. [Note that, since many breaches are coded with multiple locations, the sums here do not match the overall totals.]

HHS Data Breach Analysis
Data Location (Incidents| Records)

fig. 7

Another way of looking at these data is: how many breaches could have been avoided had the data been encrypted? [See our new whitepaper entitled Healthy Solutions for Protecting Patient Data on the breach notification “safe harbor” for encrypted ePHI.] From this perspective, over 86% of the records lost or stolen would have been protected, resulting in a 70% reduction in reportable incidents. Applying the median cost of $240 per record breached in the US healthcare sector reported by the Ponemon Institute in their 2011 Cost of Data Breach Study (Mar-2012), it seems that the lack of encryption cost the breached entities somewhere in the $4.87B range. Even if we consider only the direct costs (~30% of the median overall cost reported by the Ponemon Institute), we are still looking at a $1.48B hit on the affected entities. For the want of a nail ….
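As a rough sanity check on the cost arithmetic, the sketch below works backwards from the dollar figures quoted above. The exact record count behind the $4.87B estimate is not stated in the post, so the implied count here is an inference; it may reflect the per-location sums, which do not match the overall totals.

```python
# Figures quoted in the post.
COST_PER_RECORD = 240   # USD, median cost per breached record
TOTAL_COST = 4.87e9     # USD, estimated overall cost
DIRECT_COST = 1.48e9    # USD, estimated direct cost

# Number of records that a $4.87B total implies at $240 per record.
implied_records = TOTAL_COST / COST_PER_RECORD

# Direct costs as a fraction of the overall estimate.
direct_share = DIRECT_COST / TOTAL_COST

print(f"Implied records: {implied_records:,.0f}")
print(f"Direct-cost share: {direct_share:.1%}")
```

The direct-cost share comes out at roughly 30%, consistent with the ratio cited from the Ponemon study.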

So, there you have it – this edition of the HHS PHI breach analysis. There’s still more meat on those bones, so leave me a comment if you have a question or are interested in something specific.
