They Said What?


How to Plausibility-Test Wellness Outcomes (not as boring as it sounds)


Suppose your family is enjoying dinner one night and your daughter’s cell phone rings. She excuses herself, goes into the other room for a few minutes, comes back out and announces: “Mom, Dad, I’m going over to Jason’s house tonight to do homework.”

No doubt you reply, “Okay, bye. Have a nice time.”

Ha, ha, good one, Al. Obviously, you don’t say that. You say: “Wait a second. Who’s Jason? What subject? Are his parents home?” Then you call over to the house to make sure that:

  1. adults answer the phone; and
  2. the adults who answer the phone do indeed have a son named Jason.

You are applying a “plausibility test” to your daughter’s statement so instinctively that you don’t even think, let alone say: “Honey, I think we need to test the plausibility of this story.” That’s everyday life. Plausibility-testing would be defined as:

Using screamingly obvious parental techniques to check whether your kids are trying to get away with something.

The general definition of plausibility-testing in wellness

Not so in wellness, where employers never test plausibility.  (It’s amazing employer families don’t have a higher teen pregnancy rate.)  In wellness, plausibility-testing is defined as:

Using screamingly obvious fifth-grade arithmetic to check whether the vendor is trying to get away with something.

You might say: “Hey, I majored in biostatistics and I don’t remember learning about plausibility-testing or seeing that definition.” Well, that’s because until population health came along, plausibility-testing didn’t exist; there was no need for it in real grownup-type biostatistics. In real biostatistics studies, critics could “challenge the data.” They could show that the experiment was badly designed, was contaminated, had confounders, had investigator bias, etc., and that the conclusion should therefore be thrown out.

The best example might be The Big Fat Surprise, by Nina Teicholz, in which she systematically eviscerates virtually every major study implicating saturated fat as a major cause of heart attacks, and raises the spectre of sugar as the main culprit. This was two years before it was discovered that the Harvard School of Public Health had indeed been paid off by the sugar lobby to do exactly what she had inferred they were doing.

What makes wellness uniquely suited to plausibility-testing is that, unlike Nina, you aren’t objecting to the data or methods, as in every other debate about research findings. Rather, in wellness plausibility-testing you typically accept the raw data and methods, and then observe that they prove exactly the opposite of what the wellness promoter intended. You do this even though the raw data and methods are usually suspect as well. For instance, dropouts are not merely uncounted and unaccounted for in almost all wellness data; with the exception of Iver Juster poking the HERO bear in its own den, their existence is generally not even acknowledged. As an Argentinian would say, they’ve been disappeared.

Flunking plausibility is part of wellness industry DNA, the hilarity of which has been covered at length on this site, as recently as last week with (you guessed it) Ron Goetzel. I did have to give him some credit this time, though: usually a plausibility test requires 5 minutes to demonstrate he proved the opposite of what he intended to prove. This time it took 10.

And of course the best example was Wellsteps, where all you had to do was add up their own numbers to figure out they harmed Boise’s employees. You didn’t have to “challenge the data” by pointing out that they omitted non-participants and dropouts, that many people likely cheated on the weigh-ins, and so on. All those observations would be true, but they wouldn’t face-invalidate the conclusion the way the plausibility test did.

The specific definition of plausibility-testing using wellness-sensitive medical admissions

All of what you are about to read below, plus the story about Jennifer (which ends happily: it turned out Jason was home, they did do homework…and later on they got married and had kids of their own, whose plausibility they routinely check), is covered in Chapter 2 of Why Nobody Believes the Numbers. This post adds the part about the ICD10s.

There is also a very specific plausibility test, in which you contrast reductions in wellness-sensitive medical event diagnosis codes with vendor savings claims, to see if they bear any relationship to each other.  The idea, as foreign as it may seem to wellness vendors, is that if you are running a program designed to reduce wellness-sensitive hospitalizations and ER visits, you should actually reduce wellness-sensitive hospitalizations and ER visits. Hence that is what you measure. Oh, I know it sounds crazy but it just might work.
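As a sketch of that arithmetic (every number below is hypothetical, purely to illustrate the comparison; you would plug in your own claims data and a cost-per-event figure appropriate to your plan), the test boils down to: count the wellness-sensitive events, see how many actually went away, price them generously, and compare that ceiling to the vendor’s claim.

```python
# Plausibility-test sketch: compare the vendor's savings claim to the
# most savings the observed event reduction could possibly support.
# All figures below are made up for illustration.

def events_per_1000(event_count: int, covered_lives: int) -> float:
    """Wellness-sensitive admissions/ER visits per 1000 covered lives."""
    return event_count / covered_lives * 1000

covered_lives = 10_000
baseline_events = 180   # wellness-sensitive events the year before the program
current_events = 171    # wellness-sensitive events during the program year

print(f"Baseline rate: {events_per_1000(baseline_events, covered_lives):.1f} per 1000")
print(f"Current rate:  {events_per_1000(current_events, covered_lives):.1f} per 1000")

# Credit every single avoided event to the program (generous), then price
# each one at a rough all-in cost per admission/ER visit (also generous).
avoided_events = baseline_events - current_events        # 9 events
cost_per_event = 20_000
max_plausible_savings = avoided_events * cost_per_event  # $180,000 ceiling

vendor_claim = 1_350_000
print(f"Ceiling on plausible savings: ${max_plausible_savings:,}")
print(f"Vendor claim: ${vendor_claim:,}")
print("Claim passes plausibility test:", vendor_claim <= max_plausible_savings)
```

A claim that exceeds the ceiling by an order of magnitude, as in this made-up example, flunks without anyone needing to challenge the underlying data at all.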

And it’s not just us. The Validation Institute requires this specific analysis for member-facing organizations. It was adopted for a major Health Affairs case study on wellness (which didn’t get any attention because it showed wellness loses money even when a population is head-scratchingly unhealthy to begin with). And even the Health Enhancement Research Organization supported this methodology, before realizing that measuring validly was only a good strategy if you wanted to show losses.

Quizzify plausibility-tests its results in this manner and guarantees improvements, but because Quizzify reduces many more codes than just wellness-sensitive ones, the list of diagnosis codes below would be much expanded. The concept, though, is the same.

The remainder of this post and (barring a “news” event in the interim) the next posting will show how to do a plausibility test. Today we’ll start with which codes to look at. Part 2 will cover how to avoid common mistakes. Then we’ll cover how to compare your results to benchmarks. Finally, we’ll show how to estimate the “savings” and ROI.

Codes to be used in a plausibility test

Start by identifying codes that are closely associated with lifestyle-related conditions and/or can be addressed through disease management. These are the ones where, in theory at least, savings can be found. Here are some sample ICD9s and ICD10s. Because this source data doesn’t reproduce well in WordPress, I can’t put the codes next to the conditions. Instead, I’ll stack ’em in the following order:

  1. asthma
  2. CAD
  3. CHF and other lifestyle cardio-related events
  4. COPD
  5. diabetes

ICD9s are stacked in the same order:

493.xx (excluding 493.2x*)
410, 411, 413, 414 (all .xx)
398.90, 398.91, 398.99, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, 422.0, 422.9x, 425.xx, 428.xx, 429.xx
491.xx, 492.xx, 493.2x, 494.xx, 496.xx, 506.4x
249, 250, 251.1x, 252.2x, 357.2x, 362, 366.41, 681.1x, 682.6, 682.7, 785.4x, 707, 731.8x

ICD10s, ditto in order, are:

J45.xx
I20, I21, I22, I23, I24, I25.1, I25.5, I25.6, I25.7
I50, I10, I11, I12, I13
J40, J41, J42, J43, J44, J47, J68.4
E08, E10, E11.0-E11.9, E16.1, E16.2, E08.42, E09.42, E10.42, E11.42, E13.42, E08.36, E09.36, E10.36, E11.311, E11.319, E11.329, E11.339, E11.349, E11.359, E11.36, E13.36, L03.119, L03.129, I96, E09.621, E09.622, E11.621, E11.622, E13.621, E13.622, L97

The ICD9s and ICD10s are not a perfect match for each other. If ICD10s matched ICD9s, there would be no need for ICD10s. If you try to construct an events trendline crossing October 1, 2015, when the ICD10s were adopted, you might find a bump. More on that another time.
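If you’d rather not eyeball thousands of claim lines, prefix matching makes the tally mechanical. Here’s a minimal sketch in Python; the prefix tuple below is only a subset of the full ICD10 list above, the claim codes are hypothetical, and a real analysis would also handle ICD9 claims before the October 2015 cutover.

```python
# Flag wellness-sensitive claims by ICD10 code prefix.
# Subset of the full list above, for illustration only.
WELLNESS_SENSITIVE_ICD10_PREFIXES = (
    "J45",                                      # asthma
    "I20", "I21", "I22", "I23", "I24",          # CAD
    "I50", "I10", "I11", "I12", "I13",          # CHF / hypertension
    "J40", "J41", "J42", "J43", "J44", "J47",   # COPD
    "E08", "E10", "E11",                        # diabetes
)

def is_wellness_sensitive(icd10_code: str) -> bool:
    """True if the diagnosis code falls under a wellness-sensitive prefix."""
    code = icd10_code.upper().strip()
    # str.startswith accepts a tuple of prefixes, so one call covers the list
    return code.startswith(WELLNESS_SENSITIVE_ICD10_PREFIXES)

# Hypothetical claim lines: an MI, a hip fracture, a diabetes complication, COPD
claims = ["I21.3", "S72.001A", "E11.65", "J44.1"]
sensitive = [c for c in claims if is_wellness_sensitive(c)]
print(sensitive)  # the hip fracture (S72.001A) is excluded
```

Uppercasing the input also sidesteps the lowercase-letter inconsistencies you’ll often find in extracted claims feeds.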

Coming up next: So now that you have these ICD9s, what do you do with them?

In the immortal words of the great philosopher Pat Benatar, hit me with your best shot.
