Category Archives: Measuring ROI
Alice laughed: “There’s no use trying,” she said. “One can’t believe impossible things.”
“I daresay you haven’t had much practice,” said the Queen. “When I was younger, I always did it for half an hour a day. Why, sometimes I’ve believed as many as six impossible things before breakfast.”
Six impossible things before breakfast? The wellness industry would just be getting warmed up by believing six impossible things before breakfast. They believe enough impossible things all day long to support an entire restaurant chain:
Consider the article in the current issue of BenefitsPro — forwarded to me by many members of the Welligentsia — entitled: “Can the Wellness Industry Live Up to Its Promises?” BenefitsPro rounded up some of the leaders of the wellness industry alt-stupid segment. Specifically, they interviewed US Corporate Wellness, Fitbit, Staywell, and HERO. Each is a perennial candidate for the Deplorables Awards — except US Corporate Wellness, which already secured its place in the Deplorables Hall of Fame (and Why Nobody Believes the Numbers) several years ago with these three paeans to the gods of impossibility.
In case you can’t read the key statistic — the first bullet point — it says: “Wellness program participants are 230% less likely to utilize EIB (extended illness benefit) than non-participants.” Here is some news for the Einsteins at US Corporate Wellness: You can’t be 230% less likely to do anything than anybody. For instance, even you, despite your best efforts in these three examples, can’t be 230% less likely to have a triple-digit IQ than the rest of us. Here’s a rule of math for you: a number can only be reduced by 100%. Rules of math tend to be strictly enforced, even in wellness. So the good news is, even in the worst-case scenario, you’re only 100% less likely to have a triple-digit IQ than the rest of us.
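That rule of math can be checked in a few lines. This is only a minimal sketch: the helper name `percent_less_likely` and the rates are made up for illustration and do not come from the vendor's data.

```python
def percent_less_likely(participant_rate: float, nonparticipant_rate: float) -> float:
    """Relative reduction, in percent, of participants versus non-participants."""
    if nonparticipant_rate <= 0:
        raise ValueError("non-participant rate must be positive")
    if participant_rate < 0:
        raise ValueError("a rate cannot be negative")
    return (nonparticipant_rate - participant_rate) / nonparticipant_rate * 100


# Hypothetical rates per 1000 employees (illustrative only).
print(percent_less_likely(50, 100))  # 50.0
print(percent_less_likely(0, 100))   # 100.0, the ceiling
```

Because a rate cannot drop below zero, no input can push the result past 100. Getting to 230 would require participants to use the benefit a negative number of times.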
And yet, if it were possible to be 230% dumber than the rest of us, you might be. For instance, US Corporate Wellness also brought us this estimate of the massive annual savings that can be obtained just by, Seinfeld-style, doing nothing:
So assume I spent about $3500/year in healthcare 12 years ago, which is probably accurate. My modifiable risk factors were zero then and they are still zero — no increase. So my healthcare spending should have fallen by $350/year for 12 years, or $4200 since then. But that would be impossible, since I could only reduce my spending by $3500. Do you see how that works now?
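The same arithmetic as a short sketch, using only the figures in the paragraph above. The variable names are illustrative, and the $350/year figure is the claim as read in this post, not a number taken from a vendor document.

```python
baseline_spend = 3500      # dollars per year, the estimate used above
claimed_annual_drop = 350  # dollars per year, the claim as read above
years = 12

claimed_total_drop = claimed_annual_drop * years
implied_spend = baseline_spend - claimed_total_drop

print(claimed_total_drop)  # 4200
print(implied_spend)       # -700
```

A negative implied spend is the giveaway: the claimed reduction is larger than the amount available to reduce.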
To his credit, US Corporate Wellness’s CEO, Brad Cooper, is quoted in this article as saying: “Unfortunately some in the industry have exaggerated the savings numbers.” You think?
I’m pretty sure this next one is impossible too. I say “pretty sure” because I’ve never quite been able to decipher it, English being right up there with math among the subjects in which many a wellness vendor apparently frustrated a fifth-grade teacher:
400% of what? Is US Corporate Wellness saying that, as compared to employees with a chronic disease like hypertension, employees who take their blood pressure pills are 400% more productive? Meaning that if they controlled their blood pressure, waiters could serve 400% more tables, doctors could see 400% more patients, pilots could fly planes 400% faster? Teachers could teach 400% more kids? Customer service recordings could tell us our calls are 400% more important to them?
Or maybe wellness vendors could make 400% more impossible claims. That would explain this BenefitsPro article.
We have been completely unable to get Fitbit to speak, but BenefitsPro couldn’t get them to shut up. Here is Fitbit’s Amy McDonough: “Measurement of a wellness program is an important part of the planning process.” Indeed it is! It’s vitally important to plan on how to fabricate impossible outcomes to measure, when in reality your product may even lead to weight gain. Here is one thing we know is impossible: you can’t achieve a 58% reduction in healthcare expenses through behavior change — especially if (as in the 133 patients they tracked in one of their studies) behavior didn’t actually change.
You can read about that gem, and others, in our recent Fitbit series here:
- Springbuk wants employees to go to the bathroom
- Fitbit throws a bit of a fit, Part 1
- Fitbit throws a bit of a fit, Part 2
Health Enhancement Research Organization (HERO) and Staywell
I’ll consider these two outfits together because people seem to bounce back and forth between them. Jessica Grossmeier is one such person. Jessica became the Neil Armstrong of impossible wellness outcomes way back in 2013. Not just any old impossible wellness outcomes — those have been around for decades. She and Staywell pioneered the concept of claiming outcomes they already knew were impossible. While at Staywell, she and her co-conspirators told British Petroleum they had saved about $17,000 per risk factor reduced. So, yes, according to Staywell, anyone who temporarily lost a little weight saved BP $17,000 — enough to clean up about 1000 gallons of oil spilled from Deepwater Horizon.
See British Petroleum’s Wellness Program Is Spewing Invalidity for the details.
Leave aside both the obvious impossibility of this claim, and also the mathematical impossibility of this claim given that employers only actually spend about $6000/person on healthcare. Jessica’s breakthrough was to also ignore the fact that this $17,000/risk factor savings figure exceeds by 100 times what her very own article claims in savings. Not by 100 percent. By 100 times.
Fast-forward to her new role at HERO. In this article she says:
The conversation has thus shifted from a focus on ROI alone to a broader value proposition that includes both the tangible and intangible benefits of improved worker health and well-being.
Her memory may have failed her here too because HERO — in addition to admitting that wellness loses money (which explains its “shift” from the “focus on ROI alone”) — also listed the “broader value proposition” elements of their pry-poke-and-prod wellness programs. The problem is the elements of the broader value proposition of screening the stuffing out of employees aren’t “benefits.” They’re costs, and lots of them:
When she says: “The conversation has shifted from a focus on ROI alone,” she means: “We all got caught making up ROIs so we need to make up a new metric.” RAND’s Soeren Mattke predicted this new spin three years ago, observing that every time the wellness industry makes claims and they get debunked, they simply make a new set of claims, and then they get debunked, and then the whole process repeats with new claims, whack-a-mole fashion, ad infinitum. Here is his specific quote:
“The industry went in with promises of 3 to 1 and 6 to 1 based on health care savings alone – then research came out that said that’s not true. Then they said: ‘OK, we are cost neutral.’ Now, research says maybe not even cost neutral. So now they say: ‘But [it] is really about productivity, which we can’t really measure but it’s an enormous return.’”
While other vendors, such as Wellsteps, harm plenty of employees, Interactive Health holds the distinction of being the only wellness vendor to actually harm me. I went to a screening of theirs. In order to increase my productivity, they stretched out my calves. Indeed, I could feel my productivity soaring — until one of them went into spasm. I doubt anyone has missed this story but in case anyone has…
They also hold the distinction of being the first vendor (actually their consultant) to try to bribe me to stop pointing out how impossible their outcomes were. They were upset because I profiled them in the Wall Street Journal. The article is behind a paywall, so you probably can’t see it. Here’s the spoiler: they allegedly saved a whopping $53,000 for every risk factor reduced. In your face, Staywell!
Here is the BenefitsPro article’s quote from Interactive Health’s Jared Smith:
“There are many wellness vendors out there that claim to show ROI,” he says. “However, many of their models and methodologies are complex, based upon assumptions that do not provide sufficient quantitative evidence to substantiate their claims.”
Finally, here is a news flash for Interactive Health: sitting is not the new smoking. If anything is the “new smoking,” it’s opioid addiction, which has reached epidemic proportions in the workforce while being totally, utterly, completely, negligently, mind-blowingly, Sergeant Schultz-ily ignored by Interactive Health and the rest of the wellness industry.
There is nothing funny about opioid addiction and the wellness industry’s failure to address it, a topic for a future blog post. The only impossibility is that it is impossible to believe that an entire industry charged with what Jessica Grossmeier calls “worker health and well-being” could have allowed this to happen. Alas, happen it did.
And, as you can see from the time-stamp on this post, except at establishments favored by the Wellness Ignorati, breakfast hasn’t even been served yet.
In wellness, it is perfectly legal to lie to customers and prospects. That’s in most vendors’ DNA.* (Not all vendors — we will soon be publishing an expanded list of honest ones, and for now would direct readers to http://www.ethicalwellness.org for the original list of honest ones.)
However, if you are a public company, it is quite illegal to lie to shareholders. It’s possible Fitbit did just that. If they did, they could face major SEC sanctions.
Did that just happen? Read this link and then you make the call.
PS This is the sequel to Springbuk Wants Employees to Go to the Bathroom, which should be read in conjunction with this link.
*The irony is that one of the biggest liars specializes in collecting employee DNA and then pretending that they can save a ton of money by getting employees to lose weight by telling them it’s pretty darn impossible to lose weight, because they have a gene for obesity. Yes, you read that right and, no, it doesn’t make any sense.
Update: The link was removed at Fitbit’s request. In a couple of weeks they will defend this report and explain why neither they nor Springbuk ever responded to the requests for more information that I made before publishing it. Good luck with that.
This is Part 2 in the 4-part series on plausibility-testing and measuring ROI of wellness-sensitive medical events. No vendors or consultants are being “outed” in this posting, so if you read TSW for the shock value, you’ll be disappointed. But of course you don’t do that — you read TSW to gain insight and knowledge. Yeah, right, and you used to subscribe to Playboy for the articles.
In the previous installment, which should be reviewed prior to reading this one, we listed the ICD9s and ICD10s used to identify and measure wellness-sensitive medical events. You want to count the number of ER visits and inpatient stays across these diagnoses, the idea being that this total should be low and/or declining, if indeed wellness (and disease management) are accomplishing anything.
This total is never going to fall to zero (people will always be slipping through the care and especially self-care cracks), but the best-performing health plans and employers can manage the total down to 10-15 visits and stays per 1000 covered people per year. To put this in perspective, incurring only 10 ER and IP claims a year per 1000 covered people for wellness-related events is a great accomplishment, given that you have about 250 ER and IP claims/1000 covered people for all causes combined. At 10 to 15 events, that would mean only about 5% of your ER and IP claims are wellness-sensitive. If hospital and ER spending is about 40% of your total spending, that would mean your spending on events theoretically avoidable by wellness programs represents about 2% of your total spending. (So much for the CDC’s rant that 86% of your claims are associated with chronic disease. This from the people who are head-scratchingly alarmed by the “arresting fact” that “chronic disease is responsible for 7 out of every 10 deaths.” And yet these guys somehow wiped out polio…)
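The back-of-the-envelope math in the paragraph above can be sketched as follows. The function name is hypothetical, and the sketch assumes (as the paragraph implicitly does) that a wellness-sensitive event costs about the same as the average ER or inpatient claim.

```python
def wellness_sensitive_share(ws_events_per_1000, all_events_per_1000, hospital_er_share):
    """Return (share of ER/IP claims, share of total spending) that is wellness-sensitive."""
    event_share = ws_events_per_1000 / all_events_per_1000
    # Assumption: wellness-sensitive events cost about as much as the average ER/IP claim.
    spend_share = event_share * hospital_er_share
    return event_share, spend_share


# 250 all-cause ER/IP claims per 1000 and a 40% hospital/ER spending share, as in the text.
for ws_events in (10, 12.5, 15):
    event_share, spend_share = wellness_sensitive_share(ws_events, 250, 0.40)
    print(f"{ws_events} events/1000: {event_share:.1%} of ER/IP claims, "
          f"{spend_share:.1%} of total spending")
```

Anywhere in the 10 to 15 range, the share of total spending stays in the neighborhood of 2%.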
When you count these codes, there are a number of mistakes you could make, but shouldn’t, if you follow this checklist. It’s really very easy, meaning that many mistakes are the result of overthinking the analysis.
Think of it this way: if you were estimating a birth rate, you wouldn’t look at the participants in your prenatal program, or count how many women made appointments with obstetricians. You’d simply tally the number of babies born and divide that figure by the number of people you cover. Each potential mistake on this list is avoidable by keeping that example in mind.
I’ve got a little list
- Do not “count” the number of people (two discharges for one person equals one discharge for two people), and do not take into account whether people were in a disease management or wellness program.
- Do not count people for whom you are secondary payer.
- If someone has an event straddling the year-end, count them in the year of discharge
- Don’t be concerned with taking out false positives; they will “wash”
- If someone is transferred and has an applicable primary diagnosis both times, they count twice. (This should happen automatically.)
- If someone has (for example) a heart attack and an angina attack in one hospitalization, only the primary code counts
- Admissions following discharges count separately if they generate two different claims forms
- Interim submissions of claims or claims submissions replaced by other claims submissions should only be counted once (since they represent only one hospital stay)
- Admissions made through the ER, of course, do not count as ER visits
- Claims may include facility and professional. Remember to only count facility and not professional claims – otherwise it is double-counting
- Urgent care is not the same as ER. ER includes just (1) ER PLACE OF SERVICE and (2) OBSERVATION DAYS.
- All ACUTE CARE hospital admissions count, including <24 hours, and EXCLUDING observation days, which we count with ER.
- Allowed claims, not paid claims
- Fiscal year or Calendar year is fine — most people use fiscal year
- Be careful that your case-finding algorithm notes that sometimes IP admissions from the ER take place the day after the ER admission (like at night)!
- Go back as many years as is conveniently trackable. The more years you go back, the more insight you will glean from the analysis.
- For ER discharges, include all submissions whether non-emergent or emergent
- Do NOT count members >65 in the “commercial” category even if you are primary-pay. (That would mess up your comparisons.)
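Several items on the checklist can be expressed as a simple filter-and-count. What follows is a rough sketch under stated assumptions, not a production case-finding algorithm: the `Claim` record, its field names, and the setting labels are all hypothetical, a real claims extract will look different, and calendar year stands in for whatever plan year you use. Transfers need no special handling here because each qualifying claim is counted as its own event.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Claim:  # hypothetical record layout, for illustration only
    member_id: str
    claim_type: str      # "facility" or "professional"
    setting: str         # "IP", "ER", "OBS" (observation) or "URGENT"
    primary_dx: str      # primary diagnosis code only
    start: date          # admission date or ER visit date
    discharge: date
    primary_payer: bool  # False when you are the secondary payer
    final: bool          # False for interim or replaced submissions
    age: int


def events_per_1000(claims, year, covered_lives, dx_prefixes):
    """Count wellness-sensitive ER and inpatient events per 1000 covered lives.

    dx_prefixes must be a tuple of code prefixes, e.g. ("I50", "J44").
    """
    # Facility claims only, and each stay only once (no interim submissions).
    facility = [c for c in claims if c.claim_type == "facility" and c.final]
    all_stays = [c for c in facility if c.setting == "IP"]

    def admitted_from_er(er):
        # The admission may start the day after the ER visit (e.g. overnight).
        return any(
            ip.member_id == er.member_id
            and timedelta(0) <= ip.start - er.start <= timedelta(days=1)
            for ip in all_stays
        )

    count = 0  # events, not people
    for c in facility:
        if not c.primary_payer or c.age > 65:
            continue
        if c.discharge.year != year:  # year of discharge decides
            continue
        if not c.primary_dx.upper().startswith(dx_prefixes):
            continue
        if c.setting in ("IP", "OBS"):  # observation is counted with ER
            count += 1
        elif c.setting == "ER" and not admitted_from_er(c):
            count += 1
        # urgent care and any other setting is ignored
    return count * 1000 / covered_lives


claims = [
    Claim("A", "facility", "ER", "I50.9", date(2016, 5, 10), date(2016, 5, 10), True, True, 52),
    Claim("A", "facility", "IP", "I50.9", date(2016, 5, 11), date(2016, 5, 14), True, True, 52),
    Claim("B", "facility", "ER", "J44.1", date(2016, 6, 1), date(2016, 6, 1), True, True, 47),
]
print(events_per_1000(claims, 2016, covered_lives=2000, dx_prefixes=("I50", "J44")))  # 1.0
```

Member A's ER visit is dropped because it led to an admission the next day, so the example counts two events across 2000 covered lives. Note that nothing in the function asks whether anyone participated in a program, which is the point of the birth-rate analogy.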
Coming up next time: how to present and interpret your results.
Suppose your family is enjoying dinner one night and your daughter’s cell phone rings. She excuses herself, goes in the other room for a few minutes, comes back out and announces: “Mom, Dad, I’m going over to Jason’s house tonight to do homework.”
No doubt you reply, “Okay, bye. Have a nice time.”
Ha, ha, good one, Al. Obviously, you don’t say that. You say: “Wait a second. Who’s Jason? What subject? Are his parents home?” Then you call over to the house to make sure that:
- adults answer the phone; and
- the adults who answer the phone do indeed have a son named Jason.
You are applying a “plausibility test” to your daughter’s statement so instinctively that you don’t even think, let alone say: “Honey, I think we need to test the plausibility of this story.” That’s everyday life. Plausibility-testing would be defined as:
Using screamingly obvious parental techniques to check whether your kids are trying to get away with something.
The general definition of plausibility-testing in wellness
Not so in wellness, where employers never test plausibility. (It’s amazing employer families don’t have a higher teen pregnancy rate.) In wellness, plausibility-testing is defined as:
Using screamingly obvious fifth-grade arithmetic to check whether the vendor is trying to get away with something.
You might say: “Hey, I majored in biostatistics and I don’t remember learning about plausibility-testing or seeing that definition.” Well, that’s because until population health came along, plausibility testing didn’t exist because there was no need for it in real grownup-type biostatistics. In real biostatistics studies, critics could “challenge the data.” They could show how the experiment was designed badly, was contaminated, had confounders, had investigator bias, etc. and therefore the conclusion should be thrown out.
The best example might be The Big Fat Surprise, by Nina Teicholz, in which she systematically eviscerates virtually every major study implicating saturated fat as a major cause of heart attacks, and raises the spectre of sugar as the main culprit. This was two years before it was discovered that the Harvard School of Public Health had indeed been paid off by the sugar lobby to do exactly what she had inferred they were doing.
What makes wellness uniquely suited to plausibility-testing is that, unlike Nina, you aren’t objecting to the data or methods, as in the case of every other debate about research findings. Rather, in wellness plausibility-testing, you typically accept the raw data or methods, but then observe that they prove exactly the opposite of what the wellness promoter intended. You do this even though the raw data and methods are usually suspect as well. For instance, dropouts are uncounted and unaccounted for in almost all wellness data. Indeed, with the exception of Iver Juster poking the HERO bear in its own den, their existence is generally not even acknowledged. As an Argentinian would say, they’ve been disappeared.
Flunking plausibility is part of wellness industry DNA, the hilarity of which has been covered at length on this site, as recently as last week with (you guessed it) Ron Goetzel. I did have to give him some credit this time, though: usually a plausibility test requires 5 minutes to demonstrate he proved the opposite of what he intended to prove. This time it took 10.
And of course the best example was Wellsteps, where all you had to do was add up their own numbers to figure out they harmed Boise’s employees. You didn’t have to “challenge the data,” by saying they omitted non-participants and dropouts, that many people would likely have cheated on the weigh-ins etc. All those would be true, but they wouldn’t face-invalidate the conclusion the way that plausibility test did.
The specific definition of plausibility-testing using wellness-sensitive medical admissions
All of what you are about to read below, plus the story about Jennifer (which ends happily — it turned out Jason was home, they did do homework…and later on they got married and had kids of their own, whose plausibility they routinely check), is covered in Chapter 2 in Why Nobody Believes the Numbers. This adds the part about the ICD10s.
There is also a very specific plausibility test, in which you contrast reductions in wellness-sensitive medical event diagnosis codes with vendor savings claims, to see if they bear any relationship to each other. The idea, as foreign as it may seem to wellness vendors, is that if you are running a program designed to reduce wellness-sensitive hospitalizations and ER visits, you should actually reduce wellness-sensitive hospitalizations and ER visits. Hence that is what you measure. Oh, I know it sounds crazy but it just might work.
And it’s not just us. The Validation Institute requires this specific analysis for member-facing organizations. It was adopted for a major Health Affairs case study on wellness (that didn’t get any attention because it showed wellness loses money even when a population is head-scratchingly unhealthy to begin with). And even the Health Enhancement Research Organization supported this methodology, before they realized that measuring validly was only a good strategy if you wanted to show losses.
Quizzify plausibility-tests its results in this manner and guarantees improvements, but because Quizzify reduces many more codes than just wellness-sensitive ones, the list of diagnosis codes below would be much-expanded. But the concept is the same.
The remainder of this post and (barring a “news” event in the interim) the next posting will show how to do a plausibility test. Today we’ll start with which codes to look at. Part 2 will be how to avoid common mistakes. Then we’ll cover how to compare your results to benchmarks. Finally, we’ll show how to estimate the “savings” and ROI.
Codes to be used in a plausibility test
Start by identifying codes that are somewhat closely associated with lifestyle-related conditions and/or can be addressed through disease management. These are the ones where, in theory at least, savings can be found. Here are some sample ICD9s and ICD10s. In order to save space since this source data doesn’t reproduce well in WordPress, I can’t put the codes next to the conditions. Instead, I’ll stack ’em in the following order:
- Asthma
- COPD
- CAD
- Diabetes
- CHF and other lifestyle cardio-related events
ICD9s are stacked in the same order:
|493.xx (excluding 493.2x*)|
|491.xx, 492.xx, 493.2x, 494.xx, 496.xx, 506.4x|
|410, 411, 413, 414 (all .xx)|
|249, 250, 251.1x, 252.2x, 357.2x, 362, 366.41, 681.1x, 682.6, 682.7, 785.4x, 707, 731.8x|
|398.90, 398.91, 398.99, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, 422.0, 422.9x, 425.xx, 428.xx, 429.xx|
ICD10s, ditto in order, are:
|J40, J41, J42, J43, J44, J47, J68.4|
|I20, I21, I22, I23, I24, I25.1, I25.5, I25.6, I25.7|
|E08, E10, E11.0-E11.9, E16.1, E16.2, E08.42, E09.42, E10.42, E11.42, E13.42, E08.36, E09.36, E10.36, E11.311, E11.319, E11.329, E11.339, E11.349, E11.359, E11.36, E13.36, L03.119, L03.129, I96, E09.621, E09.622, E11.621, E11.622, E13.621, E13.622, L97|
|I50, I10, I11, I12, I13|
The ICD9s and ICD10s are not a perfect match for each other. If ICD10s matched ICD9s, there would be no need for ICD10s. If you try to construct an events trendline crossing October 1 2015, when the ICD10s were adopted, you might find a bump. More on that another time.
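One way to put the ICD10 rows to work is prefix matching on the primary diagnosis. This is only a sketch: the row labels (`copd`, `cad`, `chf_and_cardio`) are assumed names for three of the rows above, the long diabetes row is left out for brevity, and `normalize` and `classify` are hypothetical helpers rather than part of any coding library.

```python
# Three of the ICD10 rows listed above; the row labels are assumptions.
WELLNESS_SENSITIVE_ICD10 = {
    "copd": ("J40", "J41", "J42", "J43", "J44", "J47", "J68.4"),
    "cad": ("I20", "I21", "I22", "I23", "I24", "I25.1", "I25.5", "I25.6", "I25.7"),
    "chf_and_cardio": ("I50", "I10", "I11", "I12", "I13"),
}


def normalize(code: str) -> str:
    """Upper-case the code and drop the dot so 'i25.10' and 'I2510' compare equal."""
    return code.strip().upper().replace(".", "")


def classify(code: str):
    """Return the row label for a primary diagnosis code, or None if it is not listed."""
    norm = normalize(code)
    for label, prefixes in WELLNESS_SENSITIVE_ICD10.items():
        if any(norm.startswith(normalize(p)) for p in prefixes):
            return label
    return None


print(classify("i25.10"))  # cad
print(classify("I25.2"))   # None
print(classify("J44.1"))   # copd
```

Normalizing case and dots first means the lowercase or undotted codes that often show up in claims extracts still match.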
Coming up next: So now that you have these ICD9s, what do you do with them?