Study: Most Doctors Flunk Math Of Medical Test Accuracy

April 30, 2014

Carey Goldberg

File under: "Does not inspire confidence."

You've just been screened for a rare disease, one that only strikes 1 in 1,000 people. You test positive, your doctor tells you. Your heart drops into your stomach. "Is there any chance the test could be wrong?" you ask, your voice tinged with pleading.

"There's a very small chance," your doctor replies. "The test has a false positive rate of 5 percent. But that means there's a 95 percent chance that you do have the disease."

Bzzzzzzz. That's the jarring sound of the game show buzzer that means "Wrong. Wrong. Wrong."

A new study in the journal JAMA Internal Medicine posed just such a scenario to "24 attending physicians, 26 house officers, 10 medical students, and 1 retired physician" at a Boston-area hospital. The hospital is not named, but all were affiliated with not-shabby medical schools: Harvard and Boston University.

And the vast majority blew the question. (Which was: "If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?")

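Close to half gave the answer "95 percent." Not even close. The correct answer is about 2 percent: of every 1,000 people tested, roughly 50 healthy people will test positive while only one actually has the disease, so a positive result has about a 1-in-51 chance of being right. (For an explanation of the math, check out Tom Siegfried's excellent Science News post, which uses the analogy of a baseball player who flunks a drug test.)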
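
For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the calculation using Bayes' rule. The prevalence and false positive rate come from the study's question. The function name and the sensitivity value are assumptions for illustration: the question does not say how often the test catches real cases, so the sketch assumes it catches all of them.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Chance that a person with a positive result actually has the disease."""
    # Share of everyone tested who is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Share of everyone tested who is healthy but tests positive anyway.
    false_positives = (1.0 - prevalence) * false_positive_rate
    # Bayes' rule: real positives divided by all positives.
    return true_positives / (true_positives + false_positives)


# Prevalence of 1 in 1,000 and a 5 percent false positive rate are from the
# study's question; sensitivity=1.0 (no missed cases) is an assumption.
ppv = positive_predictive_value(
    prevalence=1 / 1000,
    sensitivity=1.0,
    false_positive_rate=0.05,
)
print(f"Chance a positive result is real: {ppv:.1%}")  # prints 2.0%
```

Lowering the assumed sensitivity only pushes the answer further below 2 percent, so the "95 percent" reply stays wrong either way.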

The study — "Medicine's Uncomfortable Relationship With Math" — replicated a similar math check done in 1978, and found little progress: 23 percent got the answer right in 2013, compared to 18 percent of a similar group in the 1978 study.

Yikes. What are patients — and doctors — to do about such medical innumeracy? I contacted the Boston-based Informed Medical Decisions Foundation, and spoke to research director Carrie Levin.

It's always helpful to get a vivid explanation of what test results mean, she said. For example, if a risk is 1 in 10,000, "that means it's one person sitting in a basketball stadium. It helps to make it realistic for people, to ground it in something people can understand." But the JAMA Internal Medicine paper is not about poor explanation, she noted; it's about doctors' own miscomprehension.

So what to do? You get a second opinion, she said; possibly an additional test or a repeat test. And most of all, "Don't assume your doctor knows everything. That's the biggest assumption. You have to hold him or her accountable but you also have to take responsibility for knowing what's going on."

Sign all of us up for Bio-Stats For Dummies, in other words. And by now, given the increasingly loud debate over the downsides of screening tests like mammograms, we can perhaps all begin with the don't-panic sense that false positives abound — in screening tests, at least.

And what should the medical system do about its own innumeracy? This from one of the paper's authors, Dr. Isaac Kohane of Harvard Medical School:

What should doctors do? The standard answer, that they should take an additional course on statistics, is an obvious nonstarter. It is a well-known phenomenon that medical students are required to master advanced calculus for admission to medical school, but once admitted you'll be hard-pressed to find medical students with any facility in calculus. Similarly, the course in statistics will be taken, the students will do well, and they will not use it in practice.
What is really required is a quantitative approach to medicine that permeates the entire curriculum. This is extremely counter-cultural and yet it is as urgent as the Flexnerian revolution of 1910, which demanded that medical schools teach medical science, including the then-cutting-edge science of infectious disease control.
In the absence of such curricular reform — which would be radical because many of the faculty have similar limitations — external sources of expertise, such as companies providing automated decision support to physicians, will flourish. But the locus of intellectual control may flow further away from doctors as a result.

And patients? Dr. Kohane expects the development of Web-based services that will help patients interpret their own test results.

"Just as CVS and Walmart are encroaching on the commoditization of basic health-care services," he says, "we will see medical knowledge companies commoditizing tasks that we previously thought to be the intellectual center of medicine. And the reason this is possible is because of the fundamental innumeracy of doctors in the face of a huge data and testing challenge."

Side-note: Journalists are by no means innocent of innumeracy. In his Science News post, Tom Siegfried notes a recent case of journalistic misinterpretation about testing for Alzheimer's disease along lines similar to the Boston study's findings:

In March, the journal Nature Medicine published a report about a test of blood lipids. It predicted the imminent arrival (within two to three years) of Alzheimer’s (or mild cognitive impairment) with over 90 percent accuracy. News reports heralded the 90 percent accuracy of the test as though it were a big deal. But more astute commentary pointed out that such a 90 percent accurate test would in fact be wrong 92 percent of the time.
That’s based on an Alzheimer’s prevalence in the population of 1 percent. If you test only people over age 60, the prevalence rate goes up to 5 percent. In that case a positive result with a 90 percent accurate test is correct 32 percent of the time. “So two-thirds of positive tests are still wrong,” pharmacologist David Colquhoun of University College London writes in a blog post, where he works out the math in detail for use in evaluating such screening tests.
Neither the scientific paper nor media reports pointed out the fallacy in the 90 percent accuracy claim, Colquhoun noted. There seems to be “a conspiracy of silence about the deficiencies of screening tests,” he comments. He suggests that researchers seeking funding are motivated to hype their results and omit mention of how bad their tests are, and that journals seeking headlines don’t want to “pour cold water” on a good story. “Is it that people are incapable of doing the calculations? Surely not,” he concludes.
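
The figures in that passage can be reproduced with a few lines of Python. This is a rough sketch, not Colquhoun's own calculation: it assumes "90 percent accurate" means the test correctly flags 90 percent of people who will develop the disease and correctly clears 90 percent of those who will not. The function name is made up for illustration.

```python
def chance_positive_is_correct(prevalence, accuracy):
    """Share of positive results that are true positives."""
    # Assumption: "accuracy" applies equally to sick and healthy people,
    # i.e. sensitivity and specificity are both equal to `accuracy`.
    true_positives = prevalence * accuracy
    false_positives = (1.0 - prevalence) * (1.0 - accuracy)
    return true_positives / (true_positives + false_positives)


# Prevalence figures are the ones quoted in the passage above.
for label, prevalence in [("general population", 0.01), ("people over 60", 0.05)]:
    correct = chance_positive_is_correct(prevalence, accuracy=0.90)
    print(f"{label}: positive result correct {correct:.0%}, wrong {1 - correct:.0%}")
# general population: positive result correct 8%, wrong 92%
# people over 60: positive result correct 32%, wrong 68%
```

Under that assumption the output matches the 92 percent and 32 percent figures quoted above.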

Readers, have you found any resources that were particularly helpful for interpreting your test results?