Man, good effort but then you mess it up

Hai hai (as my Urdu-speaking partner would say), Derek Lowe starts well but messes up his conclusion (via Megan McArdle):

The news of a possible diagnostic test for Alzheimer’s disease is very interesting [...]

But let’s run some numbers. The test was 91% accurate when run on stored blood samples of people who were later checked for development of Alzheimer’s, which compared to the existing techniques is pretty good. Is it good enough for a diagnostic test, though? We’ll concentrate on the younger elderly, who would be most in the market for this test. The NIH estimates that about 5% of people from 65 to 74 have AD. According to the Census Bureau (pdf), we had 17.3 million people between those ages in 2000, and that’s expected to grow to almost 38 million in 2030. Let’s call it 20 million as a nice round number.

What if all 20 million had been tested with this new method? We’ll break that down into the two groups – the 1 million who are really going to get the disease and the 19 million who aren’t. When that latter group gets their results back, 17,290,000 people are going to be told, correctly, that they don’t seem to be on track to get Alzheimer’s. Unfortunately, because of that 91% accuracy rate, 1,710,000 people are going to be told, incorrectly, that they are. You can guess what this will do for their peace of mind. Note, also, that almost twice as many people have just been wrongly told that they’re getting Alzheimer’s than the total number of people who really will.

Meanwhile, the million people who really are in trouble are opening their envelopes, and 910,000 of them are getting the bad news. But 90,000 of them are being told, incorrectly, that they’re in good shape, and are in for a cruel time of it in the coming years.

The people who got the hard news are likely to want to know if that’s real or not, and many of them will take the test again just to be sure. But that’s not going to help; in fact, it’ll confuse things even more. If that whole cohort of 1.7 million people who were wrongly diagnosed as being at risk get re-tested, about 1.556 million of them will get a clean test this time. Now they have a dilemma – they’ve got one up and one down, and which one do you believe? Meanwhile, nearly 154,000 of them will get a second wrong diagnosis, and will be more sure than ever that they’re on the list for Alzheimer’s.

Meanwhile, if that list of 910,000 people who were correctly diagnosed as being at risk get re-tested, 828 thousand of them will hear the bad news again and will (correctly) assume that they’re in trouble. But we’ve just added to the mixed-diagnosis crowd, because almost 82,000 people will be incorrectly given a clean result and won’t know what to believe.

I’ll assume that the people who got the clean test the first time will not be motivated to check again. So after two rounds of testing, we have 17.3 million people who’ve been correctly given a clean ticket, and 828,000 who’ve correctly been given the red flag. But we also have 154,000 people who aren’t going to get the disease but have been told twice that they will, 90,000 people who are going to get it but have been told that they aren’t, and over 1.6 million people who have been through a blender and don’t know anything more than when they started.

Sad but true: 91% is just not good enough for a diagnostic test.
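Lowe's head counts are easy to check. The sketch below redoes his arithmetic under his own simplifying assumptions: 20 million people tested, 1 million of whom will develop the disease, and the same 91% chance of a correct result in both groups. The helper `split` and the variable names are mine, not his.

```python
def split(count, accuracy_pct):
    """Split a group into (correct results, wrong results) head counts."""
    correct = count * accuracy_pct // 100  # integer maths avoids rounding noise
    return correct, count - correct

ACCURACY_PCT = 91          # Lowe's figure, applied to ill and healthy alike
will_get_ad = 1_000_000    # 5% of 20 million
wont_get_ad = 19_000_000

# Round one: everybody is tested.
true_neg, false_pos = split(wont_get_ad, ACCURACY_PCT)
true_pos, false_neg = split(will_get_ad, ACCURACY_PCT)

# Round two: only those who tested positive take the test again.
fp_cleared, fp_twice = split(false_pos, ACCURACY_PCT)
tp_twice, tp_cleared = split(true_pos, ACCURACY_PCT)
mixed = fp_cleared + tp_cleared  # one positive and one negative result

print(f"wrongly flagged after round one: {false_pos:,}")
print(f"wrongly flagged twice:           {fp_twice:,}")
print(f"correctly flagged twice:         {tp_twice:,}")
print(f"one up, one down:                {mixed:,}")
```

The counts match the ones Lowe quotes, so the disagreement is not about his arithmetic but about what he concludes from it.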

Yes, doctors need to be able to calculate the probability a patient has a given disease taking into account not only the accuracy of the test but also other available information (e.g., for random testing, prevalence of the disease amongst an age-group); and they need to communicate this information clearly to the patient. This misunderstanding is a real problem, and something that doctors and everyone else need to be educated about.

But to go from that to '91% is just not good enough' is a huge leap.

As long as there isn't a 100% accurate test, we can never be certain whether the disease is present or not; but the test does give a lot of relevant information and, provided the errors of repeated tests are independent of each other, we can lower the probability of a false alarm as much as we like by administering the test again and again.

If a disease affects 1 in 20 people and the test is 90% accurate, a 'positive' result means there is a mere 32% probability that you are actually ill. If you take the test a second time and get a second positive, this probability jumps to 81%, and it keeps rising with the number of positive results. For a negative test result, the news is even better: the first negative result translates to a 99.4% probability that you are healthy, the second negative to a 99.9% probability.
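These figures come straight from Bayes' rule. Here is a minimal sketch, assuming that '90% accurate' means the test is right 90% of the time for ill and healthy people alike, and that repeated tests err independently of each other. The function name `posterior_ill` is just for illustration.

```python
def posterior_ill(prior, accuracy, results):
    """Probability of being ill after a sequence of test results.

    results: iterable of booleans, True for a positive test.
    Assumes sensitivity == specificity == accuracy and independent errors.
    """
    p = prior
    for positive in results:
        like_ill = accuracy if positive else 1 - accuracy
        like_healthy = 1 - accuracy if positive else accuracy
        p = p * like_ill / (p * like_ill + (1 - p) * like_healthy)
    return p

PRIOR = 0.05     # 1 in 20 people affected
ACCURACY = 0.90

print(f"ill after one positive:      {posterior_ill(PRIOR, ACCURACY, [True]):.1%}")
print(f"ill after two positives:     {posterior_ill(PRIOR, ACCURACY, [True, True]):.1%}")
print(f"healthy after one negative:  {1 - posterior_ill(PRIOR, ACCURACY, [False]):.1%}")
print(f"healthy after two negatives: {1 - posterior_ill(PRIOR, ACCURACY, [False, False]):.1%}")
```

If repeated tests tend to fail for the same reason in the same person, the independence assumption breaks and the gain from retesting is smaller than this sketch suggests.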

(18% of the people who take the test twice will get one positive and one negative, which simply means there is a 95% probability they are healthy - i.e. the same as before taking any tests. Instead of leaving them 'not knowing what to believe', as Lowe speculates, their doctors should just explain to them that they need more testing if they want to push the confidence in the standard, pre-test prediction (healthy) above 95%.)
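The mixed-result case can be checked the same way. The sketch below assumes everyone takes the test twice, the test is right 90% of the time for ill and healthy people alike, and the two results are independent; the variable names are only illustrative.

```python
prior_ill = 0.05   # 1 in 20 people affected
accuracy = 0.90    # assumed equal for ill and healthy people

# Chance of one positive and one negative in two tests, in either order.
# It is the same whether the person is ill or healthy.
p_mixed = 2 * accuracy * (1 - accuracy)

# Bayes' rule for "ill, given one positive and one negative".
ill_and_mixed = prior_ill * p_mixed
healthy_and_mixed = (1 - prior_ill) * p_mixed
p_ill_given_mixed = ill_and_mixed / (ill_and_mixed + healthy_and_mixed)

print(f"share with mixed results:  {p_mixed:.0%}")
print(f"P(healthy | mixed result): {1 - p_ill_given_mixed:.0%}")
```

Because a mixed result is equally likely for ill and healthy people, it cancels out of Bayes' rule and leaves the pre-test probability untouched.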

Pay attention now, here comes the correct conclusion: If you don't have any symptoms, a positive test result for most diseases doesn't mean much - in most cases, you are still more likely to be healthy than not.

Next time you take a test, ask your doctor to calculate the probability you are actually ill or healthy; and if you want more certainty, take the test again, and again, until you are content with the degree of certainty on offer. And thank all those nice researchers for them 90% accurate tests - at least if they are not painful.
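How many times is 'again, and again'? A rough sketch, under the same assumptions as before (a test that is right 90% of the time for everyone, independent errors, 1 in 20 people affected), counts how many identical results in a row it takes to reach a chosen level of certainty. The function name `results_needed` and the 99% and 99.9% targets are my own picks for illustration.

```python
def results_needed(prior, accuracy, target):
    """Number of consecutive supporting results needed to reach `target`.

    prior:    pre-test probability of the hypothesis being checked
    accuracy: chance a single test is right (must be above 0.5)
    target:   desired probability (must be below 1)
    """
    odds = prior / (1 - prior)
    likelihood_ratio = accuracy / (1 - accuracy)  # each supporting result multiplies the odds
    n = 0
    while odds / (1 + odds) < target:
        odds *= likelihood_ratio
        n += 1
    return n

# Positives needed before "ill" is at least 99% likely, starting from 5%.
print(results_needed(prior=0.05, accuracy=0.90, target=0.99))   # 4

# Negatives needed before "healthy" is at least 99.9% likely, starting from 95%.
print(results_needed(prior=0.95, accuracy=0.90, target=0.999))  # 2
```

Working in odds keeps the update to a single multiplication per result, which is why a 90% accurate test converges so quickly.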


  1. Surreptitious Evil Says:

    And it confuses (or fails to address) the potential difference between rates of false positive and false negative results. Just because the total error rate was 9%, doesn't mean that the individual rates were also 9%.

    Generally, modern tests can be biased either way: because of the horror of the false positive or because of the utility in actual predictive diagnosis of alternative tests and symptom-led diagnosis.

    Hence, as an example, (relatively) cheap mass cancer screening, leading to (more expensive) biopsies leading to preventative intervention. Pun unintentional.
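The commenter's point about unequal error rates can be illustrated with a small sketch. It compares two hypothetical tests that share the same 9% overall error rate in a population with 5% prevalence: one errs equally in both directions, the other misses half of the ill but rarely flags the healthy. The 50% sensitivity is an invented number chosen only to make the contrast visible; nothing in the source says what the real test's rates are.

```python
def overall_error(prevalence, sensitivity, specificity):
    """Share of all tested people who get a wrong result."""
    return prevalence * (1 - sensitivity) + (1 - prevalence) * (1 - specificity)

def ppv(prevalence, sensitivity, specificity):
    """Probability of being ill given a positive result."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

prevalence = 0.05

# Test A: 91% right for ill and healthy alike.
sens_a, spec_a = 0.91, 0.91

# Test B (hypothetical): misses half of the ill; specificity is solved for
# so that the overall error rate stays at 9%.
sens_b = 0.50
spec_b = 1 - (0.09 - prevalence * (1 - sens_b)) / (1 - prevalence)

for name, sens, spec in [("A", sens_a, spec_a), ("B", sens_b, spec_b)]:
    print(f"test {name}: overall error {overall_error(prevalence, sens, spec):.1%}, "
          f"P(ill | positive) {ppv(prevalence, sens, spec):.1%}")
```

Both tests are '91% accurate' overall, yet a positive result means different things for each, so the headline figure alone does not pin down how to read a result.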