Life-giving statistics

April 16, 2011

Posted by Ezra Resnick in Math, Science.

The goal of science (and, presumably, of all rational beings) is to understand the world we live in. Unfortunately, our world is fraught with uncertainty, and our lives are often ruled by chance. We have therefore developed tools for quantifying uncertainty, which allow us to take it into account in our calculations. Statistics are all around us, and we try to use them when making life decisions. It is well known, however, that human intuitions about probability are often unreliable, which is why it’s so important for everyone to be mathematically literate and to understand basic probability theory.

One simple yet extremely useful tool that everyone should be familiar with is Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes’ theorem shows how to calculate the conditional probability P(A|B) — the probability that A will occur given that B has occurred — based on the inverse probability P(B|A), along with the unconditional (“prior”) probabilities of A and B. (The theorem is easily derived from the following identity: the probability that both A and B will occur is equal to P(A) multiplied by P(B|A), and also to P(B) multiplied by P(A|B).)
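Spelled out, the derivation mentioned in the parenthesis takes only two lines. This is a sketch that assumes P(B) is greater than zero, so that the division is allowed:

```latex
\[
P(A \cap B) \;=\; P(A)\,P(B \mid A) \;=\; P(B)\,P(A \mid B)
\]
\[
\Longrightarrow\quad P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0 .
\]
```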

For example, consider a genetic disease that is known to afflict one person out of 100,000. It is possible to test for the genetic marker associated with the disease, but of course the test is not perfect: let’s assume that the false-positive rate is five percent (i.e., 5% of healthy people test positive) and the false-negative rate is one percent (i.e., 1% of diseased people test negative). Still, this seems like a pretty reliable test: it gives the correct result for 95% of healthy people and 99% of diseased people. So, if you took this test and the result was positive, you would probably be seriously worried, thinking that you most likely have the dreaded disease. This is where Bayes’ theorem can be a lifesaver.

Let’s use the theorem to calculate the probability of a person having the disease (A) given that his test result was positive (B). We assumed that P(A) = 1/100,000 (the prior probability of having the disease), and that P(B|A) = 99/100 (the probability of testing positive given that you have the disease). To compute P(B) — the prior probability of testing positive — we need to factor in both true positives and false positives; but since the prior probability of having the disease is so low, the combined value is only marginally higher than the false-positive probability of 5/100.
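For readers who want the exact figure, P(B) can be written out with the law of total probability. The notation \(\neg A\) ("not having the disease") is introduced here for illustration; the numbers are the ones assumed in the example:

```latex
\[
P(B) \;=\; P(B \mid A)\,P(A) \;+\; P(B \mid \neg A)\,P(\neg A)
\]
\[
=\; \frac{99}{100}\cdot\frac{1}{100{,}000} \;+\; \frac{5}{100}\cdot\frac{99{,}999}{100{,}000}
\;=\; \frac{500{,}094}{10{,}000{,}000} \;\approx\; 0.0500094 .
\]
```

The first term (true positives) contributes less than 0.001 percentage points, which is why P(B) sits just above the 5% false-positive rate.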

Putting it all together, we find that P(A|B), the probability of a person having the disease given that his test result was positive, equals 99/100 multiplied by 1/100,000, divided by (slightly more than) 5/100. The result? Less than 1 in 5,000! In other words, even if your test result was positive, your chances of having the disease are still only about 0.02%. Your risk is about twenty times that of the general population, but it’s still highly likely that you are disease-free. This counter-intuitive result is due to the fact that the disease is quite rare to begin with, and consequently the vast majority of positive test results will be false positives.
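The arithmetic is easy to check for yourself. Here is a minimal Python sketch using exact fractions; the function name `posterior` and its parameter names are made up for this illustration, and the numbers are simply the ones assumed in the example:

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # P(B): true positives plus false positives (law of total probability)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Figures assumed in the example
prior = Fraction(1, 100_000)       # P(A): one person in 100,000 has the disease
sensitivity = Fraction(99, 100)    # P(B|A): 1% false-negative rate
false_positive = Fraction(5, 100)  # P(B|not A): 5% false-positive rate

p = posterior(prior, sensitivity, false_positive)

print(float(p))          # probability of disease given a positive result
print(float(1 / p))      # the same figure as "1 in N"
print(float(p / prior))  # risk relative to the general population
```

Try changing the prior to 1 in 100 and running it again: with a more common disease, the same test becomes far more informative.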

In his essay “The Median Isn’t the Message,” Stephen Jay Gould describes the crucial role that scientific and mathematical knowledge came to play in his own personal life, while criticizing people’s “common distrust or contempt for statistics”:

Many people make an unfortunate and invalid separation between heart and mind, or feeling and intellect. In some contemporary traditions, abetted by attitudes stereotypically centered on Southern California, feelings are exalted as more “real” and the only proper basis for action — if it feels good, do it — while intellect gets short shrift as a hang-up of outmoded elitism. Statistics, in this absurd dichotomy, often become the symbol of the enemy. As Hilaire Belloc wrote, “Statistics are the triumph of the quantitative method, and the quantitative method is the victory of sterility and death.”

This is a personal story of statistics, properly interpreted, as profoundly nurturant and life-giving. It declares holy war on the downgrading of intellect by telling a small story about the utility of dry, academic knowledge about science.

After being diagnosed with mesothelioma, a rare and incurable cancer, Gould learned that his condition had a median mortality of only eight months, meaning that half of all people in his situation were dead within that time period. He was stunned for some minutes; but then “my mind started to work again, thank goodness.” As an evolutionary biologist, Gould knew that “variation itself is nature’s only irreducible essence,” while “means and medians are the abstractions.” What he had to do was place himself amidst the variation: figure out whether he was likely to belong to the fifty percent above the median. After a “furious and nervous” hour of research, he concluded that, based on his personal characteristics, his chances were in fact very good. Furthermore, the mortality distribution was “right skewed,” meaning that the lifespans of those who live longer than the eight-month median stretch out over several years.
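To see why a right-skewed distribution leaves room for hope, here is a toy model in Python. It assumes, purely for illustration, that survival times follow an exponential distribution with an eight-month median. This is not Gould’s data and not a model of mesothelioma; the names `MEDIAN_MONTHS` and `survival` are hypothetical.

```python
import math

MEDIAN_MONTHS = 8.0  # median from the text; the exponential shape is an assumption

# For an exponential distribution, median = ln(2) / rate
rate = math.log(2) / MEDIAN_MONTHS
mean_months = 1 / rate


def survival(t_months):
    """Fraction of the hypothetical population still alive after t months."""
    return math.exp(-rate * t_months)


print(mean_months)   # the mean exceeds the median when the tail is on the right
print(survival(8))   # half are still alive at the median, by construction
print(survival(24))  # a sizeable minority outlive the median threefold
```

Even in this crude model, the median says nothing about how far the right tail extends, which is the point Gould was making.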

Attitude and mindset are known to matter when fighting disease. Gould credits his scientific training with giving him the knowledge he needed in order to correctly interpret the statistics, understand his situation, and avoid despair: “From years of experience with the small-scale evolution of Bahamian land snails treated quantitatively, I have developed this technical knowledge — and I am convinced that it played a major role in saving my life. Knowledge is indeed power…”

Stephen Jay Gould died in May 2002, nearly twenty years after his mesothelioma diagnosis.