A theorem for all seasons

As life scientists, we seldom get to deal with anything resembling certainty. The systems that we work with are typically nonlinear and chaotic, heterogeneous, non-binary - in a word, messy. In commercial life science, the real value of an R&D investment of hundreds of millions or even billions of dollars may ultimately ride on such a razor-thin edge between success and failure that it takes something like a t-test to determine whether you really have a marketable product or just another placebo - or, in the case of a diagnostic, a real indicator rather than background noise. The development of a new drug or diagnostic is, in essence, a process of gathering evidence either for or against your working hypothesis that using your product will confer some net benefit over not using it. In such a case, the null hypothesis - that your product confers no net benefit at all - is (and always should be) a core consideration in your approach.

With each new piece of data that we accumulate along our hopeful path to that blockbuster product, we are weighing the evidence for and against eventual success or failure. A big part of this process for a commercial life science company is deciding, based upon the current evidence, whether it is worthwhile to keep investing time, money and resources in the product, or to pull the plug on it. All too often, and for all sorts of reasons, it can be painful and difficult for a company to admit that a product is a dead end and walk away from its investment. Killing projects in a timely fashion is a particularly acute problem in drug development, given the exponentially increasing cost of R&D as a product progresses through each successive phase of development.

This process of weighing the evidence was formalized mathematically in the 18th century by Thomas Bayes, an English Presbyterian minister who was fascinated by statistics and probability. As a small but amusing aside, it is ironic that there is some uncertainty about whether the only extant portrait of Thomas Bayes, shown above, actually depicts the right person! But whether or not the portrait is a true likeness of the great man, the now-famous theorem that bears his name stands as a landmark in the history of probability theory. An understanding of Bayes' Theorem, and of how it applies to the myriad problems of truth, belief and likelihood with which our uncertain world challenges us daily, is something that every scientist, biologist or not, can put to good use in his or her own work.

So why is Bayes' Theorem so useful and what does it have to teach us as life scientists?

By way of a brief introduction to Bayes' Theorem, let's take a look at the development of biomarkers - an area of life science research that is directly concerned with issues of prediction and likelihood.

Let's imagine that we are looking for a reliable early indicator for a disease that affects about 1.5% of the population, and that our research has uncovered a biomarker that is detected in about 80% of sufferers. Sounds pretty good, right? Most people would probably look at those numbers and conclude that a positive test for the biomarker is associated with something like an 80% chance of having the disease. Not too shabby.

Not so fast.

One very important question remains unanswered: "How many people who do not have the disease would still get a positive test result with this biomarker?" A biomarker that produced a positive result (indicative of the presence of the disease) in 80% of all patients, with or without the disease, would obviously have no predictive power at all. As with most things in life, biomarkers are seldom if ever 100% reliable, but let's say, for the purposes of our story, that the biomarker in question produces false positives in about 4% of patients without the disease (i.e. the test result indicates disease where none is actually present). Things seem to be looking up. Armed with these numbers, we might feel that this biomarker has a bright future in the clinic, based upon the following reasoning: it will fail to detect the disease in only about 2 out of 10 sufferers, and it will misdiagnose the disease in only about 4 out of 100 healthy patients.

So now we're in good shape, right?

Once again - not so fast.

Let's think about this biomarker's performance from the perspective of a hypothetical population of 10,000 patients. Based upon the 1.5% prevalence of this disease in the population, we would expect our population to have about 150 patients with the disease and therefore about 9,850 without it. Of those 150 patients with the disease, we would expect about 120 to test positive for the biomarker, based upon an 80% positive test rate amongst people with the disease. Amongst the 9,850 patients who do not have the disease, we would expect about 394 to test positive, based upon the biomarker's 4% false positive rate.
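This counting approach (sometimes called "natural frequencies") can be sketched in a few lines of Python. It is a minimal illustration using only the hypothetical rates from the text (1.5% prevalence, 80% detection rate, 4% false positive rate); the variable names are illustrative.

```python
# Natural-frequency view: expected counts in a hypothetical population
# of 10,000 patients, using the illustrative rates from the text.
population = 10_000
prevalence = 0.015           # 1.5% of the population has the disease
sensitivity = 0.80           # 80% of sufferers test positive
false_positive_rate = 0.04   # 4% of healthy patients test positive

diseased = population * prevalence               # about 150
healthy = population - diseased                  # about 9,850
true_positives = diseased * sensitivity          # about 120
false_positives = healthy * false_positive_rate  # about 394

for label, count in [
    ("patients with the disease", diseased),
    ("patients without the disease", healthy),
    ("true positives", true_positives),
    ("false positives", false_positives),
]:
    # Right-align each count with a thousands separator, no decimals.
    print(f"{label:<30}{count:>8,.0f}")
```

Working with whole-number counts rather than percentages is what makes the next step, dividing true positives by all positives, feel obvious.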

Now put yourself in the position of one of those patients who just got a positive test result. The first question you're going to ask is "What is the probability that I have the disease given that I tested positive for it?"

This is really the key question. What does the test result actually mean?

To answer that question, let's look at the overall probability of getting a positive test result under any circumstances. We expect 120 patients with the disease to test positive and 394 without the disease to test positive. So out of a total of 514 positive tests, we expect 120 to come from patients who actually have the disease, corresponding to a probability of about 23% (120/514). In other words, the answer to the question of the patient who had the positive test result is that they have only about 1 chance in 4 of actually having the disease. Put another (and perhaps more optimistic) way, despite the positive test result, there are still about 3 chances in 4 that they do not have the disease. Or, put yet another way, despite the positive test result, the patient is still about 3 times more likely not to have the disease than to have it.
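The arithmetic in this paragraph, including the "about 3 times more likely" comparison, can be checked with a short sketch that starts from the expected counts for our hypothetical population of 10,000; it uses nothing beyond the numbers already given.

```python
# Probability of disease given a positive result, from expected counts.
true_positives = 120    # patients with the disease who test positive
false_positives = 394   # patients without the disease who test positive

all_positives = true_positives + false_positives   # 514 positive tests in total
p_disease_given_positive = true_positives / all_positives
# How many healthy positives there are for every diseased positive.
ratio_healthy_to_diseased = false_positives / true_positives

print(f"p(disease | positive) = {p_disease_given_positive:.3f}")
print(f"healthy : diseased among positives = {ratio_healthy_to_diseased:.2f} : 1")
```

The ratio of roughly 3.3 to 1 is the "3 times more likely not to have the disease" figure, expressed as odds rather than as a probability.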

In the light of this new analysis of the biomarker's performance, would you still conclude that this biomarker is a useful clinical diagnostic for this disease? If you were a physician, for example, would you schedule a potentially risky or expensive surgical procedure based upon the 1 in 4 chance of disease indicated by the positive test result? Or would you recommend doing nothing at all, despite the positive test result?

You might be surprised to learn that this "hypothetical" biomarker example is based upon the real numbers for CA-125, a biomarker that is actually used as a diagnostic indicator for ovarian cancer. A wealth of statistics has been published, both on ovarian cancer incidence and on the use of CA-125 as a diagnostic marker. All that remained for me to do was to plug these numbers into a Bayesian model.

According to the American Cancer Society, the lifetime risk of a woman developing ovarian cancer is 1 in 72 (about 0.0139). In a recent study involving more than 78,000 women, the use of CA-125 as a single indicator yielded 3,285 false positive results (~4%), in which healthy women were diagnosed as having ovarian cancer. Of these incorrectly diagnosed women, 1,080 underwent an unnecessary surgical biopsy procedure, and about 150 of them suffered severe complications as a result. As if this bad news were not enough, CA-125 testing was shown to be of little use even in women who have already had ovarian cancer. A study that examined the benefit of using elevated CA-125 levels as an early marker of ovarian cancer relapse found no survival benefit for women who were started early on chemotherapy based upon their CA-125 results, compared with those who waited until they exhibited the clinical symptoms of relapse.

It is worth noting, by the way, that the diagnostic probabilities obtained from the admittedly rather crude Bayesian model presented above nonetheless correspond rather well with the actual rates of true and false positive tests reported in studies of women who were tested with CA-125 for ovarian cancer.

It should be clear from this example that it is important to weigh the evidence for the efficacy of diagnostic markers carefully. Failing to do so has the potential to add a great deal of unnecessary complication and expense to health care. CA-125 by itself is a poor indicator of ovarian cancer, and medical decisions based solely upon such an indicator can end up subjecting patients to unnecessary pain and suffering. Consider also the time and money wasted on the unnecessary medical treatment of 1,080 women with false positive test results in the study cited above, let alone the costs of managing the severe surgical complications that about 150 of these women suffered as a result. To be fair, however, the shortcomings of CA-125 as a diagnostic marker are now well recognized, and the current standard of practice recommends using CA-125 together with other indicators such as sonograms and pelvic exams, whose combined results are more reliable than CA-125 alone.
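Why combining indicators helps can be illustrated with a hedged sketch of sequential Bayesian updating, in which the probability after one positive result becomes the prior for the next. The second test here is purely hypothetical: it is assumed to have the same 80% detection rate and 4% false positive rate as the biomarker in our story, and to be conditionally independent of it given disease status. These are not published figures for sonograms, pelvic exams or any other real test, and real combinations of tests need not be independent.

```python
def update(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """One Bayesian update: p(disease | positive) for a single positive result."""
    # Total probability of a positive result, with or without the disease.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# HYPOTHETICAL second test with the same 80% / 4% characteristics,
# ASSUMED conditionally independent of the first test.
p = 0.015          # prior: prevalence before any testing
history = [p]
for test in ("first positive test", "second positive test"):
    p = update(p, sensitivity=0.80, false_positive_rate=0.04)  # posterior becomes the new prior
    history.append(p)
    print(f"after {test}: p(disease) = {p:.3f}")
```

Under these assumptions, a second independent positive result raises the probability of disease from about 23% to roughly 86%. Correlated tests would add considerably less, which is why the independence assumption matters.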

This CA-125 story also highlights the urgent need for more reliable biomarkers for the early detection of diseases like ovarian cancer. The statistical probabilities that I used were actually taken from the "best case" scenario (in terms of predictive accuracy) for the use of CA-125 to diagnose ovarian cancer: I used the values observed for women with stage II or later disease, in which CA-125 levels are typically more elevated but the disease is, unfortunately, harder to treat. Had I instead used the values for women with earlier, stage I disease, for whom the treatment options and prognosis are better, the true positive rate would have dropped from around 80% to around 50%, and the probability of actually having the disease given a positive test result would have dropped to almost 1 in 7.
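The stage comparison can be reproduced with a short sketch. It assumes the American Cancer Society lifetime risk of 1 in 72 as the prior and the ~4% false positive rate from the study cited above, and varies only the true positive rate; the helper name `posterior` is illustrative. With the 1.5% prior used in the hypothetical example instead, the figures shift slightly (the stage I result comes out nearer 1 in 6).

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' Theorem for one binary test: p(disease | positive)."""
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

prior = 1 / 72              # ASSUMED prior: ACS lifetime risk of ovarian cancer
false_positive_rate = 0.04  # ~4% false positives, as in the study cited above

results = {}
for label, sensitivity in [("stage II or later", 0.80), ("stage I", 0.50)]:
    results[label] = posterior(prior, sensitivity, false_positive_rate)
    p = results[label]
    print(f"{label:>17}: p(disease | positive) = {p:.3f} (about 1 in {1 / p:.1f})")
```

With these inputs the probability falls from roughly 22% for stage II or later disease to roughly 15% for stage I disease, consistent with the "almost 1 in 7" figure above.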

Incidentally, if you think I was exaggerating about how naively people interpret biomarker statistics (equating, for example, a test that detects 80% of cases in diseased patients with an 80% probability of having the disease if the test is positive), unfortunately I was not. Repeated studies have consistently shown that even the majority of physicians, whose job it is to interpret these kinds of statistical results for their patients, struggle to interpret them, and generally place more confidence in their conclusions than is actually warranted.

The intuitive 'algorithm' that we used above to determine the probability of an event (the patient has the disease) given some evidence (the patient tested positive for the disease) can be captured more formally in an equation. Bayes' Theorem is typically presented as an equation of the form:

p(A | B) = p(B | A) * p(A) / p(B)

In the equation above, the notation p(B | A) denotes the conditional probability of outcome B given outcome A. If we plug in the same numbers that we used in our intuitive approach to recalculate the probability of having the disease given a positive test result, the calculation looks like this:

p(disease | positive) = p(positive | disease) * p(disease) / p(positive)

Note that p(positive) is the total probability of all the circumstances under which a positive test could occur - in our case, the sum of the probabilities of getting a positive test with and without the disease:

p(positive) = p(positive | disease) * p(disease) + p(positive | no disease) * p(no disease)

p(positive) = 0.8 * 0.015 + 0.04 * 0.985 = 0.0514

Therefore: p(disease | positive) = 0.8 * 0.015 / 0.0514 = 0.233, which corresponds to the 23% probability we arrived at using our intuitive approach.
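The same calculation can be wrapped in a small, reusable function. This is a minimal sketch for the binary case discussed here (one test, disease present or absent); the function and parameter names are illustrative rather than taken from any library.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return p(disease | positive) via Bayes' Theorem.

    prior               -- p(disease), the prevalence before testing
    sensitivity         -- p(positive | disease)
    false_positive_rate -- p(positive | no disease)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Law of total probability: every way a positive result can arise.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.015, sensitivity=0.8, false_positive_rate=0.04)
print(f"p(disease | positive) = {p:.3f}")
```

Calling the function with a different prior while keeping the test characteristics fixed is a quick way to see how strongly the answer depends on how common the disease is in the tested population.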

As life scientists, we are always weighing evidence as an important part of our work. I hope that the example above makes it clear that, in the biomedical sciences at least, weighing the evidence naively can be extremely costly and even life-threatening. In the life sciences, Bayes' Theorem has been applied successfully to areas as diverse as bioinformatics and computational biology, next-generation sequencing, biological network analysis, and disease evolution and epidemiology, to name just a few examples.

The fundamentals of Bayes' Theorem are extremely easy to grasp, especially when dealing with the point probabilities and binary outcomes that were discussed here, but the applications of Bayes' Theorem are vast, not only in the life sciences but in any sphere of activity in which our beliefs and decisions are shaped by weighing the evidence.

  © The Digital Biologist | All Rights Reserved