Decades ago I spent considerable time trying to develop a logical basis for rejecting outliers. When collecting data, most of the results often look like the random variation one would expect, except for a few outliers.
At the time I noted:
When data are taken in many experiments there often are points which bear no resemblance to the actual experiment being performed but arise from other, possibly undefined, sources. These data points are called outliers and the data analyst seeks ways to identify them and accurately reject them from the data base. Statisticians have developed many techniques for recognizing and rejecting outliers, but their approaches have usually been centered upon techniques where time variations in the measurements were absent.
Specifically, I was concerned with rejecting outliers when some underlying, but possibly uncertain, dynamic process governs the system. Outliers can be viewed in several ways:
1. Just a bad data point.
2. A data point that is one of a kind.
3. A data point that reflects a significant underlying system of its own.
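The note above does not say which rejection rule I used, so what follows is only a hedged illustration of rejecting outliers when a dynamic process governs the data. It runs a scalar Kalman filter on an assumed random-walk-plus-noise model and flags any measurement whose innovation (measurement minus prediction) exceeds a few standard deviations of its predicted spread. The model, the function name `kalman_outliers`, and the parameters `q`, `r` and `gate` are assumptions for illustration, not a reconstruction of the original approach.

```python
import math


def kalman_outliers(measurements, q=0.01, r=1.0, gate=3.0):
    """Flag outliers in a non-empty time series with a scalar Kalman filter.

    Assumed model (illustrative only):
        state:        x[k] = x[k-1] + w[k],  w ~ N(0, q)   (random walk)
        measurement:  z[k] = x[k] + v[k],    v ~ N(0, r)
    A measurement is flagged when |innovation| > gate * sqrt(S), where S is
    the predicted innovation variance. Flagged points do not update the state.
    The first measurement initializes the filter and is never flagged.
    """
    x = measurements[0]  # state estimate
    p = r                # state variance, initialized at the measurement noise
    flags = [False]
    for z in measurements[1:]:
        p_pred = p + q       # predict: the random walk adds process noise
        innovation = z - x   # measurement minus predicted state
        s = p_pred + r       # predicted innovation variance
        if abs(innovation) > gate * math.sqrt(s):
            flags.append(True)
            p = p_pred       # reject: carry the prediction forward unchanged
            continue
        k = p_pred / s       # Kalman gain
        x += k * innovation  # update the state estimate
        p = (1 - k) * p_pred
        flags.append(False)
    return flags


if __name__ == "__main__":
    # Hypothetical readings with a single spike at index 4
    readings = [1.0, 1.1, 0.9, 1.2, 8.0, 1.1, 1.3, 1.2, 1.4]
    print(kalman_outliers(readings, q=0.01, r=0.04))
    # → [False, False, False, False, True, False, False, False, False]
```

Because rejected points do not update the estimate, one spike cannot drag the state away. A long run of consecutive flags is a different matter: it may indicate that the underlying system itself has changed, which is the third view of an outlier in the list above.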
Now consider the many new cancer therapies, such as immunotherapy and pathway control therapy. We often see, say, 20% of the patients "cured", while the other 80% follow the usual course of the disease and do not survive. Are the 20% outliers that should be rejected, or do they represent a different dynamic system that we should really try to identify? Forty years ago I believed the latter. That is, if the observed data divide into two distinct terminal states, then there are two distinct underlying systems, pathways or immune responses, driving them there.
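One way to make the "two terminal states" argument quantitative, offered only as a sketch, is to ask whether the terminal outcomes are better described by one Gaussian or by a mixture of two. The code below fits both by maximum likelihood (the mixture by expectation-maximization, with the two means started at the smallest and largest values) and compares them with the Bayesian information criterion (BIC), which charges the mixture for its five free parameters (one weight, two means, two variances) against two for a single Gaussian. Gaussian components, the function names and the synthetic 16-versus-4 data set are all hypothetical; a lower mixture BIC suggests, but does not prove, two distinct underlying systems.

```python
import math


def log_normal(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def fit_one_gaussian(data):
    """Maximum-likelihood single Gaussian; returns its log-likelihood."""
    n = len(data)
    mu = sum(data) / n
    var = max(sum((x - mu) ** 2 for x in data) / n, 1e-6)  # floor avoids log(0)
    return sum(log_normal(x, mu, var) for x in data)


def _log_mix(x, w, mu1, var1, mu2, var2):
    """Log-sum-exp pieces for the two weighted components at x."""
    a = math.log(w) + log_normal(x, mu1, var1)
    b = math.log(1 - w) + log_normal(x, mu2, var2)
    m = max(a, b)
    return a, b, m


def fit_two_gaussians(data, iters=200):
    """Two-component 1-D Gaussian mixture fitted by EM.

    Returns (log-likelihood, (w, mu1, var1, mu2, var2)), w = weight of component 1.
    """
    n = len(data)
    mean = sum(data) / n
    var0 = max(sum((x - mean) ** 2 for x in data) / n, 1e-6)
    w, mu1, mu2, var1, var2 = 0.5, min(data), max(data), var0, var0
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        resp = []
        for x in data:
            a, b, m = _log_mix(x, w, mu1, var1, mu2, var2)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: re-estimate weight, means and variances
        n1 = sum(resp)
        n2 = n - n1
        if n1 < 1e-9 or n2 < 1e-9:
            break  # a component has collapsed; keep the previous parameters
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1, 1e-6)
        var2 = max(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2, 1e-6)
    ll = 0.0
    for x in data:
        a, b, m = _log_mix(x, w, mu1, var1, mu2, var2)
        ll += m + math.log(math.exp(a - m) + math.exp(b - m))
    return ll, (w, mu1, var1, mu2, var2)


def bic(log_likelihood, n_params, n):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n) - 2 * log_likelihood


if __name__ == "__main__":
    # Synthetic terminal values (hypothetical units): 16 near 0 and 4 near 10,
    # mimicking an 80% / 20% split between two terminal states.
    data = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, -0.25,
            0.25, -0.15, 0.15, -0.05, 0.05, 0.0, 0.1, -0.1,
            9.8, 10.0, 10.2, 10.1]
    n = len(data)
    ll_one = fit_one_gaussian(data)
    ll_two, params = fit_two_gaussians(data)
    print("BIC, one Gaussian:", bic(ll_one, 2, n))
    print("BIC, two-component mixture:", bic(ll_two, 5, n))
    print("weight of first component:", params[0])
```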
In Nature this week there is an interesting paper examining this issue. The authors note:
By definition, exceptional responses are rare, which makes them hard to
study. Their anecdotal nature seems to contradict the teachings on
statistically sound results in biomedical research. In a clinical trial,
even if there are several exceptional responders, a drug will fail to
achieve approval because it does not improve the health of the majority
of patients. This means there has been little incentive for researchers
or drug companies to investigate thoroughly why a few people respond so
well.
The usual conclusion has been that these outliers are just bad data. In reality they are good data, great data, but for a different reason. Find that reason: identify the system that allows the patient to respond. If something works for even one patient, then it is not a failure but a success. Yet the statistical approach to clinical trials leads us to declare the trial a failure. That was NOT my approach 40+ years ago, and it should not be the case now. Somehow we reject success, small as it may be, rather than ask why.
The Nature article continues:
Vincent Miller, a former MSKCC oncologist, agrees that views about
outliers are changing and thinks that many more such individuals might
be found. Any oncologist has a handful of patients in whom cancer just
melts away with no obvious explanation, says Miller, who is chief
medical officer of Foundation Medicine in Cambridge, Massachusetts, a
company that performs genomic analysis of samples from people with
cancer. In January, the pharmaceutical company Roche, based in Basel,
Switzerland, bought a majority stake in Foundation Medicine, which is
also involved in the ERI.
Yes, physicians must learn to think differently. They must go beyond answering what and how, diagnosis and treatment, and start asking why. We now have the means and methods to help find the answer: the tools of system identification and analysis.
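As a minimal, hypothetical example of what system identification means here, suppose a patient's tumor burden is sampled at regular intervals and modeled as `x[k+1] = a * x[k]`. The least-squares estimate of `a` then has a closed form, and `a < 1` indicates shrinkage while `a > 1` indicates growth. The model, the function name `identify_growth_factor` and the sample series are assumptions for illustration only; real disease dynamics would need richer models and inputs such as dosing.

```python
def identify_growth_factor(series):
    """Least-squares estimate of a in the hypothetical model x[k+1] = a * x[k].

    Minimizing sum_k (x[k+1] - a * x[k])**2 over a gives the closed form
    a = sum_k x[k] * x[k+1] / sum_k x[k]**2.
    """
    num = sum(x0 * x1 for x0, x1 in zip(series, series[1:]))
    den = sum(x0 * x0 for x0 in series[:-1])
    if den == 0:
        raise ValueError("need at least two samples, not all zero")
    return num / den


if __name__ == "__main__":
    # Hypothetical tumor-burden series (arbitrary units) for two patients
    patients = {
        "non-responder": [100.0, 110.0, 121.0, 133.1],
        "responder": [100.0, 50.0, 25.0, 12.5],
    }
    for name, series in patients.items():
        a = identify_growth_factor(series)
        trend = "shrinking" if a < 1 else "growing"
        print(f"{name}: a = {a:.2f} ({trend})")
```

The point of the sketch is the shift in question: instead of asking only whether a patient responded, the identified parameter describes how the patient's system behaves, and patients whose parameters fall in a separate region are candidates for a different underlying mechanism.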
Finally, the article states:
For research on outliers to be of greatest help, the outlier cases must
be rigorously selected. Only then can the analysis deliver sound results
despite the fact that it remains a profile of only one person, says
Friend. Taylor agrees, pointing out that molecular analysis of tumours
from patients is increasingly possible and that there is growing
acceptance of studying outlier patients. “Nevertheless,” he says, “it
requires that we stay focused on exploring the most significant outlier
responses to ensure the greatest return for patients.”
Yes, selection helps, but we must also use verifiable models of the disease or of the putative curative process. If we have some immune-response technique, then perhaps the key is not PD-1 but some receptor we do not yet know. Find it: work into the shadows, assume that it is there, and then go looking.