A recent article by Spector at Stanford describes how, for little money, anyone can develop their own
tests for genetic markers. She states:
So it takes years of
hard work and serious cash to create one of these “simple” tests, right? Not
anymore. “All you really need
is a computer browser and Excel,” says computer scientist Purvesh Khatri, PhD,
who, working with Atul Butte, MD, PhD, associate professor of systems medicine
in pediatrics, identified telltale chemicals (aka biomarkers) for three types
of cancer all in the span of one year. How was this possible?
By analyzing some of the vast amount of genetic information from tumor cell
samples that has been amassed over the past decade in free, publicly accessible
databases, and by outsourcing the lab work. “We say ‘outsource
everything except the genius,’” says Butte. “You come up with the question and
the target, and let everyone else do the work.” As Khatri walked me
through the discovery process, I learned there’s a little more to it than that.
Some work and cash is involved, not to mention high-school level biology. And
basic statistics will be a big help. But with those tools, skills and about
five days’ work, plus $4,000 to confirm through blood tests, you’re on your
way.
So, for a few thousand dollars and
a few days of work, you too can develop a genetic profile. In contrast, a set of
papers by Detours and colleagues raises doubts about this.
The problem is that it is all too easy to find correlations of almost anything with anything. Correlates are not markers unless there is an underlying system with verifiable causality. This was discussed in the work of Dougherty, who observed that one must start from a causal model of the process and then look for the coefficients that define that system. Only then can we ascertain whether a result is true and consistent.
Recently Detours addressed this issue in PLOS. He and his co-authors demonstrated that the plethora of markers for, say, breast cancer amounts to little more than almost random choices (my words, not theirs). Namely, one may be able to find correlates almost anywhere.
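How easily correlates turn up can be illustrated with a small simulation. The sketch below is not taken from any of the cited papers. It assumes 50 hypothetical patients, 2000 hypothetical "genes" made of pure Gaussian noise, and an equally random outcome, and it counts how many genes pass a conventional p < 0.05 cutoff (approximately |r| > 0.279 for n = 50). All names and parameters are illustrative.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_markers(n_patients=50, n_genes=2000, r_crit=0.279, seed=0):
    """Count noise 'genes' whose correlation with a noise outcome passes the cutoff.

    r_crit is the approximate two-sided 5% critical value of r for n = 50.
    Nothing here has any biological meaning: every value is random.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    hits = 0
    for _ in range(n_genes):
        gene = [rng.gauss(0, 1) for _ in range(n_patients)]
        if abs(pearson(gene, outcome)) > r_crit:
            hits += 1
    return hits


hits = count_spurious_markers()
print(f"{hits} of 2000 pure-noise 'genes' pass p < 0.05")
```

By construction about 5 percent of the noise genes, on the order of 100, should clear the cutoff. Each would look like a "marker" if examined alone, which is why a correlation without a causal model behind it proves so little.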
From the Detours note in The Scientist we have:
Ethic guidelines
drastically limit experiments on
human subjects. Hence, the fundamental mechanisms of human diseases are mostly
studied in vitro or in animal models. These are only substitutes for
understanding human physiology and disease. Proving that a mechanism
responsible for disease progression in a model system is also relevant to human
diseases—not to mention then translating it into a new therapeutic—is a major bottleneck in biomedicine. In
the end, only clinical interventions on human will bridge models and human
disease.
One approach is to
look for correlations. If you can show
that patients with tumors expressing, for example, stem cell markers have a
much worse prognosis than those without them, that would suggest that stem
cells are involved in human disease progression.
This line of thinking has long been popular in oncology because you need only
access surgical specimens, some mRNA or protein
marker, and a follow up of patients.
And with the recent advent of efficient microarray screens, this approach has become all the rage, reducing the
discovery of signatures, i.e. multi genes markers, to a nearly automatic
procedure.
In their PLOS paper, Venet et al. state:
Hundreds of studies in oncology have suggested the biological
relevance to human of putative
cancer-driving mechanisms with the following three steps:
1) characterize the mechanism in a model system, 2) derive from the model system a marker whose expression changes when the mechanism is altered, and 3) show that marker expression correlates with disease
outcome in patients—the last figure of such
paper is typically a Kaplan-Meier plot illustrating this correlation.
Detours continues:
The signatures’ prognostic potential can then be tested instantly in genome-wide compendia of
expression profiles for hundreds of human tumors, all available for free in the public domain. Besides stem cells
markers, signatures linked to all sorts
of biological mechanisms or states have been shown to be associated with human
cancer outcome. Indeed, several new signatures are published every month in prominent journals.
But such correlations
are not all that they seem. The accumulation
of signatures with all sorts of
biological meaning, but nearly identical prognostic values, already looked suspicious to us and others back in 2007.
It seemed that every newly discovered signature was prognostic. We collected from the literature some
signatures with as little connection to cancer as possible. We found, for example, a signature of the
blood cells of Japanese patients who were told jokes after lunch, and a
signature derived from the microarray analysis of the brains from mice that suffered social defeat. Both of these
signatures were associated with breast cancer outcome by any statistical
standards.
In PLOS they state:
Our study questions the biological interpretation of the prognostic value of published
breast cancer signatures, but has no bearing on their usefulness
in the clinic: a marker may be accurate without yielding
interesting biological insight regarding the mechanism of disease
progression. Nevertheless, the prominence of proliferation should be taken into account in future clinical research.
Are there transcriptional signals in breast cancer
that are prognostic, but independent of proliferation?
And they conclude:
In conclusion, we have shown that 1) random single- and multiple-genes expression markers have a high probability to be associated with breast cancer outcome; 2) most published signatures are not significantly more associated with outcome than random predictors; 3) the meta-PCNA metagene integrates most of the outcome-related information contained in the breast cancer transcriptome; 4) this information is present in over 50% of the transcriptome and cannot be removed by purging known cell-cycle genes from a signature.
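The logic of these conclusions can be sketched with a toy simulation. This is not the analysis Venet et al. ran (they worked with real breast cancer expression data and survival statistics). It is a minimal illustration under stated assumptions: a hidden "proliferation" factor drives a hypothetical outcome, half of 1000 hypothetical genes carry a weak trace of that factor, and a signature is scored as the mean of its 20 genes. Every parameter is invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_patients=100, n_genes=1000, frac_prolif=0.5, loading=0.5, seed=1):
    """Toy cohort: a latent proliferation factor leaks into a fraction of genes."""
    rng = random.Random(seed)
    prolif = [rng.gauss(0, 1) for _ in range(n_patients)]
    outcome = [p + rng.gauss(0, 1) for p in prolif]  # outcome driven by proliferation
    genes = []
    for g in range(n_genes):
        w = loading if g < frac_prolif * n_genes else 0.0
        genes.append([w * p + rng.gauss(0, 1) for p in prolif])
    return genes, outcome


def signature_r(genes, outcome, members):
    """|r| between the outcome and the mean expression of the signature genes."""
    n = len(outcome)
    score = [sum(genes[g][i] for g in members) / len(members) for i in range(n)]
    return abs(pearson(score, outcome))


def empirical_p(observed, null_values):
    """Negative control: share of random signatures doing at least as well."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


genes, outcome = simulate()
rng = random.Random(2)

# Null distribution: 500 random signatures of 20 genes each.
null_r = [signature_r(genes, outcome, rng.sample(range(len(genes)), 20))
          for _ in range(500)]

R_CRIT = 0.197  # approximate two-sided 5% critical value of r for n = 100
frac_significant = sum(r > R_CRIT for r in null_r) / len(null_r)

# A "candidate" signature with no biology behind it: 20 more random genes.
candidate_r = signature_r(genes, outcome, rng.sample(range(len(genes)), 20))
p_vs_random = empirical_p(candidate_r, null_r)

print(f"random signatures passing p < 0.05: {frac_significant:.0%}")
print(f"candidate |r| = {candidate_r:.2f}, p versus random signatures = {p_vs_random:.2f}")
```

In this toy setting nearly every random signature is "prognostic" by the conventional test, because almost any set of 20 genes picks up the proliferation signal. The more informative question is the one `empirical_p` asks: does a signature do better than random gene sets of the same size?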
As Detours concludes in his short piece in The Scientist:
It took us four years and six rejections to get this work finally published in a computational
biology journal (PLoS Comput Biol, 2011)—not the most efficient venue to reach the oncology community. Meanwhile, a steady stream of studies confounded by proliferation rates has appeared. This has to be said, one can no longer stay silent about the rather limited self-correction capability of the top tier publishing system (Cell, Nature Genetics, PNAS, etc.), which promoted these studies in the first place. The oncogenomic-based literature has forgotten the pitfalls of non-specific effects and the value of negative controls. It is not enough to show that a signature is prognostic; biological conclusions may be drawn only if its prognostic value is specifically driven by the mechanism/state under investigation. Importantly, we question prognostic signatures as specific research tools, not as clinical guides: smoke does not drive fire, yet it is powerful indicator of when and where a fire is burning.
His point is well taken. The challenge is to characterize the intra-cellular and inter-cellular pathways as dynamic distributed systems and to do what Dougherty and others suggest: understand what is happening and why, and then seek to identify the system. Without a viable and provable model of the system, we will accumulate volumes of data that are far from prognostic. In fact, such data may very well be deadly to the patient.