The Squirrel's Nest: Microarrays and Too Much Data

Tuesday, December 20, 2011

Microarrays and Too Much Data

In a recent article, Spector at Stanford describes how, for little money, one can develop one's own tests for genetic markers. She writes:

So it takes years of hard work and serious cash to create one of these “simple” tests, right? Not anymore. “All you really need is a computer browser and Excel,” says computer scientist Purvesh Khatri, PhD, who, working with Atul Butte, MD, PhD, associate professor of systems medicine in pediatrics, identified telltale chemicals (aka biomarkers) for three types of cancer all in the span of one year. How was this possible? By analyzing some of the vast amount of genetic information from tumor cell samples that has been amassed over the past decade in free, publicly accessible databases, and by outsourcing the lab work. “We say ‘outsource everything except the genius,’” says Butte. “You come up with the question and the target, and let everyone else do the work.” As Khatri walked me through the discovery process, I learned there’s a little more to it than that. Some work and cash is involved, not to mention high-school level biology. And basic statistics will be a big help. But with those tools, skills and about five days’ work, plus $4,000 to confirm through blood tests, you’re on your way.

Yes, for a few thousand dollars and a few days of work, you too can develop a genetic marker test. In contrast, a set of papers by Detours and colleagues raises serious doubts about this.

The problem is that it is all too easy to find correlations between almost anything and anything else. Such correlates are not markers unless we have a system with verifiable causality. Dougherty has discussed this in his work. He observes that one must first have a system underlying the process, with causality, and that what one then looks for are the coefficients that define that system. From those coefficients we can ascertain whether the result is true and consistent.
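How easily such correlations arise is largely a matter of numbers. The following minimal Python sketch is an illustration, not something taken from Dougherty or Detours. It draws a random "outcome" and thousands of random "genes" that by construction have nothing to do with it, then counts how many still clear a nominal significance bar. The sample size, feature count and the |r| > 0.28 cutoff (roughly p = 0.05, two-sided, for 50 samples) are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_spurious_hits(n_samples=50, n_features=2000, r_threshold=0.28, seed=1):
    """Count random features whose |r| with a random outcome exceeds r_threshold.

    Nothing here is causal: the outcome and every feature are independent noise.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson_r(feature, outcome)) > r_threshold:
            hits += 1
    return hits
```

With these defaults, roughly 5% of the 2,000 noise features, on the order of a hundred, pass the bar. Every one of them would look like a "marker" if examined in isolation.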

Recently, Detours and his co-authors addressed this issue in PLoS Computational Biology. They demonstrated that the plethora of markers for, say, breast cancer are little better than random choices (my words, not theirs). Namely, one can find correlates almost anywhere.

In his note in The Scientist, Detours writes:

Ethic guidelines drastically limit experiments on human subjects. Hence, the fundamental mechanisms of human diseases are mostly studied in vitro or in animal models. These are only substitutes for understanding human physiology and disease. Proving that a mechanism responsible for disease progression in a model system is also relevant to human diseases—not to mention then translating it into a new therapeutic—is a major bottleneck in biomedicine. In the end, only clinical interventions on human will bridge models and human disease.

One approach is to look for correlations. If you can show that patients with tumors expressing, for example, stem cell markers have a much worse prognosis than those without them, that would suggest that stem cells are involved in human disease progression. This line of thinking has long been popular in oncology because you need only access surgical specimens, some mRNA or protein marker, and a follow up of patients. And with the recent advent of efficient microarray screens, this approach has become all the rage, reducing the discovery of signatures, i.e. multi genes markers, to a nearly automatic procedure.

In their PLoS paper, Venet et al. state:

Hundreds of studies in oncology have suggested the biological relevance to human of putative cancer-driving mechanisms with the following three steps: 1) characterize the mechanism in a model system, 2) derive from the model system a marker whose expression changes when the mechanism is altered, and 3) show that marker expression correlates with disease outcome in patients—the last figure of such paper is typically a Kaplan-Meier plot illustrating this correlation.
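The Kaplan-Meier plot in step 3 is a step function that estimates the fraction of patients still event-free over time. Below is a minimal product-limit estimator in Python, written as an illustration. The function and argument names are hypothetical. Censored patients, coded 0, leave the risk set without counting as events, and at tied times events are counted before censored patients are removed.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. relapse or death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk  # conditional survival past t
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` steps down to 5/6, 2/3 and 4/9 at times 1, 2 and 3, then to 0 at time 5. A "prognostic" marker is one whose high and low groups produce visibly separated curves of this kind.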

Detours continues:

The signatures’ prognostic potential can then be tested instantly in genome-wide compendia of expression profiles for hundreds of human tumors, all available for free in the public domain. Besides stem cells markers, signatures linked to all sorts of biological mechanisms or states have been shown to be associated with human cancer outcome. Indeed, several new signatures are published every month in prominent journals.
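In practice, the "instant" test Detours describes usually amounts to scoring each tumor by its signature, splitting patients into high and low groups, and comparing their survival. The sketch below makes several assumptions: the score is the mean expression of the signature genes, the split is at the median, and the comparison is a two-sample log-rank test. Published studies use a variety of scoring rules. The patient-record layout (`expr`, `time`, `event`) and the function names are hypothetical.

```python
import math
import statistics

def logrank(times, events, group):
    """Two-sample log-rank test; group[i] is True/False for the two arms.

    Returns (chi2, p) using the chi-square distribution with 1 degree of freedom.
    """
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # (arm, had_event_at_t) for every patient still at risk at time t
        at_risk = [(g, ti == t and bool(e))
                   for ti, e, g in zip(times, events, group) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if g and died)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in arm True
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)

def signature_logrank(patients, genes):
    """Median-split patients on a mean-expression signature score; log-rank test."""
    scores = [statistics.fmean(p["expr"][g] for g in genes) for p in patients]
    cut = statistics.median(scores)
    high = [s > cut for s in scores]
    return logrank([p["time"] for p in patients],
                   [p["event"] for p in patients], high)
```

Because this takes seconds to run against a public compendium, it is just as easy to run it on thousands of gene lists as on one. That is exactly where the trouble starts.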

But such correlations are not all that they seem. The accumulation of signatures with all sorts of biological meaning, but nearly identical prognostic values, already looked suspicious to us and others back in 2007. It seemed that every newly discovered signature was prognostic. We collected from the literature some signatures with as little connection to cancer as possible. We found, for example, a signature of the blood cells of Japanese patients who were told jokes after lunch, and a signature derived from the microarray analysis of the brains from mice that suffered social defeat. Both of these signatures were associated with breast cancer outcome by any statistical standards.
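One mechanism that produces this effect can be reproduced in a toy model. In the Python sketch below, a hidden "proliferation" level drives both the outcome and half of the genes. The 50% figure is an assumption that echoes Venet et al.'s finding that outcome-related information is present in over half the transcriptome. The rest of the setup is also simplified: a continuous outcome stands in for survival, and the sizes and seeds are arbitrary.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_cohort(n_patients=100, n_genes=1000, frac_prolif=0.5, seed=7):
    """Toy transcriptome: a hidden proliferation level drives the outcome and
    the expression of the first frac_prolif * n_genes genes (assumed model)."""
    rng = random.Random(seed)
    prolif = [rng.gauss(0, 1) for _ in range(n_patients)]
    # Continuous stand-in for a poor outcome (higher = worse).
    outcome = [p + rng.gauss(0, 1) for p in prolif]
    n_linked = int(frac_prolif * n_genes)
    genes = [[(p if g < n_linked else 0.0) + rng.gauss(0, 1) for p in prolif]
             for g in range(n_genes)]
    return genes, outcome

def fraction_prognostic(n_signatures=200, size=20, r_crit=0.197,
                        frac_prolif=0.5, seed=11):
    """Share of random gene sets whose mean expression correlates with the
    outcome beyond r_crit (about p = 0.05, two-sided, for 100 patients)."""
    genes, outcome = simulate_cohort(frac_prolif=frac_prolif)
    rng = random.Random(seed)
    n = len(outcome)
    hits = 0
    for _ in range(n_signatures):
        chosen = rng.sample(range(len(genes)), size)
        score = [sum(genes[g][i] for g in chosen) / size for i in range(n)]
        if abs(pearson_r(score, outcome)) > r_crit:
            hits += 1
    return hits / n_signatures
```

In this toy model, nearly every random 20-gene "signature" comes out prognostic when half the genes track proliferation. With `frac_prolif=0.0`, only a few percent do, as chance alone predicts.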

In the PLoS paper they state:

Our study questions the biological interpretation of the prognostic value of published breast cancer signatures, but has no bearing on their usefulness in the clinic: a marker may be accurate without yielding interesting biological insight regarding the mechanism of disease progression. Nevertheless, the prominence of proliferation should be taken into account in future clinical research. Are there transcriptional signals in breast cancer that are prognostic, but independent of proliferation?
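One simple way to pose the closing question is to ask whether a signature still correlates with outcome once a proliferation score has been removed from both. Venet et al. work with a meta-PCNA metagene as that score. The Python sketch below uses the standard first-order partial correlation formula. It is an illustration, not the paper's procedure, and treating outcome as a continuous variable is a simplifying assumption. Survival data would normally call for something like a multivariable Cox model.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after linearly removing z from both.

    Illustrative roles: x = signature score, y = outcome surrogate,
    z = proliferation score.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom
```

For example, take a signature that correlates 0.40 with outcome and 0.80 with proliferation, where proliferation itself correlates 0.50 with outcome. Then `partial_corr(0.40, 0.80, 0.50)` is 0: all of its apparent prognostic value comes through proliferation.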

And they conclude:

In conclusion, we have shown that 1) random single- and multiple-genes expression markers have a high probability to be associated with breast cancer outcome; 2) most published signatures are not significantly more associated with outcome than random predictors; 3) the meta-PCNA metagene integrates most of the outcome-related information contained in the breast cancer transcriptome; 4) this information is present in over 50% of the transcriptome and cannot be removed by purging known cell-cycle genes from a signature.
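Point 2 amounts to using random gene sets as a negative control. The Python sketch below shows that comparison. It assumes the caller supplies a scoring function (the hypothetical `score_fn`) that returns an association statistic for a gene set, for instance the log-rank chi-square from a median split. The function returns the empirical probability that a random set of the same size does at least as well.

```python
import random

def empirical_p(observed_stat, score_fn, gene_pool, size, n_random=1000, seed=0):
    """Fraction of random same-size gene sets scoring at least observed_stat.

    score_fn(genes) -> association statistic (larger = more prognostic).
    The +1 terms count the observed signature itself, so p is never exactly 0.
    """
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_random):
        genes = rng.sample(gene_pool, size)
        if score_fn(genes) >= observed_stat:
            at_least += 1
    return (at_least + 1) / (n_random + 1)
```

If this p-value is not small, the signature is no more associated with outcome than random predictors. According to the conclusion above, that is the case for most published signatures. Such a signature says little about the specific mechanism it was derived from.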

As Detours concludes in his short piece in The Scientist:

It took us four years and six rejections to get this work finally published in a computational biology journal (PLoS Comput Biol, 2011)—not the most efficient venue to reach the oncology community. Meanwhile, a steady stream of studies confounded by proliferation rates has appeared. This has to be said, one can no longer stay silent about the rather limited self-correction capability of the top tier publishing system (Cell, Nature Genetics, PNAS, etc.), which promoted these studies in the first place. The oncogenomic-based literature has forgotten the pitfalls of non-specific effects and the value of negative controls. It is not enough to show that a signature is prognostic; biological conclusions may be drawn only if its prognostic value is specifically driven by the mechanism/state under investigation. Importantly, we question prognostic signatures as specific research tools, not as clinical guides: smoke does not drive fire, yet it is powerful indicator of when and where a fire is burning.

His point is well taken. The challenge is to determine the intracellular and intercellular pathways, treated as dynamic distributed systems, and to do what Dougherty and others suggest: understand what is happening and why, and then seek to identify the system. Without a viable and provable model of the system, we will accumulate volumes of data that are far from prognostic. In fact, such data may well be deadly to the patient.
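Dougherty's prescription can be illustrated in its simplest possible form: posit a causal system, then estimate the coefficients that define it. The Python sketch below assumes a hypothetical one-variable linear system, x[t+1] = a*x[t] + b*u[t] + noise, driven by a known input u. It recovers a and b by ordinary least squares. This is not a model taken from Dougherty's work. Real intracellular pathways are nonlinear and distributed, and far harder to identify.

```python
import random

def simulate_system(a, b, steps=500, noise=0.05, seed=3):
    """Toy system x[t+1] = a*x[t] + b*u[t] + noise (hypothetical model)."""
    rng = random.Random(seed)
    x, u = [0.0], []
    for _ in range(steps):
        u.append(rng.gauss(0, 1))  # known external input, e.g. a stimulus
        x.append(a * x[-1] + b * u[-1] + rng.gauss(0, noise))
    return x, u

def identify_system(x, u):
    """Least-squares estimates of (a, b) from observed states and inputs."""
    # Normal equations for regressing x[t+1] on (x[t], u[t]):
    # [sxx sxu; sxu suu] [a; b] = [sxy; suy], solved by Cramer's rule.
    sxx = sum(xi * xi for xi in x[:-1])
    suu = sum(ui * ui for ui in u)
    sxu = sum(xi * ui for xi, ui in zip(x[:-1], u))
    sxy = sum(xi * yi for xi, yi in zip(x[:-1], x[1:]))
    suy = sum(ui * yi for ui, yi in zip(u, x[1:]))
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat
```

Here `identify_system(*simulate_system(0.8, 0.5))` returns estimates close to 0.8 and 0.5. The recovered coefficients describe how the system works and can be tested against new experiments. A bare correlation offers neither.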

Cookies EU

Notice: This site is written by me and operated by or under the aegis of Google via Blogspot and may contain cookies. This notice should suffice as a warning which may be required by EU regulations. Then again, it is the EU after all and this may not reflect the regulated reality, but at least you have been advised. Copyright 2008-2026 Terrence P McGarty All Rights Reserved, also quotes are from referenced materials and are used under a Fair Use principle as they are commented upon and linked back to.

NOTICE

All documents and materials on this web site are the copyrighted property of Terrence P McGarty, the "Author", and can be used solely for individual purposes. The Author also does not represent in any manner or fashion that the documents and information contained herein can be used other than for expressing the opinions of the Author. Any use made and actions resulting directly or otherwise from any of the documents, information, strategies, or data or otherwise is the sole responsibility of the user and The Author expressly takes no liability for any direct or indirect losses resulting from the use or reliance upon any of the Author's opinions as herein expressed. There is no representation by The Author, express or otherwise, that the materials contained herein are investment advice, business advice, legal advice, medical advice or in any way should be relied upon by anyone for any purpose. The Author does not provide any financial, investment, medical, legal or similar advice on this website or in its publications or related sites. Also the author may, from time to time, include items from other publications on the reliance of "Fair Use" principles in Copyright Law and such items are used in the opinions stated, they are also referred to by source and delineated as coming from such third party sources. This Blog is not implemented in any manner to generate revenue or any other benefits and is solely for opinion and expression thereof. Any who have concerns about referred to comments should contact the Author for prompt remediation.
