Wednesday, May 29, 2013

Chaotic Data or You Just Do Not Know Enough

In a recent Scientific American piece, the author bemoans the wealth of unorganized data available regarding cancers.

She writes:

The field of genetics has flourished with the publishing of the complete human genome in 2001, aided by the advent of fast, affordable sequencing technology. A completed genetic code of a healthy person allows us to compare against the genetics of cancer. With advanced analytical techniques, and decades of research into the characteristics of different forms of this disease, it seemed that it was finally time to pull out the answers from the code itself by looking for the mutations that cause or support the cancer’s growth – the differences between the cancer cell and a normal cell. But when the answers didn’t bubble up from our statistics and reams of data, it became clear that the questions left for us were far more complicated. 

Yes, indeed, the questions are complex. But we do have examples of previous work to guide us. Cancer involves an amalgam of ligands, receptors, pathway elements, transcription factors, SNPs, miRNAs, and the ECM environment. Cancer cells move, mutate, and both affect and are affected by their local environment. It is a stochastic, distributed dynamic system. Fortunately, we have been studying such systems for decades, if not for most of the past century. The issue is to understand what drives what and how to control it.

Cancer is a stochastic, distributed, temporally varying field: cells change and flow, with ever-changing genetic characteristics. We can understand these characteristics and even predict, to some degree, where they will go in the future. That is the challenge and the opportunity. The author seems to miss this point entirely.
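To make the phrase "stochastic dynamic system" a little more concrete, here is a deliberately tiny sketch in Python. It is a toy, not a model of any real tumor: the function name `simulate_clones` and every parameter value (division, death, and mutation probabilities) are assumptions chosen only for illustration. Each cell is represented by the number of mutations it carries, and at every time step it may die, persist, or divide into a possibly mutated daughter.

```python
import random
from collections import Counter


def simulate_clones(steps=20, p_divide=0.3, p_die=0.1, p_mutate=0.05,
                    n_start=10, seed=1, max_cells=5000):
    """Toy stochastic population model (all parameters are hypothetical).

    Returns a list with one Counter per time step, mapping
    'mutations carried by a cell' -> 'number of such cells'.
    """
    rng = random.Random(seed)
    cells = [0] * n_start  # one entry per cell: its mutation count
    history = []
    for _ in range(steps):
        next_cells = []
        for mutations in cells:
            u = rng.random()
            if u < p_die:
                continue  # this cell dies and leaves no descendants
            next_cells.append(mutations)  # the cell survives
            if u < p_die + p_divide:
                # the cell divides; the daughter may pick up one new mutation
                extra = 1 if rng.random() < p_mutate else 0
                next_cells.append(mutations + extra)
        cells = next_cells[:max_cells]  # crude cap to keep the toy bounded
        history.append(Counter(cells))
    return history


if __name__ == "__main__":
    for step, counts in enumerate(simulate_clones(), start=1):
        print(step, dict(sorted(counts.items())))
```

The point of the sketch is only that the rules are stipulated up front and the population's genetic makeup then evolves from them over time, which is a different exercise from correlating observed outcomes after the fact.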

She continues:

One method for addressing chaos of this sort is to develop a statistical plan. We want to be able to take the information we know, look at the probabilities that certain changes will occur, and use statistics to determine what the cancer will likely do next. This challenge seems insurmountable when you look at the variables we must contend with: rapid evolution, unchecked growth, subtle migration, and so on. Changes that occur in the genome during cancer initiation and progression involve massive genetic rearrangements, damage, and mutation. 

This makes it difficult to distinguish between causes and consequences of the cancer. It would be far easier if each gene, or even each chromosome, carried out its business without interaction with the rest of the cell. But not only can changes in one area of the genome have profound direct and indirect effects on the expression of genes elsewhere, we now have evidence that cancer cells can send these activity-altering signals to other cells, both tumor-derived and normal.

Frankly, a "statistical plan" is the worst path to take. You most likely can correlate anything with anything. Economists do it all the time, and look where they have taken us. We know, or can know, what affects what and how that occurs. We can and must stipulate the system relationships. We must have a system model, NOT some set of correlations.
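The remark that one can correlate anything with anything is easy to illustrate. The following is a minimal sketch, assuming nothing beyond the Python standard library; the function names and the choice of 30 series of 200 steps are hypothetical. It generates independent random walks, which by construction have no causal relationship to one another, and then searches for the pair with the strongest Pearson correlation.

```python
import random
from itertools import combinations


def random_walk(steps, rng):
    """Cumulative sum of independent Gaussian steps."""
    position = 0.0
    path = []
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_pair(n_series=30, steps=200, seed=0):
    """Return ((i, j), r) for the most strongly correlated pair of walks."""
    rng = random.Random(seed)
    walks = [random_walk(steps, rng) for _ in range(n_series)]
    best = max(combinations(range(n_series), 2),
               key=lambda pair: abs(pearson(walks[pair[0]], walks[pair[1]])))
    return best, pearson(walks[best[0]], walks[best[1]])


if __name__ == "__main__":
    pair, r = strongest_spurious_pair()
    print(f"walks {pair[0]} and {pair[1]} have r = {r:.2f}")
```

With enough unrelated series to choose from, some pair will typically look strongly related, even though the generating rule guarantees that neither drives the other. A stipulated system model does not have this problem, because the relationships are stated before the data are examined.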

This I fear is the difference between the bench worker, the physician, and the engineer. Is cancer genetics ready for the engineer? I believe that the door for such an entry is now opening. Not everything is in place but we may have the key pieces to begin the journey.

We know causes; we understand the consequences. If BRAF V600 is present, we even know how to inhibit it, for a while. We must then understand how to take the next step. Cell by cell, we must understand the genetic progression as a system, not as a set of correlations.