This is the age of massive genome surveys — at least for
a little while longer. Sixty years after Watson and Crick's discovery, and a
decade after the completion of the Human Genome Project, large-scale sequencing
efforts directed at human disease abound, especially for cancer and rare
congenital syndromes. International research teams supported by public funding
agencies such as the National Institutes of Health and by private foundations
such as the Wellcome Trust are rapidly enlarging the catalogue of genetic
changes associated with neoplasia and other ailments, using ever faster, ever
cheaper sequencing methods and heavy-duty bioinformatics.
Critics of big genomics projects have argued that such
work is resource-intensive, is not hypothesis-driven, and amounts to little
more than molecular philately. But as discoveries from these projects stack up,
and as terabytes of observational data yield new insights into disease biology
and prompt the development of pathway-driven targeted therapies, the usefulness
of such approaches is becoming undeniable. When the Cancer Genome Atlas (TCGA)
wraps up its 8-year effort next year, it will have provided detailed
information on 10,000 cancer genomes for less than the cost of a trio of F-22
Raptor stealth fighters.
Let us examine where models work and where they do not. The issue we are interested in is that, to develop models and substantiated inferences, we need a well understood reality of cause and effect. Gravity causes a force between two masses; it is a measured effect. Charge causes a force between two particles; it is a measured effect. In gene structures PTEN modulates PI3K, BRAF activates MEK, and the pathways are well known. We know ligands, we know receptors, we know gene activators.
We also understand how these function, the forces, charges, conformations. We know how to inhibit and activate. We know these by facts not by correlations. Thus when we know these facts we have a basis for, indeed a demand for, using these dynamic models as an integral part of our understanding. We cannot and must not resort to random correlations. At times everything can be correlated.
Let me first explain by examining what human intellect has developed using models like this, and then examine those areas where our knowledge is severely lacking. The driver for this is that genomics is more akin to physical systems, with which we can do a great deal, and NOT akin to Economics and its correlations, which frankly led us nearly to an economic collapse.
Inherent in each of the areas where knowledge of relationships, and not correlations, is integral to the descriptions is also a simple yet high level descriptive: Schroedinger's Equation, the Navier Stokes Equation, the Fokker Planck Equation, the Kushner Stratonovich Equation. We have argued elsewhere that for genomic systems we are almost already there, just a few more steps.
Where They Work:
Thermodynamics
Early 19th century thermodynamics did not
understand the true nature of heat, namely the movement of molecules and the
statistical behavior. From the gross concepts we arrived at such things as
internal energy, heat, enthalpy and other gross measures of a system’s
thermodynamic properties. With the development of statistical mechanics there
was the move from gross measurement to understanding the statistical
distribution of the molecules and this was presented via the Fokker Planck
equation, a means to examine the detailed statistical fine structure of complex
mixtures with thermodynamic issues.
Fluid Flow
Understanding fluid dynamics was initially a study with many
tables and data taken from past experiments. Slowly, as the theory evolved, the
Navier Stokes theory came forth, constructs such as flow fields evolved, and
then the theory of random changes in turbulence was also developed.
Stochastic Dynamic Systems
Complex systems, namely entities which are based on physical
realities, such as aircraft control systems, chemical process controls, and the
like, can be modeled by a complex multidimensional spatio temporal model.
Applying statistical methods developed by Wiener and Kolmogoroff, one saw the
development of the Kushner Stratonovich equations and then saw them extended to
distributed systems as well. This analysis allows one to analyze highly complex
multidimensional stochastic systems in time and space.
Wiener considered these in his studies of Cybernetics, and
furthermore it was Wiener who also started the development of understanding
complex organic systems.
The most important elements in using system models are our
ability to ascertain Observability and Controllability: the ability to
reconstruct the states of the model from measurements and the ability to send
the states in the model to a desired end point. We also need to have the ability
to identify the coefficients in the model. We often have good initial guesses,
but having measurements means that we can continually refine them.
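To make the two notions concrete, here is a minimal sketch of the standard rank tests for a linear model dx/dt = Ax + Bu with measurement y = Cx. The two-state cascade used as the example (an upstream kinase activity driving a downstream transcription factor) and all of its coefficients are hypothetical, chosen only for illustration; real pathways are nonlinear and stochastic, so this is a simplification and not the method of any particular study.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(m):
    """Matrix rank by Gaussian elimination on exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def controllability_matrix(A, B):
    """[B, AB, ..., A^(n-1) B], blocks placed side by side."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def observability_matrix(A, C):
    """[C; CA; ...; CA^(n-1)], blocks stacked on top of each other."""
    n = len(A)
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(mat_mul(blocks[-1], A))
    return [row for blk in blocks for row in blk]


def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)


def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)


# Hypothetical cascade: x1 = kinase activity, x2 = transcription factor level.
A = [[-1, 0],
     [1, -1]]          # x1 decays; x2 is driven by x1 and decays
B = [[1], [0]]         # the input (say, a drug) acts on x1 only
C_downstream = [[0, 1]]  # we measure x2 only
C_upstream = [[1, 0]]    # we measure x1 only

print(is_controllable(A, B))
print(is_observable(A, C_downstream))
print(is_observable(A, C_upstream))
```

In this toy cascade, measuring only the downstream factor still lets one reconstruct both states, while measuring only the upstream kinase does not, because the downstream state never feeds back into what is measured.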
Electronics
When the transistor was invented the manager of the people
who did it promptly published a book on solid state theory. Very few had a clue
as to what Shockley was saying and frankly for those who used the device no one
really cared. The electronics designer knew the linkages, and how to modify
them. A good electronics designer knew that if this voltage went up the other
went down, or whatever.
One knew the equations of the voltages and currents, one
understood the complexities of the time varying circuits. There was a
substantial amount of well proved physics behind all of this. However a good
engineer also understood the ebb and flow, just as a good neurologist
can, by examining the patient, understand where the lesion is and then find out
what the lesion is.
We are starting to get to that point with genomics. We know
if we activate a kinase receptor then we activate a certain set of pathways
which activate a certain set of transcription factors. A good Genomics Engineer
does not need to “know” the protein structures, just what they do, at a very
high level, yet detailed enough to catch the unique events which may occur.
Quantum Mechanics and Electrodynamics
Erwin Schroedinger came up with a simple manifestation of
electrons in a quantum world. Feynman came up with simple diagrams to show what
sub atomic particles are doing when they interact. While solving the Schroedinger
equation for a complex organic molecule is not readily achieved, it can be
attacked using sophisticated computers.
Where The Models do NOT Work:
Economics
Economics pretends to be mathematical. In reality, other
than the tautologies of balancing financial sheets, the demand and supply
theories are pure abject speculation. Econometrics is merely a fallacious use
of correlation theory. There is no fundamental cause and effect, no
demonstrable relationship between input and output. This should be a warning
for those working in Genomics. Just having a correlation does not establish
causality, and furthermore there is an underlying reality that is being ignored.
Social Sciences
The social sciences have tried to ascertain human responses.
Approaches such as those used in election prediction may function in the short
term but humans are all too often less than a herd and change opinions all too
frequently.
Psychology
Psychology is strewn with the dead bodies of mathematical
approaches to understanding human behavior. The problem is that we do not
understand the brain, yet, and thus models of thinking, such as those in
artificial intelligence, are at best primitive guesses.
Fundamental Requirements
To have a Genomic Model or something useful we must have the
following:
1. A Verifiable Realization of How the System Works. This
entails the understanding of pathways and their effects on cell proliferation
and movement.
2. Some Understanding of What Causes Changes in Pathways.
Here we have a difficulty. We not only have somatic changes, but we have
epigenetic changes such as micro RNAs and even methylation and the like. We
also must understand germ line predispositions. These are gene predispositions
inherited as well as SNP predispositions which can result in subsequent translation
of proteins.
3. Environmental Understandings: This would include the
extracellular matrix issues and its environment as well as the ability of the
invading malignant cells to activate surrounding benign cells to assist in
proliferation.
4. An Acceptable Measure of the Malignant Cell in Space and
Time. As with the Fokker Planck equation or the Kushner Stratonovich model this
may mean a measure such as the average number of a specific type of malignant
cell per volume at each spatio-temporal location.
5. A Realization for the Progression of Somatic Changes:
This means an understanding of the statistical nature of the somatic genetic
changes as cells progress in time. For example, what happens when we go from MDS
to AML? This is not AML de novo. We do not need the details but the transition
rates and the possible states. This issue is akin to the electronics world
where we need higher level understanding and not the details.
6. We Need the Ability to Integrate Parameter Identification
with Stochastic Models: Clearly if we know the models and if we know that
certain factors are the drivers of these models, we may use this to identify
the parameters at the same time we are estimating the states.
7. We Need a Methodology to Quantify Our Representations and
to Validate Them: This is akin to having a Kushner Stratonovich equation. It is
what we have developed by using average number of cells by type at specific
spatio-temporal sites. I believe we have the techniques; they are built on the
many other approaches.
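As an illustration of requirement 5, the idea of "transition rates and possible states" can be sketched as a continuous-time Markov chain. The three states and the two rates below are hypothetical placeholders, not measured values, and forward Euler is used only to keep the sketch short.

```python
def generator(rates, states):
    """Build the generator matrix Q: Q[i][j] is the rate i -> j, rows sum to zero."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        Q[idx[src]][idx[dst]] = rate
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def propagate(p0, Q, t_end, dt=0.001):
    """Forward-Euler integration of the master equation dp/dt = p Q."""
    p = list(p0)
    n = len(p)
    for _ in range(int(round(t_end / dt))):
        dp = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p


# Hypothetical states and per-year transition rates (illustrative only).
STATES = ["normal", "MDS", "AML"]
RATES = {("normal", "MDS"): 0.05, ("MDS", "AML"): 0.30}

Q = generator(RATES, STATES)
start_in_mds = [0.0, 1.0, 0.0]
for years in (1, 2, 5):
    p = propagate(start_in_mds, Q, years)
    print(years, [round(x, 3) for x in p])
```

Nothing here requires the molecular details of each state, only the list of states and the rates between them, which is the point of the electronics analogy.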
The Genomic Model
The Genomic Model is a systematizable model. It is a system
wherein we have well known causes and effects. We know that if a ligand attaches
to a receptor then one has an activated pathway that induces a transcription
factor which results in cellular proliferation. We know cause and effect. We
know the rates of these factors. We can also develop models wherein we can
estimate the average number of cells of a particular genetic structure at a
specific time and at a specific place in the body.
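A minimal sketch of what such a model might look like follows. It assumes the simplest possible chain of cause and effect: equilibrium receptor occupancy L / (L + Kd) sets a net proliferation rate, and the average cell density n(x, t) along one spatial dimension obeys dn/dt = D d2n/dx2 + g n. The function names, the one-dimensional geometry, and every number are assumptions made for illustration; a real model would be stochastic and would track several cell types.

```python
def occupancy(ligand, kd):
    """Equilibrium fraction of receptors bound by ligand: L / (L + Kd)."""
    return ligand / (ligand + kd)


def step(n, diff, growth, dx, dt):
    """One explicit finite-difference step of dn/dt = D d2n/dx2 + g n.

    The ends are no-flux: cells do not leave the modeled region.
    """
    m = len(n)
    out = []
    for i in range(m):
        left = n[i - 1] if i > 0 else n[i]
        right = n[i + 1] if i < m - 1 else n[i]
        lap = (left - 2 * n[i] + right) / dx ** 2
        out.append(n[i] + dt * (diff * lap + growth * n[i]))
    return out


def simulate(n0, diff, growth, dx, dt, steps):
    """Advance the density profile n0 by the given number of time steps."""
    n = list(n0)
    for _ in range(steps):
        n = step(n, diff, growth, dx, dt)
    return n


# Hypothetical parameters (illustrative only, arbitrary units).
R_MAX = 0.4    # maximal proliferation rate when all receptors are bound
DEATH = 0.1    # cell loss rate
net_growth = R_MAX * occupancy(ligand=1.0, kd=1.0) - DEATH

initial = [0.0] * 10 + [100.0] + [0.0] * 10   # cells start at one site
final = simulate(initial, diff=0.1, growth=net_growth, dx=1.0, dt=0.1, steps=100)
print(round(sum(final), 2), round(max(final), 2))
```

The explicit scheme is only stable when D * dt / dx^2 stays below 0.5, which the values above satisfy; a stiffer problem would call for an implicit solver.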
The NEJM article concludes:
In 1803, a few years before the inaugural issue of the
Journal, Thomas Jefferson commissioned Meriwether Lewis and William Clark to
survey the vast unknown American frontier. Lewis and Clark departed from St.
Louis, where Ley et al. initiated the AML genome survey. Less than a century
later, the western frontier was declared “closed,” but land surveyors did not
disappear; today, they focus on construction projects and property boundaries.
Likewise, although the initial epic AML genomic survey that began in St. Louis
is now largely complete and surveys of other neoplasms will soon conclude, the
use of genomics in quotidian practice is just beginning.
Now if I were to interpret this correctly, it sends just the
wrong message. The developments in Genomics are not Lewis and Clark like; they
require Newtonian and Maxwellian insight: fundamental laws and relationships,
causality, albeit stochastic, and determining the right measures.
Thus in a sense one could imagine the Genomics Engineer
being akin to the electronics engineer, or even to the neurologist. As many a
medical student would recall from anatomy, the tracing of the cranial nerves is
a critical skill, but one enhanced by seeing how they migrate from back to
front. Diagnosis of cranial nerve issues is resolved by understanding the network.
In a similar manner we would hope the same is doable with Genomics pathways,