…science concerns relations between measurable variables and it is these relations that constitute the subject matter of science, scientific knowledge ipso facto is mathematically constituted…
Let me give a couple of examples of how this applies.
First, let us look at the world of genomics which I have
been discussing herein for a while. The introduction of the microarray has
allowed an explosion of data that has then allowed scientists to putatively
argue some relationship between genes and cancers. Namely, they examine, say,
9,000 prostate cancer patients and, using microarrays primed for, say, 500
genes, conclude that some 50 of these genes are associated with prostate
cancer. They then allege that there is some actionable clinical relationship
between the presence of the gene and the cancer. There is no underlying system
model identifying this, just a microarray demonstrating that “oftentimes” these
genes are under- or over-expressed.
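To see why a screen of this kind, taken alone, proves little, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name count_chance_hits is hypothetical, the expression values are simulated noise rather than real microarray data, and the group size of 200 and the nominal 5% cutoff are arbitrary choices. Both groups are drawn from the same distribution, so no gene has any true relationship to the disease, yet roughly 5% of 500 genes (about 25) should still cross the threshold by chance.

```python
import random
import statistics


def count_chance_hits(n_genes=500, n_per_group=200, z_cutoff=1.96, seed=0):
    """Count genes that look differentially expressed when none truly is.

    All parameters are illustrative assumptions, not values from any study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Both groups come from the same distribution: there is no real effect.
        tumor = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        normal = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(tumor) - statistics.fmean(normal)
        # Standard error of the difference between the two group means.
        se = (statistics.variance(tumor) / n_per_group
              + statistics.variance(normal) / n_per_group) ** 0.5
        # Two-sided test at a nominal 5% level (normal approximation).
        if abs(diff / se) > z_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print("genes flagged by chance alone:", count_chance_hits())
```

The sketch only shows that a list of flagged genes can arise with no mechanism behind it. It does not say that any particular published gene list is spurious, and a real analysis would correct for multiple comparisons.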
Second, let us look at the BRAF V600 melanoma cases. Here,
unlike the above, we have a case where one knows the RAF pathway and that loss
of control of certain elements of that pathway leads to gene instabilities and
thus a malignant expression. Therefore one targets the mutated RAF gene, the
BRAF V600, and it results in a suppression of the malignancy, for a while. Then
squamous cell carcinomas appeared, but since the full pathway was known, one
could go down one step to MEK, and controlling it controlled the sequela. In this
case there was a model, a system, and by logically following the system one
found what the next step should be.
The above are two examples of how “science” is being done
today in the area of gene-related results. The second example is a
Dougherty-like science, namely it connects data to an underlying model which is
predictable, and by using that the cancer is controllable, at least until another
instability results. The first example, data collecting, is not really science as
we accept it today. It is more akin to 19th-century Botany, at best,
where one goes out and collects specimens of plants and then tries to sew
together a quilt of understanding to explain nature.
What Dougherty is focusing on is the Why question. When I
recall Medical School, one is taught What and How. What disease is it and How
do I treat it. In contrast, Engineering is first Why and then How. There is a
strong dissonance when an Engineer is studying Medicine, or at least there was
forty years ago. An Engineer all too often keeps asking Why: what is the underlying set of
basic scientific principles that explain the phenomenon, and how can I express them
in a manner in which they can be used on a predictable basis? Asking Why would drive
many a Medical Professor to apoplexy. Medicine was for a long while the
transfer down of “facts” and not validatable principles. The old adage at
graduation that fifty percent of what one had just learned in Medical School
was now invalid was a bit of a joke but sadly it was also true.
But as we move to Genomics we sadly see this trait arise
again. There is a tension between those who want to have basic repeatable
principles to build upon and those who believe that collecting data is the sine
qua non. Let me give an example of a recent experience. Prof. Lander at MIT is
teaching an EdX course on Biology. Now Lander is brilliant and his style of
teaching is in many ways classic MIT. Namely, he highlights the basic principles,
and then the student works through the Problem Sets developing the details for
themselves. So far so good. The first two-thirds to three-fourths of the course was
fantastic. Then I noticed a subtle change, one that, unless you were
prepared to recognize it, would have slipped through the cracks. He slowly started
giving a mixture of core predictable principles and cook book recipes. For
example, we know that we can denature DNA because the base pair bonds are
Hydrogen bonds, relatively weak, and the backbone phosphodiester bonds are strong
because they are covalent. Thus by heating the molecule we break the Hydrogen
bonds first and, before we break the covalent bonds, we can do our
complementary additions; thus PCR works well.
On the other hand, as he progressed to a discussion of Knock
Out genes there was a collection of “tricks” or cook book recipes that were
used. Why, for example, did one get the modified DNA into the denatured gene the
way he said? Well, it just happens. Well, nothing just happens. Fortunately bench
Biologists have developed many “tricks”, like alchemists, and as a result they
have become a bit too comfortable with this unexplained bevy of tools, which are
indispensable but in the long run self-defeating.
As Dougherty states when he examines data mining as an
example of the Biologist’s flair for data at all costs:
Data mining and Copernicus share a lack of experimental design; however, in contradistinction to data mining, Copernicus thought about unplanned data and changed the world, the key word being ‘thought.’ Copernicus was not an algorithm numerically crunching data until some stopping point, very often with no adequate theory of convergence or accuracy. Copernicus had a mind and ideas. William Barrett writes, ‘The absence of an intelligent idea in the grasp of a problem cannot be redeemed by the elaborateness of the machinery one subsequently employs’. Or as M. L. Bittner and I have asked, ‘Does anyone really believe that data mining could produce the general theory of relativity’? Data mining represents a regression from the achievements of three and a half centuries of epistemological progress to a radical empiricism, in regard to which Reichenbach writes, ‘A mere report of relations observed in the past cannot be called knowledge. If knowledge is to reveal objective relations of physical objects, it must include reliable predictions. A radical empiricism, therefore, denies the possibility of knowledge’. A collection of measurements together with statements about the measurements is not scientific knowledge, unless those statements are tied to verifiable predictions concerning the phenomena to which the measurements pertain.
What is Dougherty getting at? Simply, to reiterate the first
quote: Science demands a marriage between data and models. To be true science
it must be predictable, and predictable based upon an embodiment in an
abstraction.
Let me now apply this to genomics. Consider prostate cancer.
The question is complex but can be asked: what is the first set of steps that
leads to prostate cancer? Let us examine what we know:
First, we know many of the pathways. We know that the AKT pathway
is critical, we know that c-MYC is a critical control element, we know that
PTEN is often mutated, and we know that AR (Androgen Receptor) ultimately gets
mutated and we have metastatic growth. We have pathways, we have relationships; we
can demonstrate causality and results. Thus a modicum of a basis in reality
exists. If one were to use this pathway model and then search using microarrays
matched against the model, one arguably could iterate to improved models and
improved predictability. The data without the model is useless and the model
without the data is unverifiable.
Second, we can ask what sets the process off. Are all the
changes due to mutations or, more likely, due to epigenetic insults? Thus when we
look at MDS (myelodysplastic syndrome), for example, we are looking at a
hypermethylated set of blood stem cells. Something hypermethylated them, and we
know that because they are hypermethylated, gene expression is repressed and thus
proliferation of immature cells is a result. In prostate cancer, is the control
mechanism lost because of a mutation, methylation, or both, and in what order?
Having a model allows one to validate and then iterate along a consistent trajectory
of reality.
What does Dougherty have to say here?
While ignorance of basic scientific method is a serious problem, it is necessary to probe further than simply methodological ignorance to get at the full depth of the educational problem. Science does not stand alone, disjoint from the rest of culture. Science takes place within the general human intellectual condition. Biology cannot be divorced from physics, nor can either be divorced from mathematics and philosophy. One’s total intellectual repertoire affects the direction of inquiry: the richer one’s knowledge, the more questions that can be asked. Schrodinger comments, ‘A selection has been made on which the present structure of science is built. That selection must have been influenced by circumstances that are other than purely scientific’
The point I believe he is making is that in the new world of
Genomics, it is necessary to have a foundation that exceeds just the Laboratory
and its tricks. One must understand that, no matter what we think, every
time we look at a cell, at an organism, we are looking at a system, at some
stochastic dynamical process wherein things move forward, albeit randomly, but
in a way controlled by principles. We must look at a world wherein data is
used not as an end in itself but as part of an iterative process with our mathematical
world view. The tools needed to view this world are extensive yet
available. Engineers are trained to use them daily. Perhaps Genomics will grow
to appreciate their essential import.