I have commented on the MIT 6.002X course and now I will get
a chance to comment on a Harvard one, PH207X, the statistics course from the
Harvard School of Public Health.
Now, as before, some bona fides. I taught probability and
statistics in graduate-level courses at MIT from 1969 through 1975 and at GW University
from 1976 through 1980. That is, as I measure it, some 11 years. I wrote my
first book, on Stochastic Systems and State Estimation (Wiley), in 1969 and
published it in 1974. I also published some fifty-plus papers in this area.
Thus I may have a leg up. I also took a Board Review course at HMS back
in 1994, I believe, which used the instructor's book: a good book, early
undergrad level, somewhat cookbookish, but it does the task. Thus, as with 6.002X, I
come with some experience in the area.
I was again disappointed, and left midway through the first
problem set. Now why? Simple: the student had to download a cumbersome
statistics package and figure out how to use it from a video by some person who
was using Valley Speak and waving hands all about the place, and I never could
find the data set. This is another example of never getting to the material,
of being encumbered by some generally useless piece of interference.
It is akin to the first-year Latin instructor at some
expensive private school demanding that homework be written with a blue
ballpoint, with names on the top right of each page in green ballpoint and
dates on the top right in red ballpoint, and a staple ¾” from the top and ¾”
from the left, parallel with the top of the paper. How about the Latin?
Form and not function. I saw the same problems in 6.002X but not to this extreme.
Why must a student waste so much time on acquiring and
learning some generally useless software package? It would have been much
better to have the student learn the theory and then, using Excel for example,
work through the analysis in detail.
In my experience, when performing an analysis of
variance, for example, one learns a great deal by actually doing every step in the
analysis and not by using some software package that spits out the answer. Raw
data is what we deal with, and the student should and must become familiar with
that data, must work with it, must live with it.
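As a minimal sketch of what "doing every step" can look like, here is a
one-way analysis of variance worked by hand in a few lines of Python; a
spreadsheet would serve just as well. The three groups and their values are
made up purely for illustration and are not taken from the course, and the
function name one_way_anova is my own.

```python
# One-way ANOVA computed step by step, with no statistics package.
# The data below are hypothetical, chosen only to keep the arithmetic easy.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table as a dict."""
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name:>11}: {value}")
```

Every intermediate quantity (the sums of squares, the degrees of freedom, the
mean squares) is visible here, which is the point: the student sees where the
F ratio comes from rather than reading it off a report.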
For example, dealing with outliers is a key issue. I wrote
an oft-quoted paper on this back in 1975, I believe. What is an outlier, when
do we disregard it, and when is the data in an outlier the most important data
element? You learn that only by dealing with all the data.
The instructor does teach a good course and his insights are
quite useful. He provides the novice with a window to statistics as used in the
medical field. However, he does not take the user to the extreme, nor is that
expected in such an introductory course. Thus getting bogged down in the first
step by some third-party piece of software is really a waste of time.
It again raises the question of what the whole purpose of
this venture is. Will someone learn? Perhaps. But would I ever teach this way?
Never. I want the student to understand the principles and to work with the
data, to make the mistakes and to recover from them. I want the student to
even go as far as saying that the wrong question was asked: the data may be
correct and properly analyzed, but you asked the wrong question.
For example, take the NEJM studies on prostate cancer and PSA. They asked
whether performing PSA measurements in some manner, and using a threshold of
4.0 as a marker for, say, a biopsy, resulted in the saving of lives. They
concluded that it did not. But the right question should have been: what
procedure and what data would result in a material change in mortality and
morbidity? Perhaps the answer was 2.0 and not 4.0, and perhaps the answer also
required velocity and percent free PSA as well as normalization on prostate
volume. Namely, what was the right question, and how do we develop tests to
get answers to the right question?
I come back again to the question: what is the purpose of this course? Is it to
learn how to use a software package, or to learn how to employ statistical
analyses in the field of medical trials? I suspect it should have been the
latter.