Saturday, October 20, 2012

edX Redux


I have already commented on MIT's 6.002X course, and now I have a chance to comment on a Harvard one: PH207X, the statistics course from the Harvard School of Public Health.

First, as before, some bona fides. I taught graduate-level courses in probability and statistics at MIT from 1969 through 1975 and at GW University from 1976 through 1980, some 11 years as I count it. I began writing my first book, Stochastic Systems and State Estimation (Wiley), in 1969, and it was published in 1974. I have also published fifty-plus papers in this area, so I may have a leg up. In addition, I took a Board Review course at HMS, back in 1994 I believe, which used the instructor's book: a good book, early undergraduate level, somewhat cookbook-style, but it does the job. Thus, as with 6.002X, I come with some experience in the area.

I was again disappointed, and I left midway through the first problem set. Why? Simple: the student had to download a cumbersome statistics package, figure out how to use it from a video by someone speaking Valley Speak and waving their hands all over the place, and I never could find the data set. This is another example of never getting to the material because one is encumbered by a generally useless piece of interference.

It is akin to a first-year Latin instructor at some expensive private school demanding that homework be written in blue ballpoint, with the name at the top right of each page in green ballpoint, the date at the top right in red ballpoint, and a staple ¾” from the top and ¾” from the left, parallel with the top edge of the paper. What about the Latin? Form over function. I saw the same problem in 6.002X, but not to this extreme.

Why must a student waste so much time acquiring and learning a generally useless software package? It would have been much better to have the student learn the theory and then work through the analysis in detail using, for example, Excel.

In my experience, when performing an analysis of variance, for example, one learns a great deal by actually doing every step of the analysis rather than by using a software package that spits out the answer. Raw data is what we deal with, and the student must become familiar with that data, must work with it, must live with it.
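To make "doing every step" concrete, here is a minimal sketch of a one-way analysis of variance worked by hand. It uses plain Python with no statistics library, standing in for the Excel worksheet suggested above; the three groups of numbers are invented purely for illustration, and the function name `one_way_anova` is hypothetical. Looking up the p-value for the resulting F in an F table is left as the final manual step.

```python
# One-way ANOVA done step by step, with no statistics package.
# Hypothetical helper; the sample data below are invented, not from any study.

def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Step 1: between-group sum of squares (each group mean vs. the grand mean,
    # weighted by group size)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Step 2: within-group sum of squares (each value vs. its own group mean)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    # Step 3: degrees of freedom and mean squares
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # Step 4: the F statistic is the ratio of the two mean squares
    f_stat = ms_between / ms_within
    return ss_between, ss_within, df_between, df_within, f_stat


ssb, ssw, dfb, dfw, f = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(ssb, ssw, dfb, dfw, f)  # 24.0 6.0 2 6 12.0
```

With group means of 5, 7 and 9 around a grand mean of 7, each intermediate quantity can be checked by hand, which is exactly the familiarity with the data that a packaged routine hides.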

Dealing with outliers, for example, is a key issue; I wrote an oft-quoted paper on this back in 1975, I believe. What is an outlier? When do we disregard it, and when is the outlier the most important data element of all? You learn that only by dealing with all the data.

The instructor does teach a good course, and his insights are quite useful. He gives the novice a window onto statistics as used in the medical field. He does not take the student to the extreme, nor is that expected in such an introductory course. That is precisely why getting bogged down at the first step by a third-party piece of software is such a waste of time.

It again raises the question of what the whole purpose of this venture is. Will someone learn? Perhaps. But would I ever teach this way? Never. I want the student to understand the principles and to work with the data, to make mistakes and to recover from them. I want the student even to go so far as to say that the data may be correct and properly analyzed, but the wrong question was asked.

Take, for example, the NEJM studies on prostate cancer and PSA. They asked whether performing PSA measurements in some manner, and using a threshold of 4.0 as the trigger for, say, a biopsy, resulted in lives saved. They concluded that it did not.

But the right question should have been: what procedure, and what data, would result in a material change in mortality and morbidity? Perhaps the answer was a threshold of 2.0 rather than 4.0, and perhaps it also required PSA velocity and percent free PSA, as well as normalization for prostate volume. In short, what is the right question, and how do we develop tests that answer it?

I come back again to the question: what is the purpose of this course? Is it to learn how to use a software package, or to learn how to employ statistical analysis in medical trials? I suspect it should have been the latter.