Tuesday, May 19, 2015

PSA, Tax Dollars, and the Government

From time to time one reads about errors introduced in Government programs. The recent report that NCI has removed all of the historical data because of data entry problems, about 45 years' worth of data, is truly amazing. Here is what NCI states:

The results for this registry-based evaluation in two SEER registries confirmed that the PSA values were often incorrectly reported based on an implied decimal in that data field. Following current reporting guidelines for cancer registry data, PSA is coded in a 3-digit field with an implied decimal between the second and third digits. For example, a PSA of 4.0 ng/ml should be coded as 040. In both the study described above and in the SEER registry’s evaluation of their data, it was noted that some registrars were confused with proper use of the implied decimal. For example, this resulted in coding a PSA of 4.0 ng/ml incorrectly as 004. The error rate for the SEER data was lower than that seen in the original study and was approximately 17%. The likely reason that the error rate was lower was a reflection of the ongoing quality activities that routinely occur at each of the SEER registries as data are submitted.
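To make the coding rule in the NCI statement concrete, here is a minimal sketch in Python of a 3-digit field with an implied decimal between the second and third digits. The function names encode_psa and decode_psa are my own, not anything from SEER, and the sketch ignores any special or reserved codes the real field may define. It shows how the miscoding NCI describes turns 4.0 ng/ml into 0.4 ng/ml.

```python
def encode_psa(value_ng_ml: float) -> str:
    """Encode a PSA value as 3 digits with an implied decimal
    between the second and third digits (4.0 ng/ml -> "040").
    Hypothetical helper for illustration only."""
    tenths = round(value_ng_ml * 10)  # store the value in tenths of ng/ml
    if not 0 <= tenths <= 999:
        raise ValueError("value does not fit in a 3-digit field")
    return f"{tenths:03d}"


def decode_psa(code: str) -> float:
    """Decode a 3-digit implied-decimal code back to ng/ml."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("code must be exactly three digits")
    return int(code) / 10


correct = encode_psa(4.0)      # "040"
miscoded = "004"               # the error described by NCI
print(correct, decode_psa(correct))    # 040 4.0
print(miscoded, decode_psa(miscoded))  # 004 0.4
```

Both codes are valid three-digit entries, so a format check alone cannot tell the right one from the wrong one. The miscoded value is simply off by a factor of ten.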

The core rule in entering data is to avoid ambiguity, yet expect it and check for it. Many studies have used this data, and regulations have then been based upon them. Perhaps the PSA rules mandated by the USPSTF should not only be reconsidered but totally abandoned!

They conclude by stating:

We are currently developing a protocol that will be applied by all SEER registries to further assess the error rate and allow the registries to correct PSA values in recent years. As part of that protocol we will determine whether we can use statistical methodology to correct PSA from prior years. Once we have corrected the data, we will repost the PSA corrected values and make those available to researchers. 

Anyone who has ever written a set of questions for a database knows the possibility of misinterpretation. I see it all the time. Recently I examined a Columbia Medical Center set of questions that were not only ambiguous but flawed. They made no sense. But did that stop anyone? No!

Clearly the 040 versus 4.0 confusion could have been avoided by being clear: enter the value as x.y in two fields, with an error check and feedback in red restating the value. Frankly, the required entry format is truly confusing, and it probably cost us millions to design!
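As a rough sketch of the two-field idea, assuming a whole-number field of one or two digits and a tenths field of exactly one digit, something like the following Python would do. The function name record_psa and the wording of the message are hypothetical. Showing the message in red is left to whatever entry screen is used.

```python
from typing import Tuple


def record_psa(whole: str, tenths: str) -> Tuple[str, str]:
    """Validate the two entry fields and return (stored_code, feedback).
    Hypothetical helper for illustration only."""
    # Error check: whole part is one or two digits, nothing else.
    if not (whole.isascii() and whole.isdigit() and 1 <= len(whole) <= 2):
        raise ValueError("whole part must be one or two digits")
    # Error check: tenths part is exactly one digit.
    if not (tenths.isascii() and tenths.isdigit() and len(tenths) == 1):
        raise ValueError("tenths part must be exactly one digit")
    # Build the 3-digit stored code so the registrar never has to.
    code = f"{int(whole):02d}{tenths}"
    # Feedback restating the value with an explicit decimal point and unit.
    feedback = f"You entered PSA = {int(whole)}.{tenths} ng/ml (stored as {code})"
    return code, feedback


code, feedback = record_psa("4", "0")
print(feedback)  # You entered PSA = 4.0 ng/ml (stored as 040)
```

The registrar types 4 and 0 in separate boxes and reads back "4.0 ng/ml" before the record is saved, so the implied decimal never has to be held in anyone's head.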