From time to time one reads about the errors that creep into Government programs. The recent report that NCI has removed all of the historical data because of data entry problems, about 45 years' worth of data, is truly amazing. Here is what NCI states:
The results for this registry-based evaluation in two SEER registries
confirmed that the PSA values were often incorrectly reported based on
an implied decimal in that data field. Following current reporting
guidelines for cancer registry data, PSA is coded in a 3-digit field
with an implied decimal between the second and third digits. For
example, a PSA of 4.0 ng/ml should be coded as 040. In both the study
described above and in the SEER registry’s evaluation of their data, it
was noted that some registrars were confused with proper use of the
implied decimal. For example, this resulted in coding a PSA of 4.0
ng/ml incorrectly as 004. The error rate for the SEER data was lower
than that seen in the original study and was approximately 17%. The
likely reason that the error rate was lower was a reflection of the
ongoing quality activities that routinely occur at each of the SEER
registries as data are submitted.
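To see how small the slip is and how large the consequence, here is a minimal sketch of the implied-decimal rule as NCI describes it above. The function names are mine, not anything from SEER, and I assume a plain three-digit field holding tenths of a ng/ml; any special codes the real registry manual may reserve are not modeled.

```python
def encode_psa(value_ng_ml: float) -> str:
    """Code a PSA value into a 3-digit field with the decimal implied
    between the second and third digits (hypothetical helper)."""
    tenths = round(value_ng_ml * 10)  # 4.0 ng/ml -> 40 tenths
    if not 0 <= tenths <= 999:
        raise ValueError("value does not fit a 3-digit implied-decimal field")
    return f"{tenths:03d}"  # zero-pad on the left: 40 -> "040"


def decode_psa(code: str) -> float:
    """Read a 3-digit implied-decimal code back as ng/ml (hypothetical helper)."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly three digits")
    return int(code) / 10


print(encode_psa(4.0))    # 040  (the correct coding)
print(decode_psa("040"))  # 4.0
print(decode_psa("004"))  # 0.4  (the miscoded entry reads ten times too low)
```

Both "040" and "004" are perfectly valid codes, so nothing in the field itself can flag the mistake. The registrar who types 004 has silently turned 4.0 ng/ml into 0.4 ng/ml.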
The core rule in entering data is to avoid ambiguity, yet expect it and check for it. Many studies have used this data, and regulations have then been based on those studies. Perhaps the PSA rules mandated by the USPSTF should be not only reconsidered but totally abandoned!
They conclude by stating:
We are currently developing a protocol that will be applied by all SEER
registries to further assess the error rate and allow the registries
to correct PSA values in recent years. As part of that protocol we will
determine whether we can use statistical methodology to correct PSA
from prior years. Once we have corrected the data, we will repost the
PSA corrected values and make those available to researchers.
Anyone who has ever written a set of questions for a database knows the possibility of misinterpretation. I see it all the time. Recently I examined a Columbia Medical Center set of questions that were not only ambiguous but flawed. They made no sense. But did that stop anyone? No!
Clearly the confusion between 040 and 4.0 could have been avoided by being clear: enter the value as x.y, in two fields, with an error check and feedback in red restating the value. Frankly, the required entry format is truly confusing, and it probably cost us millions to design!
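What I have in mind can be sketched in a few lines. This is only an illustration under my own assumptions: the function name, the field limits (one or two digits before the decimal point, exactly one after) and the wording of the message are all hypothetical, and the red on-screen display is left to whatever form software is in use.

```python
from typing import Tuple


def read_psa(whole: str, tenths: str) -> Tuple[str, str]:
    """Validate two separate entry fields (the x and the y of x.y) and
    return the stored 3-digit code plus a message restating the value."""
    if not (whole.isascii() and whole.isdigit() and 1 <= len(whole) <= 2):
        raise ValueError("whole-number field must be one or two digits")
    if not (tenths.isascii() and tenths.isdigit() and len(tenths) == 1):
        raise ValueError("tenths field must be exactly one digit")
    code = f"{int(whole):02d}{tenths}"  # the implied decimal is built by the program
    echo = f"You entered a PSA of {int(whole)}.{tenths} ng/ml"  # shown back to the registrar
    return code, echo


code, echo = read_psa("4", "0")
print(code)  # 040
print(echo)  # You entered a PSA of 4.0 ng/ml
```

The registrar never has to think about where the implied decimal sits, and the restated value gives one last chance to catch a slip before it is stored.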