Sunday, April 5, 2020

Data and Data

As we have been arguing here, the data we get to track is very noisy, even biased. The reporting is at best problematic because the underlying testing is not being done, and in addition the collection and dissemination of the data are grossly inconsistent and prone to errors. As the NY Times has reported:

Across the United States, even as coronavirus deaths are being recorded in terrifying numbers — many hundreds each day — the true death toll is likely much higher. More than 9,100 people with the coronavirus have been reported to have died in this country as of this weekend, but hospital officials, doctors, public health experts and medical examiners say that official counts have failed to capture the true number of Americans dying in this pandemic, as a result of inconsistent protocols, limited resources and a patchwork of decision-making from one state or county to the next. In many rural areas, coroners say they don’t have the tests they need to detect the disease. Doctors now believe that some deaths in February and early March, before the coronavirus reached epidemic levels in the United States, were likely misidentified as influenza or only described as pneumonia.

We do have a multitude of statistical tests to ascertain the validity of the data. The problem, however, is the basic reporting. We rely upon state entities; in New Jersey the county-level reporting seems reasonable. In New York, however, it is in my opinion poorly administered, so one cannot do there what we have been doing here.

Noisy data may be in error in a variety of ways. The means may be off, which is a bias. There may be large standard deviations in the data, temporal variances, and the like.
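To make the distinction between bias and noise concrete, here is a minimal Python sketch. It is not the analysis used on this blog; the numbers are synthetic, and the function name summarize_errors, the 20% under-reporting factor, and the noise level are all assumptions chosen purely for illustration.

```python
import random
import statistics

def summarize_errors(true_counts, reported_counts):
    """Return (bias, spread) of the reporting errors.

    bias   = mean of (reported - true); nonzero means the data is off on average
    spread = sample standard deviation of those errors, i.e. the noise
    """
    errors = [r - t for r, t in zip(reported_counts, true_counts)]
    return statistics.mean(errors), statistics.stdev(errors)

# Synthetic illustration only: a hypothetical "true" daily count that grows
# linearly, reported with 20% under-counting plus random day-to-day noise.
random.seed(0)
true_counts = [100 + 10 * day for day in range(30)]
reported = [0.8 * t + random.gauss(0, 15) for t in true_counts]

bias, spread = summarize_errors(true_counts, reported)
print(f"bias = {bias:.1f}, spread = {spread:.1f}")
```

A negative bias here reflects the systematic under-counting, while the spread reflects the day-to-day scatter. In practice the true counts are exactly what we do not have, which is why the quality of the basic reporting matters so much.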

The major problem is that this data is being used, one hopes though it seems unlikely, to adapt the models which are being used to demolish the economy. Overreach is always a concern.