When the algorithm was tested on a different batch of Mount
Sinai x-rays it performed admirably, accurately detecting pneumonia 93%
of the time. But .... also tested it on tens of thousands of patient
images from two other sites: the National Institutes of Health Clinical
Center in Bethesda and the Indiana Network for Patient Care. With
x-rays from those locations—where pneumonia rates just squeaked past
1%—the success rate fell, ranging from 73% to 80%, the team reported
last year in PLOS Medicine. “It didn't work as well because the patients at the other hospitals were different,” ... says. At
Mount Sinai, many of the infected patients were too sick to get out of
bed, and so doctors used a portable chest x-ray machine. Portable x-ray
images look very different from those created when a patient is standing
up. Because of what it learned from Mount Sinai's x-rays, the algorithm
began to associate a portable x-ray with illness. It also anticipated a
high rate of pneumonia, boosting misdiagnoses.
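To make that failure mode concrete, here is a minimal synthetic sketch, not the study's actual model or data: a scikit-learn logistic regression stands in for the neural network, a one-bit "portable x-ray" flag stands in for the visual cues of a portable image, and the prevalence figures and probabilities are assumptions chosen only to mimic the pattern described above (high pneumonia rate and portable imaging of sick patients at the training hospital, roughly 1% prevalence and uninformative portable use at the external site).

```python
# Illustrative sketch only: synthetic data standing in for the x-ray study.
# The "portable" flag, prevalence rates, and feature model are assumptions,
# not values from the study reported in PLOS Medicine.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_site(n, prevalence, portable_given_sick, portable_given_healthy):
    """Generate one hospital's synthetic cases.

    y        : 1 = pneumonia, 0 = healthy
    signal   : a weak 'genuine' radiological feature
    portable : 1 = portable x-ray machine used (a potential confounder)
    """
    y = (rng.random(n) < prevalence).astype(int)
    signal = y * 1.0 + rng.normal(0, 2.0, n)   # weak true signal
    portable = np.where(
        y == 1,
        rng.random(n) < portable_given_sick,
        rng.random(n) < portable_given_healthy,
    ).astype(int)
    X = np.column_stack([signal, portable])
    return X, y

# "Training hospital": high pneumonia rate, sick patients imaged portably.
X_train, y_train = make_site(20_000, prevalence=0.30,
                             portable_given_sick=0.80,
                             portable_given_healthy=0.10)

# "External site": ~1% prevalence, portable use unrelated to illness.
X_ext, y_ext = make_site(20_000, prevalence=0.01,
                         portable_given_sick=0.20,
                         portable_given_healthy=0.20)

clf = LogisticRegression().fit(X_train, y_train)

print("coef on [signal, portable]:", clf.coef_)
print("AUC at training hospital  :",
      roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1]))
print("AUC at external site      :",
      roc_auc_score(y_ext, clf.predict_proba(X_ext)[:, 1]))
```

With these assumed numbers, the weight on the portable flag should dominate, and the external-site score drops sharply even though the genuine feature behaves identically at both sites.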
This is not that unexpected. The AI technique, most likely a neural network, uses many data points but no domain knowledge. It is akin to my argument that if Newton had used AI to determine gravity, it may very well have included some metric such as the color of the King's undergarments! The lesson to be learned is that any AI neural network must be trained on the right parameters, not simply on everything available, and AI has not yet reached the stage where it can independently determine those parameters.
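Continuing the synthetic sketch above (same assumed data), dropping the confounding "portable" column before training illustrates that point about choosing the right parameters: with only the genuine feature, the model's performance no longer collapses at the external site, it is simply modest at both, because it is not leaning on a feature whose meaning changes between hospitals.

```python
# Continuation of the sketch above: retrain using only the genuine signal
# column (index 0), discarding the assumed 'portable' confounder.
clf_signal_only = LogisticRegression().fit(X_train[:, :1], y_train)

print("AUC at training hospital  :",
      roc_auc_score(y_train, clf_signal_only.predict_proba(X_train[:, :1])[:, 1]))
print("AUC at external site      :",
      roc_auc_score(y_ext, clf_signal_only.predict_proba(X_ext[:, :1])[:, 1]))
```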