In an early warning about the role of artificial intelligence in interpreting critical health data, a team of US researchers said that AI tools in medicine need to be thoroughly tested across a wide range of patient populations, because models can fall short when they are not.
The results should give pause to those considering rapid deployment of AI platforms without rigorously evaluating their performance in real-world clinical settings that reflect where they will actually be used, observed the team from the Icahn School of Medicine at Mount Sinai.
According to a study published in a special issue of PLOS Medicine on machine learning and health care, AI tools trained to detect pneumonia on chest X-rays showed a marked drop in performance when tested on data from outside their original health system.
These results suggest that deep learning models may not work as accurately as expected.
"In-depth learning models trained to perform medical diagnoses may well generalize, but this can not be taken for granted because patient populations and imaging techniques differ significantly from one to another." institution to the other, "said Eric Oermann, senior author, professor of neurosurgery at Icahn's Mount Sinai Medical School.
To reach this conclusion, the researchers evaluated how the AI models identified pneumonia in 158,000 chest X-rays in three medical institutions: the National Institutes of Health, Mount Sinai Hospital and the US Department of Health. Indiana University Hospital.
In three out of five comparisons, the performance of convolutional neural networks (CNNs) in diagnosing disease on X-rays from hospitals outside their own network was significantly lower than on X-rays from the original health system.
Moreover, the CNNs were able to identify, with a high degree of accuracy, the hospital system in which a radiograph had been acquired, and appeared to exploit that information during their predictive task, relying on the prevalence of pneumonia at the training facility rather than on the images alone.
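The cross-site evaluation described above (train a CNN on one hospital's X-rays, then measure how well it holds up on images from other institutions) can be sketched roughly as follows. This is a minimal illustration rather than the study's actual code; the DenseNet-121 backbone, the AUC metric, and the data-loader names are assumptions made for the example.

```python
# Illustrative sketch (not the authors' code): evaluate a pneumonia classifier
# trained on one hospital's X-rays against held-out sets from other hospitals.
# The DenseNet-121 backbone and AUC metric are assumptions for illustration.
import torch
import torchvision
from sklearn.metrics import roc_auc_score

def build_model():
    # Binary pneumonia classifier on top of an ImageNet-pretrained backbone.
    model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
    model.classifier = torch.nn.Linear(model.classifier.in_features, 1)
    return model

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    # Returns the AUC of predicted pneumonia probability vs. ground-truth labels.
    model.eval().to(device)
    scores, labels = [], []
    for images, targets in loader:
        logits = model(images.to(device)).squeeze(1)
        scores.extend(torch.sigmoid(logits).cpu().tolist())
        labels.extend(targets.tolist())
    return roc_auc_score(labels, scores)

# Cross-site comparison: internal test set vs. external hospitals' test sets.
# `internal_loader` and `external_loaders` are hypothetical DataLoaders of
# (image_tensor, label) pairs prepared separately for each site.
# model = build_model()  # load weights trained on the internal site
# print("internal AUC:", evaluate(model, internal_loader))
# for site, loader in external_loaders.items():
#     print(f"{site} AUC:", evaluate(model, loader))
```

In this kind of comparison, a gap between the internal and external AUC values is what signals that the model has latched onto site-specific cues rather than generalizable signs of disease.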
"If AI systems are to be used for medical diagnosis, they must be designed to address clinical issues, be tested for a large number of real-world scenarios, and carefully evaluated to determine their impact on a diagnosis." accurate, "explained John Zech, first author of the study. .