Artificial intelligence (AI) can detect oral cancer and other diseases with accuracy similar to that of healthcare professionals, according to a new review published on September 24 in Lancet Digital Health. But the authors caution that dentists and physicians aren't in danger of being replaced by AI anytime soon.
Researchers analyzed the results of almost 70 studies using AI and found that the diagnostic performance of deep-learning models was slightly better than that of healthcare practitioners.
"Within those handful of high-quality studies, we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals. But it's important to note that AI did not substantially outperform human diagnosis," stated co-author Alastair Denniston, PhD, from University Hospitals Birmingham National Health Service (NHS) Foundation Trust in the U.K., in a Lancet press release.
While artificial intelligence and deep learning have received considerable attention, the researchers wanted to find out whether deep learning is as accurate as healthcare professionals in classifying diseases using medical imaging.
They conducted an online search up to June 2019 for studies comparing the diagnostic performance of deep-learning models with that of healthcare professionals. The search returned almost 31,600 studies, which were narrowed to 69 that provided enough data. The researchers then sampled 25 of the studies and found that 14 compared deep-learning models and healthcare professionals in the same sample.
They reported that deep-learning models had a slightly higher sensitivity and specificity in the studies than healthcare professionals (see table below).
Deep learning vs. healthcare professionals for sensitivity and specificity
| | Healthcare professionals | Deep-learning models |
| Pooled sensitivity | 86.4% | 87.0% |
| Pooled specificity | 90.5% | 92.5% |
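For readers less familiar with the two measures: sensitivity is the share of cases with disease that a test correctly flags, and specificity is the share of disease-free cases that it correctly clears. The short Python sketch below shows the arithmetic for a single study. The counts and function names are hypothetical, chosen only for illustration, and are not taken from the review.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of diseased cases correctly flagged: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of disease-free cases correctly cleared: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for one imaginary study (not data from the review):
# 100 patients with disease, 200 without.
tp, fn = 90, 10   # diseased cases: correctly flagged vs. missed
tn, fp = 180, 20  # disease-free cases: correctly cleared vs. false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

The pooled figures in the table combine per-study values like these statistically across many studies, so they cannot be reproduced from one set of counts.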
The review authors included one study on oral cancer and another on oral and maxillofacial cancer in their research. The only detail reported about these two studies was the imaging method: the oral cancer study used contrast-enhanced CT, while the oral and maxillofacial cancer study used x-rays.
The authors listed several study limitations, including that deep learning was frequently assessed in a way that does not reflect clinical practice and that few prospective studies were done in real clinical environments. They also noted that many of the studies did not report missing data, limiting the conclusions that can be drawn.
Ultimately, the review authors cautioned that the true diagnostic power of artificial intelligence remains uncertain because of the lack of studies that directly compare the performance of humans and machines or that validate AI's performance in actual clinical environments.
"Diagnosis of disease using deep-learning algorithms holds enormous potential. From this exploratory meta-analysis, we cautiously state that the accuracy of deep-learning algorithms is equivalent to healthcare professionals, while acknowledging that more studies considering the integration of such algorithms in real-world settings are needed," they concluded.