Eric, that is a great article. You’re making the uncomfortable point that needs to be made: when the best RCT evidence shows +29% cancer detection, no recall/false-positive penalty (1.4% in both arms), and −44% reading workload (MASAI), “AI as optional add-on” starts to look like institutional inertia, not clinical judgment.
What I keep coming back to is your line of logic in Section 2: once AI flags risk/density/suspicion, the pathway often tightens surveillance (US/MRI/shorter intervals). That’s where the next bottleneck shows up: follow-up ultrasound quality is still wildly variable, especially in dense tissue and outside major centers.
So yes, make AI-mammo the default. But the “new standard” should be end-to-end: AI-assisted detection + quality-locked follow-up imaging, otherwise we’re upgrading the front door and leaving the hallway dark.
Curious where you land on this: should “AI-mammo standard” be paired with a quality standard for ultrasound in the follow-up pathway?
We just published with Mayo Clinic in Mayo Clinic Proceedings: Innovations, Quality & Outcomes (retrospective, 62,912 breast US scans / 688 patients), showing that adding image-enhanced + quality-improved ultrasound representations materially improves downstream AI classification performance:
https://www.sciencedirect.com/science/article/pii/S254245482500102X?utm_source=chatgpt.com
This is a good report, thanks for putting this out. The data ARE very impressive.
Hi Eric. I must say, 20 years ago we at Philips Research used a combination of supervised machine learning and computer vision as an adjunct for determining whether a solitary pulmonary nodule identified on chest CT, using Dr. Sol Katz's film radiograph criteria, was more likely to be benign or malignant. We were looking for a "better way" to leverage the soon-to-be-released Lung Cancer Screening Trial data. Our application was reviewed favorably by the FDA but was never commercialized, for political and business reasons. Two items of interest in your post: first, that the combination of machine learning (the core of AI) and computer vision can be of immense value for image-associated biomedical professions (radiology, pathology, ophthalmology, dermatology); second, that the "obviousness" of such uses takes a very, very long time to validate and make part of daily medical practice. Eric
While AI in medical imaging holds tremendous promise, we should proceed with scientific rigor and historical humility. The enthusiasm is understandable, but we have been here before.
The MASAI trial is an important contribution, yet its screening context differs substantially from U.S. practice as noted by Dr Topol: In Sweden they have double reading, very low recall rates, and primarily 2D mammography. In contrast, U.S. screening typically involves single-reader interpretation, higher false-positive rates, and widespread use of 3D tomosynthesis. Generalizing across these fundamentally different systems requires caution.
We also must not forget the experience with computer-aided detection (CAD) in mammography. After FDA clearance in the late 1990s and rapid reimbursement approval, CAD support tools were widely adopted in the U.S. Early reader studies suggested improved accuracy with CAD, yet after dissemination into routine clinical practice, large-scale data showed no improvement. As shown in our 2007 NEJM paper, CAD use was associated with worse performance (J Fenton et al NEJM 2007). The human–computer interaction in real world clinical systems matters.
Most AI research focuses on algorithm development, but far less attention has been paid to how outputs are integrated into workflow, how clinicians respond to different formats and styles of algorithmic cues and how timing of being shown the AI report shapes decisions in real world settings. AI support tool reports are usually shown to radiologists before they view each exam, thus potentially altering the physicians’ viewing behavior and interpretation. Automation bias, the tendency to defer to algorithmic output, can alter physician judgment in unpredictable ways, particularly when computer prompts precede independent interpretation.
AI may transform screening, but if we fail to apply rigorous evaluation before widespread adoption, we risk repeating mistakes from the CAD era.
Joann G. Elmore, MD, MPH – CoPI of the PRISM Trial
https://www.nejm.org/doi/full/10.1056/NEJMoa066099
https://jamanetwork.com/journals/jama-health-forum/fullarticle/2789519
https://www.newswise.com/pdf_docs/176168920999683_ElmoreAI-CAD-Diagnosing.pdf
Excellent article. Arguably, using AI to help write your articles (ask for academic essay style, not article style) would free up more of your time to research more interesting topics. I certainly use it for most of my ADHD clinic reports, emails, etc., and it means I can incorporate more physiological ADHD phenotype material without burning out.
Wow. It took me a while to find a period when I could read and think. We knew decades ago that primitive AI was more accurate than a person. I chaired a credentials committee when a middle-aged radiologist asked for mammography privileges. I asked for her record; she had been sued. I said she needed supervision for a number of cases. I was not liked. Recent discussions with an attorney suggest that most current lawsuits stem from failure to diagnose. For sure, adding the current AI would lower risk, or at least recognize it as small, as well as saving lives at lower cost.
My mother died from breast cancer in her fifties; mammography was not done then. She, a physician, palpated a tiny lump. She had surgery with axillary dissection. She lived a decade in spite of the chemo and radiation of that era. So, yes, I feel strongly that better early diagnosis is essential.
Typo fyi in next-to-last paragraph: "$160,00" — missing a zero.
thanks...will fix
Excellent work!
Thanks Eric!