Toplines
  • Employees of different size health systems are working to make AI useful in solving problems and improving care, while also mitigating the risk of harm to patients and institutional reputations

  • By sparking new levels of vigilance and collaboration across disciplines, the process of adopting or adapting to AI in health care may be part of its value

One of the central challenges of introducing artificial intelligence (AI) to any industry is that its promise and peril are so entwined. The tools that automate routine tasks, such as scheduling appointments or writing code, can reduce drudgery and boost productivity but also eliminate jobs. Algorithms that interpret images and enhance diagnostic accuracy can erode clinicians’ skills. Meanwhile, the data centers powering breakthroughs in disease treatment deplete fresh water and drive up electricity use, causing environmental harm, particularly to the communities nearest them.

This dichotomy is mirrored in divisions between people who believe AI can make any industry faster, cheaper, and more efficient, and those who focus on AI’s potential to discriminate and amplify harm. In sectors where jobs can be easily automated, the rosier view of AI tends to prevail. But in health care — where the consequences of AI’s missteps are high, and human expertise can’t be readily replaced — organizations are compelled to strike a balance between innovation and caution.

This issue of Transforming Care looks at how employees of health care systems of different sizes are working to make AI useful in solving business problems and improving care, while also mitigating the risk of harm to patients and institutional reputations. We explore how they distinguish promising ideas from wishful thinking and prioritize use cases when the list of health care’s intractable problems seems so long. They also share how they tackle common impediments to AI’s advancement — by earning the trust of clinicians and by ensuring that what works in one setting will work in a new environment with different patients, different staff, and different financial incentives.

“The variation in environments is part of the reason we haven’t seen an AI-enabled change in a treatment paradigm that dramatically improves outcomes across the U.S.,” says Mark Sendak, MD, MPP, the former population health and data science lead at the Duke Institute for Health Innovation, which supports clinicians in leveraging machine learning and data science to solve clinical problems. “AI hasn’t had its penicillin moment because the context of the implementation really shapes the outcome.”

Indeed, as the examples that follow show, AI’s current value may rest not only in achieving a particular outcome but also in the process of adopting or adapting to it. That’s because AI’s usefulness hinges on testing and refinement in real-world settings. And in high-stakes environments like hospitals and health clinics, that requires new levels of vigilance as well as collaboration across disciplines to avoid harm.

Challenge No. 1: Deciding Where to Invest

To be clear, artificial intelligence is not new to health care. Over the past two decades, health systems have used predictive AI models to sift through mountains of data from electronic health records (EHRs), radiology images, and billing records to generate new insights about diseases and the efficiency of health system operations. These tools, which rely on large datasets to discern patterns that can be used to infer future outcomes, have enhanced clinical decision-making by, for example, predicting an individual patient’s course of disease, their odds of developing surgical complications, or response to immunotherapy. They also have helped increase efficiency through forecasting the need for staff, beds, and supplies based on disease outbreaks and seasonal variation. And they have closed gaps in care that contribute to poor health outcomes, such as by identifying and prioritizing patients who need follow-up care.

As the mathematical models that drive predictive forms of AI have become more sophisticated, and the data on which they rely more robust and less expensive to manipulate, such models have begun to outperform humans. In small-scale deployments, health systems have shown these tools can detect signs of pancreatic cancer and sepsis earlier than humans do, though challenges to realizing their potential abound.

Less dramatic, but of enormous value to clinician productivity and morale, are the newer generative AI tools that produce text and images in response to prompts. When health systems roll these out in the form of ambient scribes that capture and summarize critical details of medical visits, or as tools that take a first pass at filling out prior authorization requests, they can lessen after-hours work known as “pajama time,” a key contributor to burnout. “I’m hearing people say things like, ‘I’ve got my life back.’ When was the last time you heard someone say that about a piece of technology?” says Armando Bedoya, MD, MMCi, the chief analytics and medical informatics officer at Duke University Health System.

A Survey of 43 Large, Nonprofit Health Systems Found the Majority Were Developing or Piloting Tools for Engaging Patients and Documenting Their Care

The proliferation of potential use cases and vendors pitching off-the-shelf products requires health systems to make choices about where to invest scarce resources for evaluation and monitoring. Without a deliberative process that weighs the benefit to patients and doctors against AI’s impact on the bottom line, the latter tends to win out, says Jono Hoogerbrug, MD, a primary care physician in Auckland, New Zealand, and a former Commonwealth Fund Harkness Fellow in Health Care Policy and Practice who spent a year at Stanford University studying organizational behavior surrounding AI adoption. “A lot of the decision-making sits at the C-suite level, where the value proposition is often based on return-on-investment,” he says. “That’s important to financial sustainability, but it can lead to a misalignment between an institution’s financial interests and the needs of patients or clinicians.”

At large health systems like Duke and the Mayo Clinic, the decision to adopt a particular AI model has been delegated to local hospitals, departments, and specialties, while aspects of vetting and evaluating use cases are centralized at the enterprise level. Like Vanderbilt Health, these health systems have established structured processes and multidisciplinary teams to guide staff in assessing whether a particular model works as intended. The Duke Institute for Health Innovation and Vanderbilt’s ADVANCE (AI Discovery and Vigilance to Accelerate Innovation and Clinical Excellence) Center bring faculty with expertise in computer science, data informatics, ethics, and anthropology together with health system administrators and clinicians to identify which models suggested by staff or outside vendors may require additional oversight to mitigate clinical, operational, or regulatory risks. The Mayo Clinic embeds aspects of the validation process in the Mayo Clinic Platform, a division that partners with AI start-ups and more established companies to test and refine models using de-identified patient data, with a goal of scaling promising solutions worldwide.

In guiding staff, all three institutions ask similar questions: How big is the problem the tool seeks to solve? Is there a simpler, cheaper solution already available on the market? If not, what’s the source and quality of data on which the model will rely? And what measures exist, or would need to be created, to test its effectiveness and monitor its impact over time?

Since it was launched in 2020, the Mayo Clinic Platform has focused largely on exploring innovations it hopes will improve outcomes for large numbers of patients, increase efficiency, or improve workforce retention. Vendors that aren’t “values aligned” — such as those developing algorithms that could be used to delay or deny care — are filtered out. Meanwhile, solutions that address pain points for clinicians by, for example, distilling critical details from thousands of pages of medical records for second-opinion consultations, often advance. Some projects, like one that produced 3D avatars of Mayo Clinic clinicians, target all three priorities at once. The avatars may soon answer basic patient questions, such as “How long do I need to wear this cast?” after hours. Trained on prior video visits, the avatars mirror an individual’s speech patterns and behaviors so closely that they are nearly indistinguishable from their human counterparts, says John Halamka, MD, MS, the Dwight and Dian Diercks President of the Mayo Clinic Platform. “The only thing I’ve noticed is the avatars gesture too much,” he says.

Challenge No. 2: Creating a More Accurate Representation of Patients

Unlike digital apps and social media platforms that enable real-time tracking of people’s movements, reading tastes, and purchasing habits, the data captured in EHRs present an imperfect view of patients and their needs. Even when they span decades, EHR data are based on a limited set of observations, recorded in idiosyncratic ways that often reflect individual and societal biases.

Vanderbilt researchers, for instance, found providers in hospitals in Chicago and Nashville spent more time reviewing medical records and updating clinical notes for white patients and people who pay cash than for other patients, a practice that undermines the reliability of AI tools that depend on documentation, such as early-warning systems that scan medical records and flag signs of clinical deterioration in hospitalized patients. Data on patients who face economic or logistical barriers to accessing care are also more limited or missing altogether.

If these gaps aren’t accounted for, AI models can exacerbate disparities by drawing more attention and resources to patients who already have access to medical care and neglecting sicker people who don’t. Generalizing findings from AI models that were trained on data on patients in just a handful of states can also be problematic if the social, behavioral, and ethnic characteristics of patients in one region differ substantially from another. To ensure AI tools learn from local data, PCCI, a nonprofit research firm that was spun out of Parkland Health and Hospital System in Dallas, built a custom dataset that it’s using to identify unique risk factors that low-income patients face and the ways neighborhood-level conditions affect their health outcomes and access to care. PCCI’s cloud-based platform combines de-identified clinical data from Parkland and more than 100 other hospitals and health systems in the Dallas area. Using geocoding, the records are linked to publicly reported information on local crime rates, air quality, and accessibility of transportation, grocery stores, and green spaces, among some 26 nonmedical drivers of health.
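
To make the linkage concrete, the sketch below shows, in simplified Python, how de-identified clinical records might be joined to neighborhood-level indicators by census tract. The identifiers, field names, and values are hypothetical illustrations, not PCCI’s actual schema or platform code.

```python
# Joining de-identified clinical records to neighborhood-level indicators by
# census tract. All identifiers, field names, and values are made up.

# Nonmedical drivers of health, keyed by census tract (from public sources).
tract_indicators = {
    "48113007825": {"air_quality_index": 62, "grocery_access_score": 0.4, "transit_stops_per_sq_mi": 3},
    "48113009304": {"air_quality_index": 48, "grocery_access_score": 0.9, "transit_stops_per_sq_mi": 11},
}

# De-identified clinical records that have already been geocoded to a tract.
patients = [
    {"patient_id": "A1", "census_tract": "48113007825", "asthma_ed_visits": 2},
    {"patient_id": "B2", "census_tract": "48113009304", "asthma_ed_visits": 0},
]

def enrich(record: dict) -> dict:
    """Attach neighborhood indicators to a clinical record for model training."""
    features = dict(record)
    features.update(tract_indicators.get(record["census_tract"], {}))
    return features

enriched = [enrich(p) for p in patients]
for row in enriched:
    print(row)
```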

Applying AI models to the enhanced dataset enabled PCCI to document a correlation between car ownership and access to prenatal care, suggesting telemedicine and ride-sharing programs could be leveraged to reduce preterm births. The nonprofit also found children living in communities with higher smoking rates are more likely to experience asthma exacerbations, as are children living in areas with high levels of food insecurity, suggesting families may prioritize food needs over medical care to stretch dwindling resources. “Interventions addressing food insecurity, such as community farms or adequate SNAP benefits, might free up family resources to support the care of vulnerable children with asthma,” says Yolande Pengetnze, MD, MS, a pediatrician and vice president of clinical leadership at PCCI.

AI allows us to capture small differences in risk factors that when added together become a substantial risk factor for poor outcomes. You can only visualize that using a model that can measure small increments.

Yolande Pengetnze, MD, MS Vice President of Clinical Leadership, PCCI

PCCI is using the data to map census blocks where the disease burden of diabetes, hypertension, and asthma is higher, with a goal of highlighting interventions that could improve outcomes for a single patient or a community. “In one area, patients might lack access to a pharmacy. In another, there may be a shortage of smoking cessation programs,” Pengetnze says.

PCCI’s Community Vulnerability Compass enables health systems, government agencies, and community-based organizations to target resources to communities and patients at higher risk for poor health outcomes.


Mayo Clinic also leverages data from public records to strengthen the performance and equity of its AI models. One example is the Housing-Based Socioeconomic Status (HOUSES) Index, which uses standardized details in property records — such as the number of bedrooms and bathrooms in a patient’s residence, its square footage, and the estimated property value — as a proxy for household wealth. Unlike traditional survey methods, which are resource-intensive and depend on asking patients questions intermittently, the HOUSES Index can be universally applied and seamlessly linked to patient addresses. Doing so revealed that an AI model developed to improve asthma management achieved strong results for high-income patients but performed less effectively for lower-income populations. Similarly, the index enabled Mayo Clinic researchers to identify disparities in access to kidney transplants.
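
The sketch below illustrates the general idea behind a housing-based proxy like the HOUSES Index: standardizing a few property-record attributes and combining them into a single score that can be linked to a patient’s address. The weighting and scaling shown are invented for illustration and are not the published HOUSES formula.

```python
# Combining standardized property-record attributes into a single housing-based
# socioeconomic score. Weights and scaling are invented for illustration; this
# is not the published HOUSES Index formula.
from statistics import mean, pstdev

def z_scores(values):
    mu, sigma = mean(values), pstdev(values) or 1.0  # guard against zero spread
    return [(v - mu) / sigma for v in values]

def housing_score(properties):
    """properties: dicts with bedrooms, bathrooms, square_feet, assessed_value."""
    cols = ["bedrooms", "bathrooms", "square_feet", "assessed_value"]
    standardized = {c: z_scores([p[c] for p in properties]) for c in cols}
    # Equal-weight sum of standardized attributes; higher = greater housing wealth.
    return [sum(standardized[c][i] for c in cols) for i in range(len(properties))]

homes = [
    {"bedrooms": 2, "bathrooms": 1, "square_feet": 900,  "assessed_value": 150_000},
    {"bedrooms": 4, "bathrooms": 3, "square_feet": 2600, "assessed_value": 520_000},
]
print(housing_score(homes))
```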

Without socioeconomic and community-level conditions factored into AI models, algorithms for predicting a patient’s risk of contracting HIV worked well for white males but less so for females and Black males, found Julia Marcus, PhD, MPH, associate professor at Harvard Medical School and the Harvard Pilgrim Health Care Institute. Once new variables such as poverty, lack of insurance, and HIV prevalence in the patient’s neighborhood were incorporated, the algorithms became more accurate for both groups.

More complicated was convincing clinicians to trust the algorithm over their own judgment, particularly those accustomed to using clinical or behavioral characteristics to assess risk (such as whether patients were men who had sex with men or had been diagnosed with a sexually transmitted disease). Marcus and colleagues piloted a decision support tool that alerted primary care physicians that a small subset of their patients (2%) faced 100 times higher odds of being diagnosed with HIV than the general population, but some clinicians assumed the tool was misfiring when a patient’s circumstances didn’t match their expectations. To persuade them of the algorithm’s validity, the researchers presented data showing the number of patients who would be missed using traditional criteria, along with illustrations of how the new variables helped.

Challenge No. 3: Encouraging Staff to Use, But Not Misuse, AI Tools

Clinicians may be quicker to adopt tools that reduce friction, restore time, or improve the outcomes they care about, but their trust erodes when tools add complexity or duplicate efforts, says Angel Arnaout, MD, MSc, MBA. Arnaout is a surgical oncologist and chief medical informatics officer at the Provincial Health Services Authority in British Columbia, Canada, who spent her yearlong Harkness Fellowship at Kaiser Permanente’s California headquarters studying how to meaningfully integrate digital technologies into clinician workflows to maximize value for cancer patients.

The lack of transparency into how an AI model works can also lead people to project their desires on the technology, imbuing it with skills it doesn’t have, Mark Sendak says. For example, when Duke implemented a model to identify hospitalized patients who could benefit from a palliative care consult because they were less likely than other patients to survive to discharge, clinicians began asking if they could use the tool to triage patients for intensive care unit beds. “That may sound like a benign secondary use, but it is totally unsafe,” Sendak explains, since the new use requires new training data capturing different time spans, patient characteristics, clinical interventions, and outcomes.

Duke developed a “Model Facts” card, resembling a nutrition or mattress label, to reinforce the idea that AI models should be used only for the specific purpose for which they were trained. The template spells out how the model’s evaluation was conducted — with which population of patients and where, for example — and outlines the risks associated with misusing it. Duke also limits access to some AI tools by testing them in silent or shadow trials. Such trials allow health systems to run AI tools in the background, measuring their performance and functionality using real-time data, without allowing the tools to influence or alter clinical decision-making. In other instances, Duke assigns the decision to act on a model’s output to an intermediary.

For example, the results of Duke’s sepsis detection algorithm are relayed to a dedicated team of on-site nurses, who determine which cases require attention. In educating clinicians about the new tool and building trust in it, the Duke team avoided using the term artificial intelligence and emphasized the outcomes of individual cases to frontline providers, while reporting overall trends to administrators.
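
The silent-trial pattern described above can be sketched roughly as follows: the model scores live encounters and its predictions are logged for later evaluation, but nothing is returned to the clinical workflow. The model, fields, and log file in this sketch are placeholders, not Duke’s actual system.

```python
# A model scores live encounters in the background; predictions are logged for
# later comparison with observed outcomes and never shown in the clinical
# workflow. The model, fields, and log file are placeholders.
import csv
from datetime import datetime, timezone

def predict_risk(encounter: dict) -> float:
    """Placeholder risk score; a real deployment would call the trained model."""
    return min(1.0, encounter["heart_rate_max"] / (2 * encounter["systolic_bp_min"]))

def shadow_score(encounter: dict, log_path: str = "shadow_log.csv") -> None:
    """Score an encounter and log the result; nothing is returned to the EHR."""
    score = predict_risk(encounter)
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            encounter["encounter_id"],
            round(score, 3),
        ])

# Logged scores are later joined to outcomes to measure performance before
# anyone is allowed to act on the model's output.
shadow_score({"encounter_id": "E-1001", "heart_rate_max": 128, "systolic_bp_min": 92})
```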

Vanderbilt also has a dedicated team trained to recognize and respond to emerging problems with AI models once they are deployed. To guide their work, the health system launched the Vanderbilt Algorithmovigilance Monitoring and Operations System, or VAMOS for short. The dashboard functions like an air traffic control system — monitoring the more than 300 AI models the health system has approved for use. It flags unexpected outcomes and other performance problems requiring human intervention, such as variation in a model’s accuracy by race. “Ongoing assessment is very important because a lot of these tools drift over time and have differential impacts on subpopulations. They can even have different performance depending on the floor of the hospital,” says Peter Embí, MD, MS, codirector of the ADVANCE Center at Vanderbilt Health.
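
A monitoring system like VAMOS might run checks along the lines of the simplified sketch below, which compares a model’s accuracy within each subgroup to its overall accuracy and flags large gaps for human review. The threshold and field names are illustrative assumptions, not Vanderbilt’s actual criteria.

```python
# Compare a model's accuracy within each subgroup to its overall accuracy and
# flag large gaps for review. The 10-point threshold and fields are illustrative.
from collections import defaultdict

def subgroup_accuracy(records, group_key="subgroup"):
    """records: dicts with a prediction, an observed outcome, and a subgroup label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        hits[r[group_key]] += int(r["prediction"] == r["outcome"])
    return {g: hits[g] / totals[g] for g in totals}

def flag_gaps(records, threshold=0.10):
    overall = sum(r["prediction"] == r["outcome"] for r in records) / len(records)
    return {g: acc for g, acc in subgroup_accuracy(records).items()
            if abs(acc - overall) > threshold}

records = [
    {"prediction": 1, "outcome": 1, "subgroup": "group_a"},
    {"prediction": 0, "outcome": 0, "subgroup": "group_a"},
    {"prediction": 1, "outcome": 0, "subgroup": "group_b"},
    {"prediction": 1, "outcome": 0, "subgroup": "group_b"},
]
print(flag_gaps(records))  # subgroups whose accuracy diverges from the overall rate
```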

VAMOS has an accordion feature that allows users to expand and expose more detailed information about an AI tool, including the reason for the alert and changes to key metrics over time. This mock-up shows how the tool alerts a dedicated team to problems with performance, process, outcomes, and fairness. Data: Vanderbilt Health.

Tracking the Complexities of Deploying AI and Educating Staff About Them

Challenge No. 4: Addressing Patients’ Reservations About AI

While many clinicians and patients are enthusiastic about AI’s potential to make care more accessible, personalized, or cost-effective, some worry about how the data that drive AI models are managed and put to use. A survey of U.S. adults published in JAMA Network Open found consumers are deeply skeptical that health care systems will use artificial intelligence responsibly and are more comfortable with clinical applications of AI than administrative ones, like documentation, billing, and scheduling. Patients may view the latter as more susceptible to manipulation that could, in turn, limit their access to care or increase costs.

Paige Nong, PhD, one of the study’s authors and an assistant professor of health policy and management at the University of Minnesota School of Public Health, got a glimpse of how AI might make care more expensive: her doctor’s office had encouraged her not to reference “problems” during her annual visit, because the ambient scribe documenting the visit would capture them and recode the visit, generating a copayment for her. If the use of ambient scribes leads to more intensive billing, it may erode patient trust, she noted in a JAMA Health Forum commentary.

Preserving the privacy of medical records is also a concern as health systems and AI vendors share data. Images from medical records are finding their way into publicly available AI training data sets, and it’s not always clear how they got there. Some data may be picked up by web crawlers or trackers that scrape information about patients’ medical conditions, prescriptions, and appointments as they interact with health systems’ online portals. An investigation by The Markup and STAT found a third of the hospitals on Newsweek’s 100 top hospitals list had embedded Meta’s Pixel tracker on their websites, prompting class action lawsuits, including one Duke recently settled.

Many large health systems have also entered research agreements with Google, Microsoft, and other big tech firms, gaining access to the firms’ training models and computing power in exchange for access to de-identified patient data. In other instances, health systems are selling de-identified data directly to brokers, who market it to pharmaceutical companies and AI developers.

The Mayo Clinic requires that training data remain within a cloud-based container under its control. But even with that safeguard, there is a theoretical risk that details in unstructured clinical notes could reveal a patient’s identity. Mentions of a unique occupation, a well-publicized event (such as an auto accident in which the victims’ names were reported in the news), or rare physical characteristics (such as two different eye colors) can serve as identifiers if left unchecked.

To address this, the Mayo Clinic partnered with Brad Malin, PhD, a leading expert in biomedical informatics and digital privacy at Vanderbilt, to establish standards for de-identification. Together, they determined which details beyond names should be removed or modified to prevent re-identification. They set a threshold that any shared characteristic must apply to at least 10 individuals to minimize the possibility that any one patient could be singled out.

The most common methods of de-identification involve trade-offs between patient privacy and the predictive value of AI models.

  • Suppressing identifying details like eye color so they aren’t visible.
  • Randomizing details, such as by adding a month to all birth dates so they no longer match other datasets.
  • Generalizing data to make it more abstract — for example, using three or four digits of a zip code instead of all five (see the sketch below).
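
A rough sketch of how these steps, together with the 10-person threshold described above, might be applied to a set of records follows; the field names and specific rules are illustrative only, not Mayo’s or Vanderbilt’s actual de-identification code.

```python
# Suppress, shift, and generalize identifying details, then enforce a rule that
# any released combination of quasi-identifiers must be shared by at least 10
# patients. Field names and rules are illustrative only.
from collections import Counter
from datetime import date, timedelta

K = 10  # minimum number of patients who must share any released combination

def generalize_zip(zip_code: str) -> str:
    return zip_code[:3] + "XX"  # keep only the first three digits

def shift_date(d: date, offset_days: int = 30) -> date:
    return d + timedelta(days=offset_days)  # uniform shift breaks linkage to other datasets

def deidentify(records):
    out = []
    for r in records:
        out.append({
            "zip": generalize_zip(r["zip"]),
            "birth_date": shift_date(r["birth_date"]).isoformat(),
            "eye_color": None,  # suppress rare physical descriptors entirely
            "diagnosis": r["diagnosis"],
        })
    # Enforce the threshold on the remaining quasi-identifier combination.
    combos = Counter((r["zip"], r["birth_date"]) for r in out)
    for r in out:
        if combos[(r["zip"], r["birth_date"])] < K:
            r["zip"], r["birth_date"] = "SUPPRESSED", "SUPPRESSED"
    return out

sample = [{"zip": "55901", "birth_date": date(1980, 4, 2), "eye_color": "heterochromia", "diagnosis": "asthma"}]
print(deidentify(sample))  # a lone record fails the threshold, so its quasi-identifiers are suppressed
```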

Malin, who’s written about the challenges of protecting reproductive health data stored in electronic health records, says if patients don’t trust health care systems to protect their data or feel their personal information is being used for profit, they won’t seek care. “That’s my greatest fear. If people lose trust, they’re just going to say, ‘What’s the point of health care?’” To guard against this, Vanderbilt created an AI Patient and Family Advisory Group of 10 patients and family members (soon to be 15). The group meets monthly to review what the health system is doing. “We often turn to patients and families to help define the metrics of success,” says Susannah Rose, PhD, an ethicist who manages the group in concert with Laurie Novak, PhD, MHSA, an anthropologist. For questions that require broader input, Vanderbilt surveys a group of 1,000 patients who have committed to answering ad hoc queries about AI use.

The Path Forward

Engaging patients and frontline staff more actively and doing so in the earliest stages of AI development may speed adoption, says Brigitte Woo, PhD, a registered nurse and former Harkness Fellow from Singapore who spent the past year at the University of Pennsylvania Health System studying how artificial intelligence could reduce the documentation burden nurses face. “I found when nurses are engaged in the design and deployment process, the tools tend to fit better into workflows,” she says. “When they feel excluded, AI can feel like just another layer of oversight or surveillance.”

In resource-constrained environments, it can be hard to create a space for safe experimentation with AI tools, but Woo thinks that, too, would help. “No matter how sophisticated the AI system is, it only works if people understand it, believe in it, and feel confident in applying it in real-world care,” she says. That includes having the space to ask questions and challenge the validity of models.

Such feedback loops are critically important because AI is more dynamic than traditional software. It may work well in testing only to decay or drift in new settings. “To guard against that, we have to keep asking ourselves the question, ‘Is this model performing fairly for everyone?’” says Ed Middleton, MBBS, a former Harkness Fellow from the United Kingdom who spent a year at Stanford studying how AI regulation could ameliorate or exacerbate health inequities. “You run into problems when you assume the models are fine anywhere and you’re not looking for bias,” he says.

Vivian Lee, MD, PhD, MBA, author of The Long Fix: Solving America’s Health Care Crisis with Strategies That Work for Everyone and a Commonwealth Fund board member, says shining a light on deficits of trust, workforce engagement, and equity may be one of AI’s hidden superpowers. “AI offers the opportunity for us to think more imaginatively about how to communicate and engage people — our patients and our staff — to achieve better health,” she says.

EDITORIAL ADVISORY BOARD

Special thanks to Editorial Advisory Board member David Blumenthal for his help with this issue.

Jean Accius, PhD, CEO, Creating Healthier Communities

Anne-Marie J. Audet, MD, MSc, former senior medical officer, The Quality Institute, United Hospital Fund

David Blumenthal, MD, MPP, professor of the practice of public health and health policy, Harvard T.H. Chan School of Public Health

Marshall Chin, MD, MPH, professor of healthcare ethics, University of Chicago

Nathaniel Counts, JD, chief policy officer, Kennedy Forum

Timothy Ferris, MD, MPH, vice president, InterSystems

Don Goldmann, MD, chief scientific officer emeritus and senior fellow, Institute for Healthcare Improvement

Laura Gottlieb, MD, MPH, professor of family and community medicine, University of California, San Francisco, School of Medicine

Carole Roan Gresenz, PhD, dean, McCourt School of Public Policy, Georgetown University

Allison Hamblin, MSPH, president and chief executive officer, Center for Health Care Strategies

Thomas Hartman, vice president, IPRO

Sinsi Hernández-Cancio, JD, vice president, National Partnership for Women & Families

Clemens Hong, MD, MPH, director of community programs, Los Angeles County Department of Health Services

Kathleen Nolan, MPH, regional vice president, Health Management Associates

Harold Pincus, MD, professor of psychiatry, Columbia University

Chris Queram, MA, principal, CQ Health Strategies

Sara Rosenbaum, JD, professor emerita of health law and policy, George Washington University

Michael Rothman, DrPH, executive director of process excellence, Stanford University School of Medicine

Mark A. Zezza, PhD, founder, Venditti Consulting

Publication Details

Date

April 16, 2026

Contact

Sarah Klein, Consulting Writer and Editor

sklein@cmwf.org

Citation

Sarah Klein, “Sharing a Workplace: How Health Systems Make Artificial Intelligence Useful to Clinicians,” feature article, Commonwealth Fund, April 16, 2026. https://doi.org/10.26099/w83h-9t80