Ty Aderhold (00:38): From Advisory Board, we're bringing you a Radio Advisory. Your weekly download on how to untangle healthcare's most pressing challenges. My name is Ty Aderhold. You probably know me as the guy who is always brought in to answer questions about AI. And as you can imagine, I'm pretty busy these days. AI is everywhere. It's the center of every conversation, every conference. Leaders are eager to embrace technologies that claim they can solve some of the most intractable problems in healthcare, and vendors, for their part, are eager to provide that support. The fact is we're in an AI gold rush, and I think rush is an appropriate way to describe the pace of change. So in today's conversation, I want us to slow down and pause. I want us to evaluate where these tools really are today. I want us to be clear-eyed about the direct and indirect impact that AI can have on healthcare. And I can't think of a better person to do that than Rae's dad. (01:35): David Woods was one of the thought leaders at the beginning of the patient safety movement. For more than 40 years, he and his team have worked to improve how human-machine systems work in high-risk and highly complex settings like healthcare. I think it's obvious why Rae is going to sit this one out, so I'm thrilled to take the mic and have a conversation with David Woods, Mike Rayo and Dane Morey, who lead the Cognitive Systems Engineering lab at the Ohio State University, focusing on the design and evaluation strategies that facilitate resilient human-machine performance. David, Mike, Dane, welcome to Radio Advisory. Mike Rayo (02:15): Thanks for having us. Dane Morey (02:16): Happy to be here. Ty Aderhold (02:18): There is so much hype around AI right now, and I want to focus our listeners' attention on reality. How do you all want health leaders to think about these tools? Where are they built to succeed and where are they limited? Mike Rayo (02:31): I think the first thing to think about is that it's not just one AI. Although just about everyone thinking about AI right now is thinking about large language models, there's actually a much broader set of technologies. In general, we know that AI can amplify our ability to do work, but it's also fallible. And so what we know is that if we're using that AI to support and amplify people, to make them better at what they do, that's where it actually pays off. You can do good things with it. Notice I didn't say that it's good at it. Ty Aderhold (03:06): And Mike, maybe what I'm hearing, and I want to focus in on large language models here, is that the technology wasn't necessarily designed to be good. It was designed to look good. David Woods (03:18): This language capability of LLMs is like the perfect storm, because it can converse with you, it can answer questions, it seems to be relevant. It's the perfect way to trick people into over-ascribing intelligence. These are well-documented phenomena, how machines and algorithms can trick people into thinking they are very smart, and you see it in the anthropomorphization of how we talk about the AI, when there are many, many different aspects to the technology. Ty Aderhold (04:04): David, I think you're getting at an important misconception here around what these tools are good at, but you hinted at another misconception that I want to get into further. And this is the idea of AI without people. And Dane, I'm actually going to throw this one to you.
What is wrong with that way of thinking? Dane Morey (04:23): We've found over the past 40, 50, 60 years that as new technologies are introduced, they change the nature of work, but they don't replace people at work. These technologies change what constitutes work, what makes work difficult, what it takes to do work. And over the years, we've consistently seen that aspect of technologies be underappreciated: the way they change work but don't replace the need for people in work. David Woods (04:58): So this is what we call joint activity, and that's what we always come back to. When you look at real work like clinical practice in healthcare, many parties and roles are involved. There's always technology that helps us. Thinking about any machine or any software or algorithm as an island by itself isn't really doing safety systems engineering in the complex world of healthcare. And as you grapple with the real variability and complexity of people, disease, comorbidities, it turns out that there are always surprises, and so this is where we need multiple players to work together. We need the people with their knowledge, and we need the technology that can help us process information, bring connections together, help us generate additional hypotheses. This is why we always talk about it as a joint system. Ty Aderhold (06:00): And I think it's important to name these misconceptions upfront, because I think they can lead organizations down the wrong path. They can lead to failures, and we have plenty of examples of tech innovations over the years that have failed or fizzled out. I don't think that's the case here, and I don't think you all are on here to tell listeners to ignore AI entirely. Instead, you've argued that successful use of AI has to start with a fairly simple evaluation: is it safe and is it effective? What does that mean in practice? Mike Rayo (06:36): If the person who's using it, or the people who are using it, are able to see when the machine is wrong, because it's going to be wrong, then it can contribute to safe and effective care. If the person can't see when it's wrong, then it can't, because you're making things too complex. Can this technology convey its own poor performance? And can it also convey when it's performing really well, even though it always thinks it's performing pretty well? David Woods (07:03): Everything that we might utilize has limits. It has a therapeutic range, to use a kind of medical analogy. It can help us, but it also has a place where it's going to tend to misunderstand, misassess or misrecommend. So we have to get away from thinking that the algorithm provides solutions and start thinking about how we work together to handle hard cases. Now, part of the attention and excitement for administrators of AI is how it streamlines administration and other kinds of non-clinical activities. Ty Aderhold (07:41): Right. We're looking towards productivity. David Woods (07:43): And that's what we see driving the AI gold rush in almost every setting. The first thing they say is, "We're going to get productivity gains." And then there's a lag between expecting productivity gains and actually developing and deploying the technology in a realistic setting. Ty Aderhold (08:05): When I'm out talking to healthcare leaders, they're focused on the kind of productivity that should, in theory, help their own people.
Tools that help with the administrative burden, that help reduce burnout, make doctors and care teams more productive and, therefore, also bring systems more revenue. And on paper, I would say this feels like a win for everyone. Mike, what would you say to that? Mike Rayo (08:32): I think we need to balance what we're hearing. We're now starting to get studies about how this is actually affecting people and how they work. In the biggest study in software development, even though the perception was that those software developers were going 20% faster, they actually went 20% slower. We know from an MIT study that after four months of use with LLMs, versus just a search engine or no digital aid at all, you actually get decreased brain activity and less critical functioning. So I'll go back to my original point, which is that this is about supporting people. If you want to reduce burnout, you don't take away those tasks. You make people more capable, with the machine assist, of doing those things, because if you're replacing them, then when anything goes awry, those burnout numbers, that inability to cope, go through the roof. Ty Aderhold (09:22): I want to give our listeners a specific healthcare example here of the impact that using large language models in healthcare can have, and how tech that doesn't necessarily convey when it's wrong can be ineffective at best or risky at worst. You all just published a study in the Nature portfolio journal npj Digital Medicine that looked at how more than 450 nurses used AI in clinical decision-making. Dane, what was the goal of this study? Dane Morey (09:54): This study actually built on a long line of research that went back to work on improving alarm displays. This was a line of research from Mike that we took and extended to AI-infused technologies. Ty Aderhold (10:09): Okay. Mike Rayo (10:10): So the typical problem is that when the machine has a bad day, the overall system has a bad day. We didn't think that was okay for healthcare, or for anything else. When the machine has a bad day, we still want the system to have a good day. Dane Morey (10:24): We basically built an AI advisor for detecting patient decompensation at the bedside. Our vision was that maybe this was a technology that could be used at a central nurse monitoring station. And we built a predictive algorithm to predict the likelihood that a patient would experience an emergency event, one requiring a rapid response team mobilization, in the next five minutes. And we tested out a few different ways to instrument this algorithm with people in a healthcare system through display interfaces. One is the typical expert AI approach, where we just slapped a very obvious recommendation on the screen. It was actually a giant red bar, almost impossible to ignore, and this was the AI's advice or recommendation: the likelihood that this patient was going to experience an emergency event. Oftentimes what we hear from clinicians is that they want more than just the AI advice. Maybe we need an explanation as well. And so we built a series of visual annotations on top of a rich data display to help illuminate the parts of a patient's data, the vital signs, that were contributing most strongly to the algorithm's prediction. Ty Aderhold (11:43): Got it. And because AI is highly efficient, the hope here would be that the addition of these AI recommendations and explanations would make nurses perform better and faster in these scenarios. Is that what happened? Dane Morey (11:57): Well, in short, no.
So we actually saw a mix of helpful and harmful impacts, both with those AI recommendations and with the explanations themselves. When those AI recommendations were correct, they were highly helpful; they improved performance. And when they were incorrect, they were actually strongly harmful to how our nurses were interpreting these patients. And surprisingly, we found the addition of explanations did very little to change how those AI recommendations were impacting nurses. Ty Aderhold (12:34): And how did the nurses, the people, behave when the AI recommendations were off? Dane Morey (12:40): So very interestingly, we actually found that our nurses in this study behaved very similarly regardless of whether or not the AI recommendations or explanations were available to them. Nurses in our study generally seemed to very thoroughly review all the data available to them. They spent approximately the same time analyzing each patient and, from everything we could tell, seemed to exert the same kinds of effort to come to a conclusion about the patient. And some nurses actually told us that they deliberately tried to block out or ignore the AI algorithm. And yet we still found the presence of that AI was strongly influential on nurses as a whole. In general, our findings suggest that the influence of these AI recommendations cannot be fully explained by things like overreliance on the technology, complacency, or overtrust in the AI algorithm. Instead, the results suggest there's something more fundamental going on in how these technologies influence clinicians and how clinicians think about patients. Ty Aderhold (13:49): So Dane, what I'm hearing is that this was not necessarily helping nurses become more efficient. They were still following much of the same process and going through much of the same information that they would otherwise, but it was changing the way those nurses approached the fundamental problem. Dane, can you give us an example of how using AI changes the way humans approach problems? Dane Morey (14:16): One of the most interesting patient cases we examined in our study was a patient on dialysis. They had a very low baseline blood pressure. In addition, they had a recent blood pressure that was even lower, potentially dangerously low, and our algorithm only assigned this patient a 15% chance of needing a rapid response team. Interestingly, when nurses viewed this patient case without the AI algorithm involved, this seemed to be a fairly routine case for them. They pretty quickly recognized the dangerously low blood pressure and responded very urgently and with a high degree of concern. However, when we presented them with the very erroneous, low AI prediction of 15%, we saw very, very different responses from nurses. Nurses responded with much less concern. Even when they noticed the low blood pressure, they were much more likely to talk about that blood pressure as normal for the patient rather than dangerously low. Ty Aderhold (15:22): Interesting. David Woods (15:24): It's a very strong case that illustrates the need for joint design and joint testing: people and machines need to work together, and we need to test how they work together in difficult cases as well as easier cases.
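To make the kind of explanation display Dane describes more concrete, here is a minimal sketch in Python of how per-vital annotations could be derived from a simple risk model. Everything in it, the logistic model, the coefficients, the reference values, the example vitals, is a hypothetical stand-in, not the study's actual algorithm:

```python
# A minimal sketch (not the study's model) of deriving per-vital
# "contribution" annotations from a simple logistic risk model.
import math

# Hypothetical coefficients: how strongly each vital sign's deviation
# from a reference value pushes the predicted risk up or down.
COEFFS = {"heart_rate": 0.04, "resp_rate": 0.09, "spo2": -0.12, "sys_bp": -0.05}
REFERENCE = {"heart_rate": 80, "resp_rate": 16, "spo2": 97, "sys_bp": 120}
INTERCEPT = -2.0

def risk_and_contributions(vitals: dict) -> tuple[float, dict]:
    """Return the predicted risk of an emergency event plus each vital's
    additive contribution to the logit, which a display could render as
    annotations over the patient's data."""
    contributions = {
        name: COEFFS[name] * (vitals[name] - REFERENCE[name])
        for name in COEFFS
    }
    logit = INTERCEPT + sum(contributions.values())
    risk = 1 / (1 + math.exp(-logit))
    return risk, contributions

# A dialysis-style case: the dangerously low systolic blood pressure
# dominates the annotations, whatever the headline recommendation says.
risk, contribs = risk_and_contributions(
    {"heart_rate": 92, "resp_rate": 18, "spo2": 95, "sys_bp": 78}
)
print(f"risk={risk:.0%}", {k: round(v, 2) for k, v in contribs.items()})
```

The study's sobering finding is that even annotations like these, faithfully pointing at the right vital signs, did little to blunt the pull of an erroneous headline recommendation.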
Ty Aderhold (15:41): I'm sure there are listeners sitting there thinking: okay, this example that Dane and David just gave is a very clinical example, but I'm okay, because our organization is looking much more to administrative support. We're going to use AI to help clinicians with dictation. Mike Rayo (16:01): Anything that's administrative but in clinical hands is clinical work. So think about ambient listening and putting things into the chart: there's no effective way to know when things that are wrong are getting into the chart. One, that's going to affect that individual patient's care, but that particular example is actually even more important, because we're also going to be using a set of data analytics tools to go through the charts and figure out what's going on for multiple patients. David Woods (16:28): There really isn't a low-risk situation here, and the ultimate reason for that is that LLMs make mistakes. Lots of mistakes. So let me give you an example that actually happened in aviation. There are these notices pilots get before a flight. Reviewing them is time-consuming. They were on paper, and so they were computerized with some assists that would sort these things for the pilots to review and find what was important. But what happened is that management cut the time available for pilots to review this important information. So instead of actually helping the pilots customize their flight plan, they had less time to evaluate the information. For the management, it was efficient, because less pilot time was required and less money was being spent. From a pilot point of view, it made a hard task that they wanted help with harder, in a way that made it easier for them to miss an important bit of information. (17:37): And I think this is really a big issue in deploying new technology like LLMs: we think we can get an assist on easy tasks, and we miss the idea that those easy tasks turn out to be connected in this web of ways that we do activity in a complex system. But instead of being problem-driven, let's make the record more informative, easier to find the critical information, we ended up just making it clumsy in a different way. And this is really at the heart of what we're trying to get across to people. This can make a big difference, but only if you start with what's effective and what's safe. If you start with "I'm going to save money," you're probably going to fix some things and create new problems, new difficulties and, oftentimes, new risks. We started the patient safety movement just about 30 years ago, and among the key things we started with were being patient-centered all the time and bringing systems approaches and systems thinking. Just making AI do the job and having AI provide solutions and benefits is insufficient to meet those criteria. Ty Aderhold (20:24): For our listeners, how should they go out to vendors and assess if a product is going to be safe and effective? What questions should they be asking of vendors? Mike Rayo (20:35): I ask six questions. How does it know it's wrong, and how well does it know it's wrong? How does it convey that it's wrong, and how well does it convey that it's wrong? Does it convey that it's wrong when it doesn't know that it's wrong? And how well does it convey that it's wrong when it doesn't know that it's wrong? Anytime that vendor, that solution provider, comes and brings this newfangled gizmo, they have to answer those six questions. If they're not doing that, then they're not ready for high-stakes situations.
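Mike's six questions are addressed to vendors, but they can also be read as a design requirement: a prediction should never travel without a machine-generated signal of its own trustworthiness. Here is a minimal sketch of that idea, assuming a hypothetical risk model and hand-set validated input ranges; a real system would need far richer self-assessment than a range check:

```python
# A minimal sketch of one way to operationalize Mike's questions: wrap a
# model so its output always carries a statement of its own limits.
# The model, ranges, and flag wording here are all hypothetical.
from dataclasses import dataclass

@dataclass
class FlaggedPrediction:
    risk: float               # the model's predicted risk
    in_validated_range: bool  # were the inputs like anything it was tested on?
    flag: str                 # what the display conveys alongside the number

def predict_with_flags(vitals: dict, model, validated_ranges: dict) -> FlaggedPrediction:
    out_of_range = [
        name for name, (lo, hi) in validated_ranges.items()
        if name not in vitals or not lo <= vitals[name] <= hi
    ]
    risk = model(vitals)
    if out_of_range:
        # The machine may not *know* it's wrong here, so the interface
        # conveys the possibility anyway (Mike's fifth and sixth questions).
        return FlaggedPrediction(
            risk, False,
            f"Outside validated range for {', '.join(out_of_range)}; treat with caution",
        )
    return FlaggedPrediction(risk, True, "Inputs within tested conditions")

# Usage with a stand-in model that always returns 15%:
ranges = {"sys_bp": (70, 200), "heart_rate": (40, 160)}
result = predict_with_flags({"sys_bp": 45, "heart_rate": 88}, lambda v: 0.15, ranges)
print(result.flag)  # Outside validated range for sys_bp; treat with caution
```

The design choice worth noticing is structural: the number and the caveat are one object, so a downstream display cannot show the recommendation while dropping the warning.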
Ty Aderhold (21:06): I love these questions. Many of the organizations that we work with, many of the listeners, I'm sure, are already pretty far along in adopting AI. And they're moving forward with implementation, knowing that realistically, we're not going to stop this gold rush. How should leaders balance the tension of adopting new technologies while also taking lessons from research like yours that is still being done today? David Woods (21:36): The first thing is that management needs to look honestly and clearly at what's actually happening on the ground as they deploy and integrate AI-infused technologies. It will not work out completely as they expect. Sometimes they may find surprising ways people have adapted AI to be effective in their context, ways they hadn't anticipated, and sometimes they're going to see problems arising that people have to work around. We have to generate clear information about risks. In the process, you may also discover opportunities to exploit. Mike Rayo (22:16): What we are advising healthcare organizations, military organizations and transportation organizations is that there is a method to evaluate these technologies at every stage of the life cycle. If it hasn't been implemented yet, there are ways to evaluate these technologies and answer these questions. If it has been implemented, there are still ways to do this. If you're far along in implementation, there are still ways to evaluate it, because that's actually one of the features the AI solution providers tout: it's always learning, it's always changing. So we need to have these evaluation methods for the entire life cycle. Ty Aderhold (22:55): One conclusion I'm taking is that it's not too late. Even if you've already moved on some of these technologies, it's not too late to bring in some of these questions. Mike Rayo (23:05): Never too late. It's never too late. In fact, it can't be too late, because you have to keep doing it and you have to keep checking in on it. And so finding a way to do that in a way that fits within your operational rhythms and financial constraints is critical. When you bring this into your organization, you cannot stop. Ty Aderhold (23:26): When it comes to enabling safe and effective use of AI, what do you want our listeners to do next? Dane Morey (23:33): Before deploying an AI-infused technology, the minimum we need to do is understand the full picture of the likely impacts of that technology as it's deployed. Our study points to three key requirements for evaluating these technologies. First, we need to evaluate people and AI together. It's not enough to just evaluate AI algorithms in isolation. Second, evaluate a range of challenging situations which, at minimum, produce instances of strong, mediocre and poor AI performance. If we don't examine those instances of poor AI performance, no matter how rare they might be, we are failing to understand the full scope of impacts these technologies will have in practice. And third, we should not be aggregating or combining these instances of strong and poor AI performance. Correct and incorrect AI recommendations are drastically different classes of challenges. If we combine them, the frequent, small benefits of correct AI recommendations can mask the rare, severe harms of erroneous ones.
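Dane's third requirement is easy to state and easy to violate, because a single pooled accuracy number looks reassuring. Here is a minimal sketch of what disaggregated evaluation looks like, using hypothetical case records rather than the study's data:

```python
# A minimal sketch of Dane's third requirement: score the joint
# human-AI system separately on cases where the AI was right and where
# it was wrong, instead of pooling everything into one average.

cases = [
    # (AI recommendation was correct?, clinician+AI reached the right assessment?)
    (True, True), (True, True), (True, True), (True, True),
    (False, False), (False, True), (False, False),
]

def accuracy(pairs):
    return sum(ok for _, ok in pairs) / len(pairs) if pairs else float("nan")

ai_right = [c for c in cases if c[0]]
ai_wrong = [c for c in cases if not c[0]]

print(f"Pooled accuracy:        {accuracy(cases):.0%}")     # looks fine: 71%
print(f"When the AI was right:  {accuracy(ai_right):.0%}")  # 100%
print(f"When the AI was wrong:  {accuracy(ai_wrong):.0%}")  # the buried signal: 33%
```

The pooled 71% hides the collapse to 33% on exactly the cases where the joint system most needs to catch the machine's bad day.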
Mike Rayo (24:52): I just want to reiterate that there's a way to do this, to get a better idea of how it's really functioning out there and to predict how it's going to function. We desperately want your listeners to find some of these shortcomings and vulnerabilities before they propagate all the way to the patient. The systems are going to have bad days. We know the technology is going to have bad days, and if we can hold that in the clinical space and not have it reach the patient, that's what we want, and that's going to require all of these different evaluation methods. David Woods (25:22): What I'd like to emphasize is the allure of the shortcuts that LLMs and the AI gold rush offer. We can get productivity gains, efficiency gains, a whole bunch of different things that seem so attractive, and we can get them quickly and easily. And that is a dangerous siren song, because it leads you to miss the very real patterns we see over and over again that turn into problems down the road. Yes, they turn into adverse events. Yes, they turn into gains not realized: I thought I was getting productivity gains, and in fact, I ended up shifting the work to corroborating and cross-checking the AI algorithm's output so much that I actually slowed down productivity. We computerized in healthcare, and that was beneficial on many fronts. But also notice, as healthcare leaders, you then had new cybersecurity issues, new kinds of vulnerabilities, ransomware. The same thing applies to inserting AI into your system. It brings a trail of interdependencies and other kinds of costs that can offset the benefit. Ty Aderhold (26:46): David, Mike, Dane, thanks so much for being on Radio Advisory. I've learned a ton, and I hope the listeners have as well. David Woods (26:53): It's been a pleasure to discuss things with you, Ty, on this critical issue. Mike Rayo (26:58): Yeah. Thanks for having us. Love to talk about it. Dane Morey (27:00): Thanks so much. Ty Aderhold (27:05): From everything I heard from the team at OSU, it's clear to me that this is a make-or-break moment, both in needing to solve challenges urgently and in accepting the risks that come with the choices you make to solve those challenges. What you do next will set your organization up for success or failure. And remember, as always, we're here to help. Rae Woods (27:32): Here's what our Advisory Board research team is watching this week. I'm recording this on Friday, November 14th, two days after the federal government officially ended the longest shutdown in US history. Over the 43 days of the shutdown, healthcare, and specifically ACA subsidies, were at the center of the debate, but that's not the only healthcare program affected by the prolonged shutdown or, frankly, by the deal that was struck to reopen the government. I want to be honest with you: we still don't have certainty on some of the biggest open questions in healthcare policy, so in the absence of certainty, I at least want to provide you a bit of clarity. Here's what we know. The enhanced ACA subsidies are still set to expire at the end of 2025. There may be a vote in December to extend the subsidies, but there's no guarantee of that vote or, frankly, of its outcome. (28:26): For context, insurance enrollment through the ACA marketplace has skyrocketed since 2021, after the American Rescue Plan expanded federal subsidies. With that influx came a healthier risk pool. Now, should the subsidies go away, the ACA exchange would likely go back to being a higher-risk pool.
A place where people seek coverage and are willing to swallow a higher premium because they might not have another choice. This means ACA plans will be more expensive for individuals and less profitable for insurers. Health plans are already locked into their rates for 2026, and they may pull back from the public exchanges in 2027 and beyond. Let me put this into numbers. Without these subsidies, premiums for individuals who get insurance through the exchange would more than double, increasing 114%. The CBO estimates that 4.2 million people currently getting insurance through the subsidized exchange would become uninsured. The resulting combination of unaffordable care and uncompensated care means consumers are more likely to delay care. (29:34): And in many ways, this will feel like a cut to the industry. Providers won't get reimbursed for care that is delayed or foregone due to cost pressure, and the patients who do eventually seek care are likely to be more complex, more acute and more expensive. This will put pressure on providers with revenue at risk, and costs for purchasers will increase. Not to mention the human impact: people will be sicker. But remember, there were other healthcare programs that weren't at the center of the debate but were caught in the crossfire. I'm talking about waiver programs, including hospital at home and telehealth. Both waivers expired on September 30th, meaning that, technically, services covered by these programs were not eligible for Medicare reimbursement during the entirety of the shutdown, which began on October 1st. Now, these waivers have had bipartisan support in the past, which led to multiple extensions, and the continuing resolution that reopened the government did in fact extend both the telehealth and hospital at home waivers, but only until January 30th of 2026. (30:40): Physicians who have continued to offer telehealth services to seniors, in the hope that they'll be reimbursed now that the shutdown has ended, can remain cautiously optimistic that their claims will get reprocessed and approved. Hospital at home programs may have a bit more of an uphill battle to get back to peak operations: CMS guidance on November 6th told these programs that they needed to discharge or return all patients to the four walls of the hospital. Here's where I'll leave it. Healthcare reform is back on the radar in Washington, and here at Radio Advisory, we'd love to know what practical questions that brings up for your business, because, like we always say, we're here to help. (31:42): New episodes drop every Tuesday. If you like Radio Advisory, please share it with your networks. Subscribe wherever you get your podcasts and leave a rating and a review. Radio Advisory is a production of Advisory Board. This episode was produced by me, Rae Woods, as well as 14-time guest and now first-time host Ty Aderhold, along with Abby Burns, Chloe Bakst and Atticus Raasch. The episode was edited by Katy Anderson, with technical support provided by Dan Tayag, Chris Phelps and Joe Shrum. Special thanks to Anne Woods, who literally delayed a visit with her grandchildren so that my dad could record this conversation. Additional support was provided by Leanne Elston and Erin Collins. We'll see you next week.