Ground Truths
Geoffrey Hinton: Large Language Models in Medicine. They Understand and Have Empathy

Perceptions from the Godfather of A.I.

This is one of the most enthralling and fun interviews I’ve ever done (in 2 decades of doing them) and I hope that you’ll find it stimulating and provocative. If you did, please share with your network.


And thanks for listening, reading, and subscribing to Ground Truths.

Recorded 4 December 2023

Transcript below with external links to relevant material along with links to the audio

ERIC TOPOL (00:00):

This is for me a real delight to have the chance to have a conversation with Geoffrey Hinton. I followed his work for years, but this is the first time we've actually had a chance to meet. And so this is for me, one of the real highlights of our Ground Truths podcast. So welcome Geoff.


Thank you very much. It's a real opportunity for me too. You're an expert in one area. I'm an expert in another and it's great to meet up.

ERIC TOPOL (00:29):

Well, this is a real point of convergence if there ever was one. And I guess maybe I'd start off with, you've been in the news a lot lately, of course, but what piqued my interest to connect with you was your interview on 60 Minutes with Scott Pelley. You said: “An obvious area where there's huge benefits is healthcare. AI is already comparable with radiologists understanding what's going on in medical images. It's going to be very good at designing drugs. It already is designing drugs. So that's an area where it's almost entirely going to do good. I like that area.”

I love that quote Geoff, and I thought maybe we could start with that.


Yeah. Back in 2012, one of my graduate students, George Dahl, who did speech recognition in 2009 and made a big difference there, entered a competition by Merck Frosst to predict how well particular chemicals would bind to something. He knew nothing about the science of it. All he had was a few thousand descriptors of each of these chemicals and 15 targets that things might bind to. And he used the same network as we used for speech recognition. So he treated the 2000 descriptors of chemicals as if they were things in a spectrogram for speech. And he won the competition. And after he'd won the competition, he wasn't allowed to collect the $20,000 prize until he told Merck how he did it. And one of their questions was, what QSAR did you use? So, he said, what's QSAR? Now QSAR is a field, it has a journal, it's had a conference, it's been going for many years, and it's the field of quantitative structure-activity relationships. And that's the field that tries to predict whether some chemical is going to bind to something. And basically he'd wiped out that field without knowing its name.

ERIC TOPOL (02:46):

Well, it's striking how healthcare, medicine, and life science have had a somewhat separate path in recent AI with transformer models, and also going back of course to the phenomenal work you did in bringing in the era of deep learning and deep neural networks. But I guess what I thought I'd start with here is that healthcare may have a special edge versus AI's use in other areas because, of course, there are concerns which you and others have raised regarding safety: not just hallucinations (confabulation, of course, is a better term) but the negative consequences of where AI is headed. But would you say that in medicine and life science, where AlphaFold2 is another example from your colleagues Demis Hassabis and others at Google DeepMind, this is something that has a much more optimistic look?


Absolutely. I mean, I always pivot to medicine as an example of all the good it can do because almost everything it's going to do there is going to be good. There are some bad uses like trying to figure out who to not insure, but they're relatively limited. Almost certainly it's going to be extremely helpful. We're going to have a family doctor who's seen a hundred million patients and they're going to be a much better family doctor.

ERIC TOPOL (04:27):

Well, that's really an important note. And that gets us to a preprint that was just published yesterday on arXiv, which interestingly isn't usually where a lot of medical preprints are published, but it was done by folks at Google, who later informed me it was a large language model that hadn't yet been publicized. They wouldn't disclose the name and it wasn't MedPaLM2. But nonetheless, it was a very unique study because it randomized their LLM in 20 internists with about nine years of experience in medical practice for answering over 300 clinical pathologic conferences of the New England Journal. These are the case reports where the master clinician is brought in to try to come up with a differential diagnosis. And the striking thing on that report, which is perhaps the best yet about medical diagnoses, and it gets back, Geoff, to your hundred million visits, is that the LLM exceeded the clinicians in this randomized study for coming up with a differential diagnosis. I wonder what your thoughts are on this.


So in 2016, I made a daring and incorrect prediction, which was that within five years, the neural nets were going to be better than radiologists at interpreting medical scans. It was sometimes taken out of context. I meant it for interpreting medical scans, not for doing everything a radiologist does, and I was wrong about that. But at the present time, they're comparable. This is like seven years later. They're comparable with radiologists for many different kinds of medical scans. And I believe that in 10 years they'll be routinely used to give a second opinion and maybe in 15 years they'll be so good at giving second opinions that the doctor's opinion will be the second one. And so I think I was off by about a factor of three, but I'm still convinced I was completely right in the long term.


So this paper that you're referring to, there are actually two people from the Toronto Google Lab as authors of that paper. And like you say, it was based on the PaLM2 large language model that was then fine-tuned. It was fine-tuned slightly differently from MedPaLM2, I believe, but the LLMs [large language models] by themselves seemed to be better than the internists. But what was more interesting was the LLMs when used by the internists made the internists much better. If I remember right, they were like 15% better when they used the LLMs and only 8% better when they used Google search and the medical literature. So it's certainly the case that as a second opinion, they're really already extremely useful.

ERIC TOPOL (07:48):

It gets, again, to your point that the corpus of knowledge incorporated in the LLM is providing a differential diagnosis that might not come to the mind of the physician. And this is of course the edge of having ingested so much and being able to play back those possibilities and the differential diagnosis. If it isn't in your list, it's certainly not going to be your final diagnosis. I do want to get back to the radiologist because we're talking just after the massive annual Radiological Society of North America (RSNA) meeting in Chicago. And at those meetings, I wasn't there, but talking to my radiology colleagues, they say that your projection is already happening now, that is, the ability to not just read but make the report. I mean the whole works. So it may not have been five years when you said that, which is one of the most frequent quotes in all of AI and medicine of course, as you probably know, but it's approximating your prognosis, even now.


I've learned one thing about medicine, which is just like other academics, doctors have egos and saying this stuff is going to replace them is not the right move. The right move is to say it's going to be very good at giving second opinions, but the doctor's still going to be in charge. And that's clearly the way to sell things. And that's fine, just I actually believe that after a while of that, you'll be listening to the AI system, not the doctors. And of course there's dangers in that. So we've seen the dangers in face recognition where if you train on a database that contains very few black people, you'll get something that's very good at recognizing faces. And the people who use it, the police will think this is good at recognizing faces. And when it gives you the wrong identity for a person of color, then the policemen are going to believe it. And that's a disaster. And we might get the same with medicine. If there's some small minority group that has some distinctly different probabilities of different diseases, it's quite dangerous for doctors to get to trust these things if they haven't been very carefully controlled for the training data.

ERIC TOPOL (10:17):

Right. And actually I did want to get back to you. Is it possible that the reason why the LLMs did so well in this new report is that some of these case studies from the New England Journal were part of the pre-training?


That is always a big worry. It's worried me a lot and it's worried other people a lot because these things have pulled in so much data. There is now a way round that, at least for showing that the LLMs are genuinely creative. So there's a very good computer science theorist at Princeton called Sanjeev Arora, and I'm going to attribute all this to him, but of course, all the work was done by his students and postdocs and collaborators. And the idea is you can get these language models to generate stuff, but you can then put constraints on what they generate. So I tried an example recently: I took two Toronto newspapers and said, compare these two newspapers using three or four sentences, and in your answer demonstrate sarcasm, a red herring, empathy, and there's something else, but I forget what. Metaphor. Metaphor.

ERIC TOPOL (11:29):

Oh yeah.


And it gave a brilliant comparison of the two newspapers exhibiting all those things. And the point of Sanjeev Arora’s work is that if you have a large number of topics and a large number of different things you might demonstrate in the text, then if I give a topic and I say, demonstrate these five things, it's very unlikely that anything in the training data will be on that topic demonstrating those five skills. And so when it does it, you can be pretty confident that it's original. It's not something it saw in the training data. That seems to me a much more rigorous test of whether it generates new stuff. And what's interesting is some of the LLMs, the weaker ones, don't really pass the test, but things like GPT-4 pass the test with flying colors. That definitely generates original stuff that almost certainly was not in the training data.
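
[Illustration, not from the conversation: the counting argument above can be sketched in a few lines of Python. The sizes used here (100 topics, 100 skills, 5 skills per prompt) are assumptions picked only to show the scale; they are not figures from Arora's work.]

```python
from math import comb


def count_prompts(num_topics: int, num_skills: int, skills_per_prompt: int) -> int:
    """Number of distinct (topic, set-of-skills) combinations a tester could ask for."""
    return num_topics * comb(num_skills, skills_per_prompt)


# Hypothetical sizes, chosen only for illustration.
total = count_prompts(num_topics=100, num_skills=100, skills_per_prompt=5)
print(f"{total:,} possible topic/skill combinations")  # 7,528,752,000
```

[With billions of possible combinations, a randomly chosen one is unlikely to appear verbatim in any training corpus, which is why a good answer is evidence of original generation.]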

ERIC TOPOL (12:25):

Yeah. Well, that's such an important tool to ferret out the influence of pre-training. I'm glad you reviewed that. Now, the other question that most people argue about, particularly in the medical sphere, is does the large language model really understand? What are your thoughts about that? We're talking about what's been framed as the stochastic parrot versus a level of understanding or enhanced intelligence, whatever you want to call it. And this debate goes on, where do you fall on that?


I fall on the sensible side. They really do understand. And if you give them quizzes, which involve a little bit of reasoning, it's much harder to do now because of course now GPT-4 can look at what's on the web. So you are worried if I mention a quiz now, someone else may have given it to GPT-4, but a few months ago, before it could see the web, you could give it quizzes for things that it had never seen before and it can do reasoning. So let me give you my favorite example, which was given to me by someone who believed in symbolic reasoning, but a very honest guy who believed in symbolic reasoning and was very puzzled about whether GPT-4 could do symbolic reasoning. And so he gave me a problem and I made it a bit more complicated.


And the problem is this: the rooms in my house are painted white or yellow or blue. Yellow paint fades to white within a year. In two years’ time, I would like all the rooms to be white. What should I do and why? And it says, you don't need to paint the white rooms. You don't need to paint the yellow rooms because they'll fade to white anyway. You need to paint the blue rooms white. Now, I'm pretty convinced that when I first gave it that problem, it had never seen that problem before. And that problem involves a certain amount of just basic common sense reasoning. Like you have to understand that if yellow fades to white in a year and you're interested in the state in two years’ time, two years is more than one year and so on. When I first gave it the problem and didn't ask it to explain why, it actually came up with a solution that involved painting the blue rooms yellow. That's more of a mathematician's solution because it reduces it to a solved problem. But that'll work too. So I'm convinced it can do reasoning. There are people, friends of mine like Yann LeCun, who are convinced it can't do reasoning. I'm just waiting for him to come to his senses.
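
[Illustration, not from the conversation: the reasoning the puzzle requires can be written out as a short Python sketch. The function name, room names, and parameters are hypothetical; the rule that yellow fades to white within a year comes from the puzzle as stated.]

```python
def rooms_to_repaint(rooms, horizon_years=2, fade_years=1):
    """Return {room: new colour} for rooms that must be painted now
    so that every room is white once the horizon is reached."""
    plan = {}
    for room, color in rooms.items():
        if color == "white":
            continue  # already white, nothing to do
        if color == "yellow" and fade_years <= horizon_years:
            continue  # will fade to white before the deadline
        plan[room] = "white"  # blue rooms (or yellow with too little time) need paint
    return plan


# Hypothetical house used only as an example.
house = {"kitchen": "white", "study": "yellow", "bedroom": "blue"}
print(rooms_to_repaint(house))  # {'bedroom': 'white'}
```

[Painting the blue rooms yellow instead would also work under the same rule, since yellow has two years to fade, which is the "reduce it to a solved problem" answer described above.]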

ERIC TOPOL (15:18):

Well, I've noticed the back and forth with you and Yann (LeCun). I know it's a friendly banter, and you, of course, had a big influence in his career as with so many others that are now in the front leadership lines of AI, whether it's Ilya Sutskever at OpenAI, who's certainly been in the news lately with the turmoil there. And I mean actually it seems like all the people that did some training with you are really in the leadership positions at various AI companies and academic groups around the world. And so it says a lot about your influence, and that's not just as far as deep neural networks. And I guess I wanted to ask you, because you're frequently referred to as the godfather of AI, what do you think of getting called that?


I think originally it wasn't meant entirely beneficially. I remember Andrew Ng actually made up that phrase at a small workshop in the town of Windsor in Britain, and it was after a session where I'd been interrupting everybody. I was the kind of leader of the organization that ran the workshop, and I think it was meant as kind of I would interrupt everybody, and it wasn't meant entirely nicely, I think, but I'm happy with it.

ERIC TOPOL (16:45):

That's great.


Now that I'm retired and I'm spending some of my time on charity work, I refer to myself as the fairy godfather.

ERIC TOPOL (16:57):

That's great. Well, I really enjoyed the New Yorker profile by Josh Rothman, who I've worked with in the past where he actually spent time with you up in your place up in Canada. And I mean it got into all sorts of depth about your life that I wasn't aware of, and I had no idea about the suffering that you've had with the cancer of your wives and all sorts of things that were just extraordinary. And I wonder, as you see the path of medicine and AI's influence and you look back about your own medical experiences in your family, do you see where we're just out of time alignment where things could have been different?


Yeah, I see lots of things. So first, Joshua is a very good writer and it was nice of him to do that.


So one thing that occurs to me that is actually going to be a good use of LLMs, maybe fine-tuned somewhat differently to produce a different kind of language, is for helping the relatives of people with cancer. Cancer goes on a long time, unlike, I mean, it's one of the things that goes on for longest and it's complicated and most people can't really get to understand what the true options are and what's going to happen and what their loved one's actually going to die of and stuff like that. I've been extremely fortunate in that respect, because I had a wife who died of ovarian cancer and I had a former graduate student who had been a radiologist and gave me advice on what was happening. And more recently when my wife, a different wife, died of pancreatic cancer, David Naylor, who you know

ERIC TOPOL (18:54):

Oh yes.


Was extremely kind. He gave me lots and lots of time to explain to me what was happening and what the options were and whether some apparently rather flaky kind of treatment was worth doing. What was interesting was he concluded there's not much evidence in favor of it, but if it was him, he'd do it. So we did it. That's where you electrocute the tumor, being careful not to stop the heart. If you electrocute the tumor with two electrodes and it's a compact tumor, all the energy is going into the tumor rather than most of the energy going into the rest of your tissue and then it breaks up the membranes and then the cells die. We don't know whether that helped, but it's extremely useful to have someone very knowledgeable to give advice to the relatives. That's just so helpful. And that's an application in which it's not kind of life or death in the sense that if you happen to explain it to me a bit wrong, it's not determining the treatment, it's not going to kill the patient.


So you can actually tolerate a little bit of error there. And I think relatives would be much better off if they could talk to an LLM and consult with an LLM about what the hell's going on because the doctors never have time to explain it properly. In rare cases where you happen to know a very good doctor like I do, you get it explained properly, but for most people it won't be explained properly and it won't be explained in the right language. But you can imagine an LLM just for helping the relatives, that would be extremely useful. It'd be a fringe use, but I think it'd be a very helpful use.

ERIC TOPOL (20:29):

No, I think you're bringing up an important point, and I'm glad you mentioned my friend David Naylor, who's such an outstanding physician, and that brings us to that idea of the sense of intuition, human intuition, versus what an LLM can do. Don't you think those would be complementary features?


Yes and no. That is, I think these chatbots, they have intuition. What they're doing is they're taking strings of symbols and they're converting each symbol into a big bunch of features that they invent, and then they're learning interactions between the features of different symbols so that they can predict the features of the next symbol. And I think that's what people do too. So I think actually they're working pretty much the same way as us. There's lots of people who say, they're not like us at all, they don't understand, but there's actually not many people who have theories of how the brain works and also theories of how these things work. Mostly the people who say they don't work like us don't actually have any model of how we work. And it might interest them to know that these language models were actually introduced as a theory of how our brain works.


So there was something, what I now call a little language model, which was tiny. I introduced it in 1985, and it was what actually got Nature to accept our paper on back propagation. And what it was doing was predicting the next word in a three word string, but the whole mechanism of it was broadly the same as these models. Now, the models are more complicated, they use attention, but it was basically you get it to invent features for words and interactions between features so that it can predict the features of the next word. And it was introduced as a way of trying to understand what the brain was doing. And at the point at which it was introduced, the symbolic AI people didn't say, oh, this doesn't understand. They were perfectly happy to admit that this did learn the structure in the tiny domain, the tiny toy domain it was working on. They just argued that it would be better to learn that structure by searching through the space of symbolic rules rather than through the space of neural network weights. But they didn't say this isn't understanding. It was only when it really worked that people had to say, well, it doesn't count.
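
[Illustration, not from the conversation: a minimal pure-Python sketch of the idea described above, in which each word gets a learned feature vector and the features of the first two words are used to predict the third. This is not a reconstruction of the 1985 model: the corpus, the feature size, the single linear layer, and the learning rate are all assumptions made for brevity.]

```python
import math
import random

random.seed(0)

# Hypothetical toy corpus of three-word strings (not the original data).
triples = [
    ("colin", "has-father", "james"),
    ("colin", "has-mother", "victoria"),
    ("charlotte", "has-father", "james"),
    ("charlotte", "has-mother", "victoria"),
]
vocab = sorted({w for t in triples for w in t})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4  # vocabulary size, number of learned features per word

# E holds one feature vector per word; W maps context features to next-word scores.
E = [[random.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(V)]
W = [[random.uniform(-0.1, 0.1) for _ in range(2 * D)] for _ in range(V)]


def forward(w1, w2):
    """Concatenate the two words' features and return (features, softmax probabilities)."""
    h = E[w1] + E[w2]  # list concatenation, length 2 * D
    logits = [sum(W[k][j] * h[j] for j in range(2 * D)) for k in range(V)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]


def train_step(w1, w2, target, lr=0.3):
    """One step of gradient descent via back propagation; returns the loss before the update."""
    h, p = forward(w1, w2)
    dlogits = [p[k] - (1.0 if k == target else 0.0) for k in range(V)]
    # Gradient flowing back into the word features (computed before W changes).
    dh = [sum(W[k][j] * dlogits[k] for k in range(V)) for j in range(2 * D)]
    for k in range(V):
        for j in range(2 * D):
            W[k][j] -= lr * dlogits[k] * h[j]
    for j in range(D):
        E[w1][j] -= lr * dh[j]
        E[w2][j] -= lr * dh[D + j]
    return -math.log(p[target])


def epoch():
    """Train once on every triple and return the mean loss."""
    return sum(train_step(idx[a], idx[b], idx[c]) for a, b, c in triples) / len(triples)


def predict(first, second):
    """Most probable third word given the first two."""
    _, p = forward(idx[first], idx[second])
    return vocab[p.index(max(p))]


first_loss = epoch()
for _ in range(300):
    last_loss = epoch()

print(f"mean loss: {first_loss:.3f} -> {last_loss:.3f}")
print("colin has-mother ->", predict("colin", "has-mother"))
```

[The feature vectors in `E` are not hand-designed; they are whatever values gradient descent finds useful for predicting the next word, which is the sense of "invent features" in the passage above. Modern models add hidden layers and attention on top of the same basic loop.]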

ERIC TOPOL (22:53):

Well, that's also something that I was surprised about. I'm interested in your thoughts. I had anticipated in the Deep Medicine book the gift of time, all these things that we've been talking about, like the front door that could be used by the model coming up with the diagnoses, even the ambient conversations made into synthetic notes. The thing I didn't think was that machines could promote empathy. And what I have been seeing now, not just from the notes that are now digitized, these synthetic notes from the conversation of a clinic visit, but the coaching that's occurring by the LLM to say, well, Dr. Jones, you interrupted the patient so quickly, you didn't listen to their concerns. You didn't show sensitivity or compassion or empathy. That is remarkable. Obviously the machine doesn't necessarily feel or know what empathy is, but it can promote it. What are your thoughts about that?


Okay, my thoughts about that are a bit complicated, that obviously if you train it on text that exhibits empathy, it will produce text that exhibits empathy. But the question is does it really have empathy? And I think that's an open issue. I am inclined to say it does.

ERIC TOPOL (24:26):

Wow, wow.


So I'm actually inclined to say these big chatbots, particularly the multimodal ones, have subjective experience. And that's something that most people think is entirely crazy. But I'm quite happy being in a position where most people think I'm entirely crazy. So let me give you a reason for thinking they have subjective experience. Suppose I take a chatbot that has a camera and an arm and it's been trained already, and I put an object in front of it and say, point at the object. So it points at the object, and then I put a prism in front of its camera that bends the light rays, but it doesn't know that. Now I put an object in front of it, say, point at the object, and it points straight ahead, sorry, it points off to one side, even though the object's straight ahead, and I say, no, the object isn't actually there, the object's straight ahead. I put a prism in front of your camera. And imagine if the chatbot says, oh, I see the object's actually straight ahead, but I had the subjective experience that it was off to one side. Now, if the chatbot said that, I think it would be using the phrase subjective experience in exactly the same way as people do.


Its perceptual system told it, it was off to one side. So what its perceptual system was telling it would've been correct if the object had been off to one side. And that's what we mean by subjective experience. When I say I've got the subjective experience of little pink elephants floating in front of me, I don't mean that there's some inner theater with little pink elephants in it. What I really mean is if in the real world there were little pink elephants floating in front of me, then my perceptual system would be telling me the truth. So I think what's funny about subjective experience is not that it's some weird stuff made of spooky qualia in an inner theater. I think subjective experience is a hypothetical statement about a possible world. And if the world were like that, then your perceptual system would be working properly. That's how we use subjective experience. And I think chatbots can use it like that too. So I think there's a lot of philosophy that needs to be done here and got straight, and I don't think we can leave it to the philosophers. It's too urgent now.

ERIC TOPOL (26:44):

Well, that's actually a fascinating response, and added to your perception of understanding, it gets us to perhaps where you were when you left Google in May this year, where you saw that this was a new level of whatever you want to call it, not AGI [artificial general intelligence], but something that was enhanced from prior AI. And in some respects, I wouldn't say you sounded any alarms, but you've expressed concern consistently since then that we're kind of in a new phase. We're heading in a new direction with AI. Could you elaborate a bit more about where you were and where your mind was in May and where you think things are headed now?


Okay, let's get the story straight. It's a great story the news media puts out there, but actually I left Google because I was 75 and I couldn't program any longer because I kept forgetting what the variables stood for. Also, I wanted to watch a lot of Netflix. I took the opportunity that I was leaving Google anyway to start making public statements about AI safety. And I got very concerned about AI safety a couple of months before. What happened was I was working on trying to figure out analog ways to do the computation so you could do these large language models for much less energy. And I suddenly realized that actually the digital way of doing the computation is probably hugely better. And it's hugely better because you can have thousands of different copies of exactly the same digital model running on different hardware, and each copy can look at a different bit of the internet and learn from it.


And they can all combine what they learned instantly by sharing weights or by sharing weight gradients. And so you can get 10,000 things to share their experience really efficiently. And you can't do that with people. If 10,000 people go off and learn 10,000 different skills, you can't say, okay, let's all average our weights, so now all of us know all of those skills. It doesn't work like that. You have to go to university and try and understand what on earth the other person's talking about. It's a very slow process where you have to get sentences from the other person and say, how do I change my brain so I might've produced that sentence? And it's very inefficient compared with what these digital models can do by just sharing weights. So I had this kind of epiphany. The digital models are probably much better. Also, they can use the back propagation algorithm quite easily, and it's very hard to see how the brain can do it efficiently. And nobody's managed to come up with anything that'll work in real neural nets that's comparable to back propagation at scale. So I had this sort of epiphany, which made me give up on the analog research, that digital computers are actually just better. And since I was retiring anyway, I took the opportunity to say, Hey, they're just better. And so we'd better watch out.
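
[Illustration, not from the conversation: a minimal Python sketch of identical model copies learning from different data and combining what they learned by sharing weight gradients. The one-weight model, the data shards, and the learning rate are assumptions chosen to keep the example small.]

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


# Hypothetical data: every shard follows y = 3x, but each copy sees different examples.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]

weights = [0.0] * len(shards)  # identical copies of the same one-weight model
learning_rate = 0.02

for step in range(200):
    # Each copy computes a gradient from its own slice of experience.
    grads = [gradient(w, shard) for w, shard in zip(weights, shards)]
    # The copies share their gradients and average them.
    shared = sum(grads) / len(grads)
    # Every copy applies the same update, so all copies stay identical.
    weights = [w - learning_rate * shared for w in weights]

print(weights)
```

[Because every copy starts from the same weights and applies the same averaged update, each one ends up having learned from all three shards, which is the kind of sharing that has no equivalent between people.]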

ERIC TOPOL (29:56):

Well, I mean, I think your call on that and how you back it up has really, of course, had a big impact. And of course it's still an ongoing and intense debate, and in some ways the turmoil at OpenAI was rooted in this controversy about where things are and where they're headed. I want to just close up with the point you made about the radiologists, and not insulting them by saying they'll be replaced. It gets us to where we are, the tension of today, which is: are humans as the pinnacle of intelligence going to be not replaced, but superseded by the likes of AI's future? Of course, our species can't handle that. It's like the radiologist: our species can't handle that there could be this machine that, with far fewer connections, could outperform us, or of course, as I think we've emphasized in our conversation, work in concert with humans to take it to yet another level. But is that tension, that there's this potential for machines outdoing people, part of the problem, that it's hard for people to accept this notion?


Yes, I think so. So particularly philosophers, they want to say there's something very special about people. That's to do with consciousness and subjective experience and sentience and qualia, and these machines are just machines. Well, if you're a sort of scientific materialist, as most of us are, the brain's just a machine. It's wrong to say it's just a machine because it's a wonderfully complex machine that does incredible things that are very important to people, but it is a machine and there's no reason in principle why there shouldn't be better machines and better ways of doing computation, as I now believe there are. So I think people have a very long history of thinking they're special.


They think God made them in his image and he put them at the center of the universe. And a lot of people have got over that and a lot of people haven't. But for the people who've got over that, I don't think there's any reason in principle to think that we are the pinnacle of intelligence. And I think it may be quite soon these machines are smarter than us. I still hope that we can reach an agreement with the machines where they act like benevolent parents. So they're looking out for us. We've managed to motivate them, so the most important thing for them is our success, like it is with a mother and child, not so much for men. And I would really like that solution. I'm just fearful we won't get it.

ERIC TOPOL (33:15):

Well, that would be a good way for us to go forward. Of course, the doomsayers and the people that are much worse at their level of alarm tend to think that that's not possible. But we'll see obviously over time. Now, one thing I just wanted to get a quick read from you on before we close is that recently Demis Hassabis and John Jumper got the Lasker Award, like a pre-Nobel award, for AlphaFold2. But this transformer model, which of course has helped to understand the 3D structure of 200 million proteins, they don't understand how it works. Like most models, unlike the understanding we were talking about earlier on the LLM side. I wrote that I think that with this award, an asterisk should have been given to the AI model. What are your thoughts about that idea?


It's like this, I want people to take what I say seriously, and there's a whole direction you could go in that I think Larry Page, one of the founders of Google has gone in this direction, which is to say there's these super intelligences and why shouldn't they have rights? If you start going in that direction, you are going to lose people. People are not going to accept that these things should have political rights, for example. And being a co-author is the beginning of political rights. So I avoid talking about that, but I'm sort of quite ambivalent and agnostic about whether they should. But I think it's best to stay clear of that issue just because the great majority of people will stop listening to you if you say machines should have rights.

ERIC TOPOL (35:28):

Yeah. Well, that gets us, of course, to what we just talked about and how hard it is, the struggle between humans and machines rather than the thought of humans plus machines and the symbiosis that can be achieved. But Geoff, this has been great, we've packed a lot in. Of course, we could go on for hours, but I thoroughly enjoyed hearing your perspective firsthand and your wisdom, and just to reinforce the point about how many of the people that are leading the field now derive a lot of their roots from your teaching and prodding and challenging and all that. We're indebted to you. And so thanks so much for all you've done and will continue to do to help us, guide us through the very rapid dynamic phase as AI moves ahead.


Thanks, and good luck with getting AI to really make a big difference in medicine.

ERIC TOPOL (36:25):

Hopefully we will, and I'll be consulting with you from time to time to get some of that wisdom to help us.