Daphne Koller: The Convergence of A.I. and Digital Biology

Ground Truths

A brilliant pioneer in A.I. and its role in life science, and the Founder and CEO of insitro, shares her excitement about how these fields are coming together to change the future of medicine

Eric Topol

Mar 10, 2024

Transcript

Eric Topol (00:06):

Well, hello, this is Eric Topol with Ground Truths, and I am absolutely thrilled to welcome Daphne Koller, the founder and CEO of insitro, and a person I've been wanting to meet for some time. Finally, we converged, so welcome, Daphne.

Daphne Koller (00:21):

Thank you Eric. And it's a pleasure to finally meet you as well.

Eric Topol (00:24):

Yeah, I mean you have been rocking everybody over the years, with your election to the National Academy of Engineering and Science, and you're right at the interface of life science and computer science. In my view, there's hardly anyone I can imagine who's doing so much at that interface. I wanted to first start with your meeting in Davos last month, because I figured we'd start broad with AI before getting into what you're doing these days. You had a really interesting panel with Yann LeCun, Andrew Ng, Kai-Fu Lee and others, and I wanted to get your impression of that and also the general sense. I mean, AI is just moving at a speed that is crazy. What were your thoughts about that panel just last month? Where are we?

Video link for the WEF Panel

Daphne Koller (01:25):

I think we've been living on an exponential curve for multiple decades, and the thing about exponential curves is that they are very misleading. In the early stages, people basically take the line between where we were last year and where we are this year, interpolate linearly, and say, God, things are moving so slowly. Then, as the exponential curve starts to pick up, it becomes more and more evident that things are moving faster, but people still interpolate linearly. It's only when things really hit that inflection point that people realize that, even with linear interpolation, where we'll be next year is just mind-blowing. And if you realize that you're on an exponential curve, where we will be next year is totally unanticipatable. What we started to discuss in that panel was: are we in fact on an exponential curve? What are the rate-limiting factors that may or may not enable that curve to continue, specifically the availability of data? And what would it take to extend that curve to areas beyond speech, natural language and the large language models that exist today, and go far beyond that, which is what you would need for these models to be applicable to areas such as biology and medicine?

Daphne Koller (02:47):

And so that was kind of the message to my mind from the panel.
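Koller's point about linear interpolation on an exponential curve can be made concrete with a toy calculation. The Python sketch below assumes a hypothetical quantity that doubles every year (an illustration, not data from the panel) and compares a forecast drawn through last year's and this year's values with where the curve actually goes. The function names are illustrative only.

```python
from typing import Callable


def doubling_curve(year: int) -> int:
    """Toy quantity that doubles every year (an assumption, not real data)."""
    return 2 ** year


def linear_forecast(f: Callable[[int], int], year: int, horizon: int = 1) -> int:
    """Draw a line through last year's and this year's values and extend it."""
    slope = f(year) - f(year - 1)  # change over the most recent one-year step
    return f(year) + slope * horizon


def shortfall(f: Callable[[int], int], year: int, horizon: int = 1) -> int:
    """How far the linear forecast falls short of where the curve really goes."""
    return f(year + horizon) - linear_forecast(f, year, horizon)


for year in (1, 5, 10, 20):
    print(year, linear_forecast(doubling_curve, year),
          doubling_curve(year + 1), shortfall(doubling_curve, year))
```

One year ahead, the straight-line guess on this toy curve is always 75% of the true value, yet the absolute shortfall doubles every year (1 at year 1, 512 at year 10). With a longer horizon the miss grows further: at year 10, a three-year linear forecast gives 2560 against a true value of 8192.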

Eric Topol (02:53):

And there were some differences of opinion. Of course, Yann can be a little strong, and I think it was good to see you challenging him on some things, such as this “world view” of AI and where we go from here. As you mentioned, in the area of life science there had already been so much progress before large language models hit their stride, particularly in imaging cells: subcellular structure, rare cells, all without any labeling, without fluorescein. Just amazing stuff. And now it's gone to another level. So before we get into that, I want to ask you about this convergence story. Jensen Huang, I'm sure you heard his quote about biology having the opportunity to be engineering, not science. I'm not sure I understand the “not science” part, but what about this convergence? Because it is quite extraordinary to see two fields coming together, moving at such high velocity.

"Biology has the opportunity to be engineering not science. When something becomes engineering not science it becomes...exponentially improving, it can compound on the benefits of previous years." -Jensen Huang, NVIDIA.

Daphne Koller (04:08):

So I will propose a replacement for Jensen's quote, one that many people have articulated: math is to physics as machine learning is to biology. Machine learning is a mathematical foundation that allows you to take something that up until that point had been kind of mysterious and fuzzy and almost magical, and create a formal foundation for it. Now physics, especially Newtonian physics, is simple enough that math is the right foundation to capture what goes on in a lot of physics. Biology, as an evolved natural system, is so complex that you can't articulate a mathematical model for it de novo. You need to actually let the data speak, let machine learning find the patterns in those data, and really help us create a predictability, if you will, for biological systems, so that you can start to ask what-if questions: what would happen if we perturb the system in this way?

The Convergence

Daphne Koller (05:17):

How would it react? We're nowhere close to being able to answer those questions reliably today, but as you feed a machine learning system more and more data, hopefully it'll become capable of making those predictions. And in order to do that, and this is where the convergence of these two disciplines comes in, the fodder, the foundation for all of machine learning, is having enough data to feed the beast. The miracle of the convergence that we're seeing is that over the last 10, 15, maybe 20 years, biology has been on a similar, albeit somewhat slower, exponential curve of data generation. We are turning it from an entirely observational, qualitative discipline, which is where it started, into one that is much more quantitative and broad-based in how we measure biology. And so those measurements, the tools that life scientists and bioengineers have developed to measure biological systems, are what produce that fodder, that energy that you can then feed into the machine learning models so that they can start making predictions.

Eric Topol (06:32):

Yeah, well I think the number of layers of data, not to mention what's in those layers, is quite extraordinary. Some years ago, when single-cell sequencing was just starting, I said, well, that's of academic interest, and now the field of spatial omics has exploded. And I wonder how you see feeding the beast here. It's at every level. It's not just the cell level, with subcellular and single-nucleus sequencing and single-cell epigenomics; you go all the way out to these other layers of data. I know you plug into the human patient side as well, where it could be images, path slides, outcomes and treatments, and on and on. So when you think about multimodal AI, has anybody really done that yet?

Daphne Koller (07:30):

I think that there are certainly beginnings of multimodal AI, and we have started to see some of the benefits of the convergence of, say, imaging and omics. I will give an example from some work that we did at insitro and recently posted on a preprint server, which took imaging data from standard histopathology slides, H&E slides, and aligned them with simple bulk RNA-Seq taken from those same tumor samples. What we find is that by training models that translate from one to the other, specifically from the imaging to the omics, you're able, for a fairly large fraction of genes, to make very accurate predictions of gene expression levels by looking at the histopath images alone. In fact, because many of the predictions are made at the tile level, not at the entire slide level, even though the omics was captured in bulk, you're able to spatially resolve the signal and get a kind of pseudo-spatial biology just by making predictions from the H&E image into these omic modalities.
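The tile-level idea described above, learning from bulk labels but predicting per tile, can be illustrated with a toy multiple-instance regression. This is a minimal sketch under stated assumptions, not insitro's method: each tile is reduced to a hypothetical two-number feature vector standing in for an image embedding, a linear model predicts one gene's expression per tile, and training only ever compares the average over a slide's tiles with that slide's bulk value. The slides and the linear ground truth are synthetic and made up for illustration.

```python
from statistics import fmean

# Hypothetical setup: each H&E tile is summarised by a small feature vector
# (standing in for a learned image embedding); one gene is modelled at a time.
Tile = list[float]


def predict_tile(w: list[float], b: float, tile: Tile) -> float:
    """Linear per-tile prediction of one gene's expression level."""
    return sum(wi * xi for wi, xi in zip(w, tile)) + b


def predict_slide(w: list[float], b: float, tiles: list[Tile]) -> float:
    """Bulk RNA-seq sees the whole sample, so compare it with the tile average."""
    return fmean(predict_tile(w, b, t) for t in tiles)


def train(slides: list[list[Tile]], bulk: list[float], dim: int,
          lr: float = 0.5, epochs: int = 3000) -> tuple[list[float], float]:
    """Fit w, b by SGD on squared error between slide average and bulk value."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for tiles, target in zip(slides, bulk):
            err = predict_slide(w, b, tiles) - target
            # Gradient of 0.5 * err**2 w.r.t. w is err * (mean tile features)
            mean_x = [fmean(t[j] for t in tiles) for j in range(dim)]
            w = [wi - lr * err * mj for wi, mj in zip(w, mean_x)]
            b -= lr * err
    return w, b


def true_expression(tile: Tile) -> float:
    """Made-up ground truth used only to simulate bulk labels (an assumption)."""
    return 2.0 * tile[0] - 1.0 * tile[1] + 0.5


slides = [
    [[0.0, 0.0], [0.2, 0.0]],
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
]
bulk = [fmean(true_expression(t) for t in tiles) for tiles in slides]
w, b = train(slides, bulk, dim=2)

# Per-tile predictions form a pseudo-spatial map within one slide.
print([round(predict_tile(w, b, t), 3) for t in slides[3]])
```

Because the bulk value is treated as the average of tile-level values, the fitted model can be applied to individual tiles afterwards; on this noiseless toy data it recovers per-tile values of about 1.5, 2.5 and 0.5 for the last slide, even though no tile-level label was ever used. A real histopathology model would replace the linear map with a deep network over image tiles.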

Multimodal A.I. and Life Science

Daphne Koller (08:44):

So there are, I think, beginnings of multimodality, but in order to get to multimodality, you really need to train on at least some data where the two modalities are captured simultaneously. And so at this point, I think the rate-limiting factor is more a matter of data acquisition for training the models than it is of building the models themselves. That's where I think things like spatial biology, which, like you, I'm very excited about, are one of the places where we can really start to capture these paired modalities and get to some of those multimodal capabilities.

Eric Topol (09:23):

Yeah, I wanted to ask you about that, because spatial temporal is so perfect; it is two modes. And there's the preprint you referred to, and you see things like electronic health records and genomics, or electronic health records and medical images. The most we've done is get two modes of data together. The question is, as this data really starts to accrue, do we need new models to work with it, or do you foresee that that is not a limiting step?

Daphne Koller (09:57):

So I think currently data availability is the most significant rate-limiting step. The nice thing about modern-day machine learning is that it really is structured as a set of building blocks that you can put together in different ways for different situations. So, do we have the exact right models available to us today for these multimodal systems? Probably not. But do we have the right building blocks, drawn from what has already been deployed in other settings, that we could creatively put together? Probably, yes. So of course there's still model exploration to be done and a lot of creativity in how these building blocks should be put together, but I think we have the tools available to solve these problems. What we really need first is a really significant data acquisition effort. The other thing we need, which has also been a priority for us at insitro, is the right mix of people put together. Because what happens is that if you take a bunch of even extremely talented and sophisticated machine learning scientists and say, solve a biological problem, here's a dataset, they don't know what questions to ask, and they oftentimes end up asking questions that might be interesting from a machine learning perspective but don't really answer fundamental biology questions.

Daphne Koller (11:16):

And conversely, you can take biologists and say, hey, what would you have machine learning do? And they will tell you, well, in our work we do A to B to C to D, and B to C is kind of painful, like counting nuclei is really painful, so can we have the machine do that for us? And it's kind of like, yeah, but that's boring. What you get if you put them in a room together, and actually get them to the point where they communicate with each other effectively, is that not only do you get better solutions, you get better problems. I think the crux of making progress here, besides data, is really the culture and the people.

A.I. and Drug Discovery

Eric Topol (11:54):

Well, I'm sure you've assembled that at insitro, knowing you, and people tend to forget that it's about the people, not about the models or even the data, once you have all that. Now, you've been on the drug discovery path, and there are at least 20 AI-driven drugs in the clinic, in phase one or two. Obviously these are not only ones that you've been working on, but do you see this whole field now going into high gear because of this? Or is it that there are all these AI companies partnering with big pharma, with a lot of nice agreements drawn up with multimillion-dollar milestones? Or is this real?

Daphne Koller (12:47):

So there are a number of different layers to your question. First of all, let me start by saying that I find the notion of AI-driven drugs to be a bit of a weird concept, because over time most drugs will have some element of AI in them. Even some of the earlier work used data science in many cases. So where do you draw the boundary? We're not going to be in a world anytime soon where AI starts out with, oh, I need to work on ALS, and at the end there is a clinical trial design ready to be submitted to the FDA, without any human intervention in the middle. So it's always going to be an interplay between machine and human, with more and more capabilities, I think, being taken on by the machine over time, but inevitably a partnership for a long time to come.

Daphne Koller (13:41):

But coming to the second part of your question, is this real? Every big pharma has gotten to the point today where they realize they need some of that AI thing that's going around. The level of sophistication with which they incorporate it, and their willingness to make some of the hard decisions, like: if we're going to be doing this with AI, we shouldn't be doing it the old way anymore, and we need to make a big, dramatic internal shift. That, I think, depends very much on the specific company. Some companies have more willingness to take those very big steps than others. So will some companies be able to make the adjustment? Probably. Will all of them? Probably not. I would say, however, that in this new world there is also room for companies to emerge that are, if you will, AI native.

Daphne Koller (14:39):

And we've seen in every technological revolution that the native companies, born in the new age, move faster, incorporate the technology much more deeply into every aspect of their work, and end up being dominant players, if not the dominant player, in that new world. Look back at the internet revolution: Google did not emerge from the Yellow Pages, Netflix did not emerge from Blockbuster, and Amazon did not emerge from Walmart. Some of those incumbents did make the adjustment and are still around; some did not and are no longer around. And I think the same thing will happen with drug discovery and development, where there will be a new crop of leading companies, maybe together with some of the incumbents that were able to make the adjustment.

Eric Topol (15:36):

Yeah, I think your point there is essential. Another part of this story is that a lot of people don't realize there are so many points at which AI can facilitate this whole process, from the elemental data mining that identified baricitinib for Covid, a drug now being used for many other indications as well, to simulating clinical trials, and everything in between. Now, it seems that because of your incredible knack for this convergence (your middle name is really convergence), you are working on what in my view is a unique aspect: bringing cells and all the other layers of data together to amp things up. Is that a fair assessment of where insitro and your efforts are directed?

Three Buckets

Daphne Koller (16:38):

So first of all, maybe it's useful to create the high-level map, and the simplest version I've heard divides the process into three major buckets. One is what you think of as biology discovery, which is the discovery of new therapeutic hypotheses: basically, if you modulate this target in this group of humans, you will end up affecting this clinical outcome. That's the first third. The middle third is, okay, now we need to turn that hypothesis into an actual molecule that does that. So basically, generating molecules. And finally there's the enablement and acceleration of the clinical development process, which is the final third. Most companies in the AI space have really focused on that middle third because it is well defined: if someone gives you a target and what's called a target product profile (TPP), you know at the end of two or three years whether you've succeeded in creating a molecule that achieves the appropriate properties of selectivity, solubility and all those other things. The first third is where a lot of the mistakes currently happen in drug discovery and development. Most drugs that go into the clinic don't fail because we didn't have the right molecule. That happens, but it's not the most common failure mode. The most common failure mode is that the target was just the wrong target for this disease in this patient population.

Daphne Koller (18:09):

So our real focus, the core of who we are as a company, is on that early third: let's make sure we're going after the right clinical hypotheses. Now, with that, obviously we need to make molecules; some of those molecules we make in-house, and we use machine learning to do that as well. As for the last third, we've discovered that if you have the right therapeutic hypothesis, which includes the right patient population, that can also accelerate and enable your clinical trials, so we end up doing some of that as well. But at the core of what we believe, both about the failure mode of drug discovery and about what it's going to take to move it to the next level, is the articulation of therapeutic hypotheses that actually translate into clinical outcomes. And so in order to do that, we've put together, to your point about convergence, two very distinct types of data.

Daphne Koller (19:04):

One is data that we produce in our own internal data factory, where we have an incredible set of capabilities that use stem cells, CRISPR, microscopy, single-cell measurements, spatial biology and all that to generate massive amounts of in-house data. And then, because ultimately you care not about curing cells but about curing people, you also need to bring in the clinical data. Here, too, we look at multiple high-content data modalities, imaging and omics, and of course human genetics, which is one of the few sources of ground truth for causality available in medicine. We really bring all those different data modalities across these two different scales together to come up with what we believe are truly high-quality therapeutic hypotheses that we then advance into the clinic.

AlphaFold2, the Exemplar

Eric Topol (19:56):

Yeah, no, I think that's an extraordinary approach. It's a bold, ambitious one, but at least it is getting to the root of what is needed. One of the things you mentioned, of course, is coming up with molecules, and I wanted to get your comments on the AlphaFold2 world and the ability now to design not just proteins that don't already exist in nature, but also antibodies, peptides and small molecules. How much does that contribute to your perspective?

Daphne Koller (20:37):

So first of all, let me say that I consider the AlphaFold story, across its incarnations, to be one of the best examples of the hypothesis we set out to prove, which is that if you feed a machine learning model enough data, it will learn to do amazing things. Protein folding is one of those areas where there has been enough data in biology: the sequence-to-structure mapping is so consistent across different cells, even across different species, that over the years we have accumulated a lot of sequence-to-structure data, which is what enabled AlphaFold to be successful. Since then, of course, they've taken it to a whole new level. I think what we are currently able to do with protein-based therapeutics is, in a sense, entirely a consequence of that line of development. Whether that same line of development is also going to unlock other therapeutic modalities, such as small molecules, where data are unfortunately much less abundant and often locked away in the bowels of big pharma companies that are not eager to share.

Daphne Koller (21:57):

I think that question remains. I have not yet seen the same level of performance in de novo design of small-molecule therapeutics, because of the data availability limitations. People have a lot of creative ideas about that. We use DNA-encoded libraries as a way of generating data at scale for small molecules. Others have used other approaches, including active learning, pre-training and so on. We're still waiting, I think, for a truly convincing demonstration that you can get to the same level of de novo design in small molecules as you can in protein therapeutics. Now, as to how that affects us, I'm so excited about this development, because our focus, as I mentioned, is the discovery of novel therapeutic hypotheses. You then need to turn those hypotheses into actual molecules that do the work. We know we're not going to be the expert in every single therapeutic modality, from small molecules to macrocycles, to proteins, to mRNA and siRNA. There are so many of them that you need experts in each of those modalities, so that as you discover a target you want to modulate, you can go and ask: what is the right partner to help turn this into an actual therapeutic intervention?

Daphne Koller (23:28):

And we've already had conversations with some modality partners, as we like to call them, who help us take some of our hypotheses and turn them into molecules. They are often very hungry for new targets, because it's often kind of like, okay, here are the three or four or five low-hanging fruits that our technology uniquely unlocks, but once you get past those well-validated targets, what's next? Am I just going to go read a bunch of papers and hope for the best? So they're often looking for new hypotheses, and we're looking for partners to make molecules. It's a great partnership.

Can We Slow the Aging Process?

Eric Topol (24:07):

Oh yeah, no question about that. Now, we've seen in recent times some leaps with drugs that were worked on for decades, like the GLP-1s for obesity, which are potentially having effects well beyond obesity and which didn't require any AI, just slogging away at it for decades. And you were previously at Calico, which is trying to deal with aging. Do you think we're going to see drug interventions that slow the aging process, because of this unique time, this exponential point we are at, where computer science and digital biology come together?

Daphne Koller (24:52):

So I think the GLP-1s are an incredible achievement. And I would point out, I know you said that they didn't use any AI, but they did actually use an understanding of human genetics. I think human genetics, and the genotype-phenotype statistical associations it has revealed, is in some ways the biological precursor to AI. It is a way of leveraging very large amounts of data, admittedly using simpler statistical tools, but still discovering novel therapeutic hypotheses in a data-driven way. So I consider the work that we do to be a progeny of the kind of work that statistical geneticists have done. And of course a lot of heavy lifting needed to be done after that to make a drug that actually worked, and kudos to the leaders in that space. In terms of the modulation of aging, aging is a process of decline over time, and the rate of that decline is definitely modifiable.

Daphne Koller (26:07):

And we all know that external factors such as lifestyle, diet and exercise, and even exposure to sun or smoking, affect the pace of aging. And you could easily imagine, as we've seen with the GLP-1s, that a therapeutic intervention can change that trajectory. So will we be able, using therapeutic interventions, to increase health span so that we live healthy longer? I think the answer to that is undoubtedly yes. And we've seen that consistently with therapeutic interventions, not just the GLP-1s but, going backwards, even statins and earlier things. Will we be able to increase the maximum lifespan so that people habitually live past 120 or 150? I don't know. I don't know that anybody knows the answer to that question. I personally would be quite happy with increasing my health span so that I'm still able to actively go hiking and scuba diving at 80, 90 and 100, and that would be a pretty good place to start.

Eric Topol (27:25):

Well, I'm with you on that, but I just want to ask, because the drugs we have today that are highly effective, statins being a good example, work at a particular level of the body. They don't have an across-the-board modulating effect. I guess what I was asking is, do you foresee that we will have some way to do that across all systems? Now that we have so many different ways to intervene in the process, do you envision that in the future we'll be able to, and here I'm not talking about expanding lifespan, I'm talking about promoting health, whether through the immune system or through mitochondria, mTOR, caloric restriction, all these different things? Do you think that's conceivable? I mean, companies like Calico and others have been chasing this. What do you think?

Daphne Koller (28:30):

Again, I think it's something that is hard to predict. We know that different organ systems age at different rates, even in a single individual. It's been well established that you can test brain age versus muscle health versus cardiovascular health, and they can be quite different in the same individual. So is there a single hub that governs all forms of aging? I don't know if that's true. I think it's oftentimes different. We know protein folding has an effect, and DNA damage has an effect; that's why our skin ages, because it's exposed to sun. Is there going to be a single switch that reverts it all back? Certainly some companies are pursuing that single-bullet approach. Based on the biology that I've seen, I personally would say there's at least as much potential in trying to find ways to slow the decline in a way that is specific to, say, the immune system, as we discussed, or to correcting protein misfolding dysfunction, or things like that. I'm not dismissing the possibility that there is a single magic switch, but let's just say I think we should be exploring multiple alternatives.

Eric Topol (29:58):

Yeah, no, I like your reasoning. I think, like everything else you've said here, it makes a lot of sense. The logic is hard to argue with. Well, I think what you're doing there at insitro is remarkable, and it seems to be quite distinct from other strategies, which is not at all surprising knowing your background and your aspirations.

Daphne Koller (30:27):

Never like to follow the crowd. It's boring.

Eric Topol (30:30):

Right, and I do know you left an aging-directed company effort at Calico to do what you're doing. So that must have been an opening that you saw as much more diverse, perhaps, or maybe I'm mistaken and Calico is not really age-specific in its goals.

Daphne Koller (30:49):

So what inspired me to go found insitro was the realization that we are making medicines today in a way that is not that different from the way we were making medicines 20 or 30 years ago. The process by which we go from “here's what I want to work on” to “here's a drug” is very much an artisanal one-off; each one of them is a snowflake. There is very little commonality and sharing of insights and infrastructure across those efforts, except in relatively limited, tool-based ways. And I wanted to change that. I wanted to take the tools of engineering, data and machine learning and build a very different approach to going from a problem definition to a therapeutic intervention. And it didn't make sense to build that within a company focused on any single area of biology, not just aging, because it is such a broad-based foundation.

Daphne Koller (31:58):

And I will tell you that I think we are on the path to building the thing that I set out to build. As one example, I will use the work we've recently done in metabolic disease, where, based on the foundations we've built in both our clinical machine learning work and our cellular machine learning work, we were able to go from a problem articulation, this is the indication that we want to work on, to a proof of concept in a translatable animal model in one year. That is pretty unusual. Admittedly, this is with an siRNA tool compound. The nice thing about liver-directed targets is that it's not that difficult a path to go from an siRNA tool compound to an actual siRNA drug. So hopefully that's a fairly linear journey from there, which is great.

Daphne Koller (32:51):

But the fact that we were able to go from problem articulation to a proof of concept in a translatable animal model in one year, that is unusual. And we're starting to see that now across our other therapeutic areas. It takes a long time to build a platform, because you're basically building a foundation. People ask, okay, where's the fruit of all of that? You're building and building and building, and nothing comes out for a while because you're building so much of the infrastructure. But once you've built it, you turn the crank and stuff starts to come out; you turn the crank again, and it works faster and better than the previous time. So the essence of what we've built, and what has turned into the tagline for the company, is what we call pipeline through platform: we're building a pipeline of therapeutic interventions that comes off of a platform. And that's rare in biopharma. The only platform companies that have really emerged are, by and large, therapeutic modality platforms, like Moderna and Alnylam, which have gotten really good at a particular modality, and that's awesome. We're building a discovery platform, and that is a fairly unusual thing.

Eric Topol (34:02):

Right. Well, I have no doubt you'll be discovering a lot of important things. That one sounds like it could have a big impact on NASH.

Daphne Koller (34:14):

Yeah, we hope so.

Eric Topol (34:14):

A big unmet need that's not going to be fixed by what we have today. So Daphne, it's really a joy to talk with you, and your enthusiasm for where the field is going, as one of its real leaders, is palpable. We'll be cheering for you. I hope we'll reconnect in the times ahead to get another progress report, because you're definitely rocking it there, and you've got a lot of great ideas for how to change the life science and medical world of the future.

Daphne Koller (34:48):

Thank you so much. It's a pleasure to meet you. It's a long and difficult journey, but I think we're on the right path, so I'm looking forward to seeing all that pan out.

Eric Topol (34:58):

You made a compelling case in a short visit, so thank you.

Daphne Koller (35:02):

Thank you so much.

Thanks for your subscription and listening/reading these posts.

All content on Ground Truths—newsletter analyses and podcasts—is free.

Voluntary paid subscriptions all go to support Scripps Research.