Ground Truths
Jim Collins: Discovery of the First New Structural Class of Antibiotics in Decades, Using A.I.

Starting from >12,000 compounds to find, with explainability, new drugs effective vs methicillin-resistant Staph aureus (MRSA)

Jim Collins is one of the leading biomedical engineers in the world. He’s been elected to all 3 National Academies (Engineering, Science, and Medicine) and is one of the founders of the field of synthetic biology. In this conversation, we reviewed the seminal discoveries that he and his colleagues are making at the Antibiotics-AI Project at MIT.


Recorded 5 February 2024, transcript below with audio links and external links to recent publications

Eric Topol (00:05):

Hello, it's Eric Topol with Ground Truths, and I have got an extraordinary guest with me today, Jim Collins, who's the Termeer Professor of Medical Engineering at MIT. He also holds appointments at the Wyss Institute and the Broad Institute. He is a biomedical engineer who's been making exceptional contributions and has been on a tear lately, especially in the work of discovery of very promising, exciting developments in antibiotics. So welcome, Jim.

Jim Collins (00:42):

Eric, thanks for having me on the podcast.

Eric Topol (00:44):

Well, this was a shock when I saw your paper in Nature in December about a new structural class of antibiotics. The one from 1962 to 2000 took 38 years, and then there was another one that took 24 years: yours, the new structural class. Before I get to that though, I want to go back just a few years to the work you published in Cell on halicin. Can you tell us about this? Because when I started to realize what you've been doing, what you've been chipping away at here, this was a drug you found, halicin, and as far as I can understand, it works against tuberculosis, C. difficile, Enterobacter that are resistant, Acinetobacter that are resistant. And this is of course in mouse models. Can you tell us how you made that discovery before we get into, I guess, what's called the Audacious Project?

Jim Collins (01:48):

Yeah, sure. It's actually a fun story. Its origins go back to an institute-wide event at MIT. In 2018, MIT launched a major campus-wide effort focused on artificial intelligence. The institute, which had played a major role in the first wave of AI in the 1950s and 1960s, and a major role in the second wave in the 1980s, found itself kind of asleep at the wheel in this third wave involving big data and deep learning, and looked to correct that. To correct it, the institute had a symposium, and I had the opportunity to sit next to Regina Barzilay, one of our faculty here at MIT who specializes in AI, and particularly AI applied to biomedicine. We really hit it off and realized we had a shared interest in applying AI to drug discovery. My lab had focused on antibiotics for, by then, close to 15 years, but primarily we were using machine learning and network biology to understand the mechanism of action of antibiotics and how resistance arises, with the goal of boosting what we already had. With Regina, we saw there was an opportunity to see if we could use deep learning to get after discovery.


And notably, as you kind of alluded to in your introduction, there's really been a discovery void. The golden age of antibiotic discovery was in the forties, fifties and sixties, before I was born and before you had the genomic revolution, the biotech revolution, the AI revolution. Anyways, we got together with our two groups, and it was an unfunded project, and we kind of cobbled together a very small training set of 2,500 compounds that included 1,700 FDA-approved drugs and 800 natural compounds. In 2018, 2019, when we started this, if you asked any AI expert whether you should initiate that study, they would say absolutely not, there's going to be too little data. The idea is that these models are very data hungry. You need a million pictures of a dog and a million pictures of a cat to train a model to differentiate between the cat and the dog. But we ignored the naysayers and said, okay, let's see what we can do.


And we applied these to E. coli, a model pathogen that's used in labs but that also underlies urinary tract infections. We looked to see which of the molecules inhibited growth of the bacteria as evidence for antibacterial activity. We could have measured and quantified each of their effects, but because we had so few compounds, we just discretized instead: if you inhibited at least 80% of the growth you were antibacterial, and if you didn't achieve that, you weren't antibacterial. Zeros and ones. We then took the structure of each molecule and trained a deep learning model, specifically a graph neural net, that could look at those structures bond by bond, substructure by substructure, associated with whatever features you look to train with. In our case, making for a good antibiotic or not making for a good antibiotic. We then took the trained model and applied it to the Drug Repurposing Hub at the Broad Institute, which consists of 6,100 molecules in various stages of development as new drugs.
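The labeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual code: the compound names and inhibition values are invented, and only the 80% cutoff comes from the conversation.

```python
# Hypothetical screening readout: percent growth inhibition per compound.
# Names and numbers are made up for illustration only.
percent_inhibition = {
    "compound_a": 95,
    "compound_b": 55,
    "compound_c": 80,
    "compound_d": 5,
}

INHIBITION_CUTOFF = 80  # "at least 80% of the growth", as stated in the conversation


def label_antibacterial(inhibition, cutoff=INHIBITION_CUTOFF):
    """Discretize a growth-inhibition percentage into a 1/0 training label."""
    return 1 if inhibition >= cutoff else 0


# Binary labels ("zeros and ones") that a classifier could be trained on.
labels = {name: label_antibacterial(value) for name, value in percent_inhibition.items()}
print(labels)
```

Discretizing throws away the magnitude of each effect, which is the trade-off mentioned above for working with only a few thousand compounds.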


And we asked the model to identify molecules that could make for a good antibiotic but didn't look like existing antibiotics. Part of the discovery void has been linked to this rediscovery issue we have, where we just keep discovering quinolones like Cipro or beta-lactams like penicillin. Well, anyways, from those criteria as well as a small tox model, only one molecule came out of that, and that was this molecule we called halicin, which was named after HAL, the killing AI computer system from 2001: A Space Odyssey. In this case, we don't want it to kill humans, we want it to kill bacteria. And as you alluded to, it turned out to be a remarkably potent novel antibiotic that killed off multidrug-resistant, extensively drug-resistant, and pan-resistant bacteria and went after infections. It was effective against TB, it was effective against C. diff and Acinetobacter baumannii, and it acted through a completely new mechanism of action.
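The selection criteria described above (a high predicted score, low resemblance to known antibiotics, acceptable predicted toxicity) can be sketched as a simple filter. Everything concrete here is an assumption for illustration: the feature sets stand in for structural fingerprints, the thresholds are arbitrary, and Tanimoto similarity is one common way to quantify "looks like", not necessarily the measure used in the study.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical fingerprints for known antibiotic classes (sets of feature ids).
known_antibiotics = {
    "quinolone_like": {1, 2, 3, 4},
    "beta_lactam_like": {5, 6, 7, 8},
}

# Hypothetical candidates with a model score, a predicted toxicity, and a fingerprint.
candidates = [
    {"name": "cand_1", "score": 0.95, "tox": 0.10, "fp": {1, 2, 3, 9}},      # resembles a known class
    {"name": "cand_2", "score": 0.90, "tox": 0.05, "fp": {10, 11, 12, 13}},  # novel and well scored
    {"name": "cand_3", "score": 0.30, "tox": 0.05, "fp": {14, 15, 16}},      # low predicted activity
    {"name": "cand_4", "score": 0.92, "tox": 0.80, "fp": {17, 18, 19}},      # predicted toxic
]


def select(cands, known, min_score=0.8, max_similarity=0.4, max_tox=0.5):
    """Keep candidates that score well, are structurally novel, and pass the tox check."""
    kept = []
    for cand in cands:
        nearest = max(tanimoto(cand["fp"], fp) for fp in known.values())
        if cand["score"] >= min_score and nearest <= max_similarity and cand["tox"] <= max_tox:
            kept.append(cand["name"])
    return kept


selected = select(candidates, known_antibiotics)
print(selected)
```

In this toy data only one candidate survives all three criteria, which mirrors how stacking filters can narrow thousands of molecules down to a single lead.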


And so we were very excited to see how AI could open up possibilities and enable one to explore chemical spaces in new and different ways. We took the model, then applied it to a very large chemical library of 1.5 billion molecules and looked at a subset of about 110 million. That would be impossible for any grad student, any lab really, to look at experimentally, but we looked at it in a computer model and in three days could screen those 110 million molecules. We identified several additional candidates, one of which we call salicin, a cousin of halicin that is similarly broad spectrum and acts through a novel mechanism of action.

Eric Topol (06:07):

So before we go further with this initial burst of discovery, for those who are not used to deep neural networks, I think most now are used to the convolutional neural network for images, but what you used specifically here, as you alluded to, were graph neural networks with which you could actually study the binding properties. Can you just elaborate a little bit more about these GNNs so that people know this is one of the tools that you used?

Jim Collins (06:40):

Yeah, so in this case, the underlying structure of the model can actually represent and capture the graphical structure of a molecule, or it might be of a network, so that the underlying structure of the model itself will also look at things like a carbon atom connecting to an oxygen atom, the oxygen atom connecting to a nitrogen atom. So when you think back to the chemical structures we learned in high school, or maybe in college if we took a chemistry class, it's actually a model that can capture the chemical structure representation and begin to look at sub-aspects of it, associating different properties with it. In this case, again, ours was antibacterial, but it could be whether it's toxic against a human cell. The trained model, the graph neural model, can now look at new structures that you input and then make calculations on those bonds, where a bond would be a connection between two atoms, or on substructures, which would be multiple bonds interconnecting multiple atoms, and assign it a score. Does it make, for example, in our case, for a good antibiotic?
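To make the atoms-and-bonds picture concrete, here is a toy sketch of a molecule as a graph with one round of neighbor aggregation. It is not the model used in the work discussed: real graph neural nets apply learned weights and nonlinearities at every step, whereas this sketch only sums features, and the three-atom chain and one-hot element features are illustrative assumptions.

```python
# Toy molecular graph: atoms are nodes, bonds are edges.
# The carbon-oxygen-nitrogen chain echoes the example in the conversation.
atoms = {0: "C", 1: "O", 2: "N"}
bonds = [(0, 1), (1, 2)]

# Build an adjacency list from the bond list (bonds are undirected).
neighbors = {index: [] for index in atoms}
for a, b in bonds:
    neighbors[a].append(b)
    neighbors[b].append(a)

ELEMENTS = ["C", "N", "O"]


def one_hot(symbol):
    """Encode an element symbol as a one-hot feature vector."""
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]


features = {index: one_hot(symbol) for index, symbol in atoms.items()}


def message_pass(features, neighbors):
    """One unweighted round: each atom adds its neighbors' features to its own."""
    updated = {}
    for index, own in features.items():
        total = list(own)
        for other in neighbors[index]:
            total = [t + f for t, f in zip(total, features[other])]
        updated[index] = total
    return updated


hidden = message_pass(features, neighbors)

# Readout: sum the atom vectors into one vector describing the whole molecule.
molecule_vector = [sum(column) for column in zip(*hidden.values())]
print(hidden)
print(molecule_vector)
```

After one round, each atom's vector already reflects its bonded neighbors; stacking more rounds lets information travel across larger substructures before a final layer turns the molecule vector into a score.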

Eric Topol (07:48):

Right. Now, what's also striking is that you set up this collaboration that's interdisciplinary with Regina, whose work I know through breast cancer AI and not through drug discovery. I think that new effort and this discovery led to this, and I love the name of it, the Audacious Project, right?

Jim Collins (08:13):

Right. Yeah, so a few points on the collaboration, then I'll speak to the Audacious Project. In addition to Regina, we also brought in Tommi Jaakkola, another AI faculty member and marvelous colleague here at MIT, and really we've benefited from having outstanding young folks who were multilingual. We had very richly, deeply trained grad students from ML on Regina and Tommi's side who appreciated the biology, and we had very richly, deeply trained postdocs, Jon Stokes in particular, from the microbiology side on my side, who could appreciate the machine learning, and so they could speak across the divide. And so, as I look out over the next few decades in this exciting time of AI coming into biomedicine, I think the groups that will make a difference are those that have these multilingual young trainees and that are also well set up to inject human intelligence with machine intelligence.


That brings us to the Audacious Project. Now, prior to our publication of halicin, I was invited by the Audacious Project to submit a proposal. The Audacious Project is a new philanthropic effort run by TED, the group that does the TED Talks, which is run by Chris Anderson. Chris had the idea that there was a need to bring together philanthropists around the world to go for a larger scale in a collective manner toward audacious projects. I pitched them on the idea that we could use AI to address the antibiotic resistance crisis. As you and many of your listeners can appreciate, we're doomed if we don't actually address this soon, in that the number of resistant strains in our communities and in our hospitals has been growing decade upon decade, and yet the number of new antibiotics being developed and approved has been dropping decade upon decade, largely because the antibiotic market is broken. It costs just as much to develop an antibiotic as it does a cancer drug or a blood pressure drug.


But an antibiotic you take once, or maybe over the course of three to five days. A blood pressure drug or cancer drug you might take for months, if not for the rest of your life. Price points for antibiotics are small dollars; for cancer drugs and blood pressure drugs, thousands if not hundreds of thousands. We pitched this idea that we could maybe turn to AI and use the power of AI to address this crisis and see if we could use our wits to outcompete the genes of superbugs. Chris and his team really were taken with this, and we worked with them over the course of nine months and learned how to make the presentations and pulled this together. Chris took our pitches to a number of really active and fantastic philanthropists, and they got behind us and gave us a good amount of money to launch what we have now called the Antibiotics-AI Project at MIT. In conjunction with it, and also using funding from the Audacious Project, we've launched a nonprofit called Phare Bio. Phare is French for lighthouse, and our notion is that antibiotics are a public good that we need to get behind as a community. Phare Bio is run by Akhila Kosaraju, who is the CEO and President, and its mission is to take the most promising molecules out of the Antibiotics-AI Project and advance them towards the clinic through partnerships with biotech, with pharma, with other nonprofits, and with nation states as needed.

Eric Topol (11:18):

Well, before I get to the next chain of discovery and its explainability features, which we all like to see when you can explain stuff with AI, did halicin, because of this remarkable finding, get into clinical trials yet?

Jim Collins (11:36):

It's being advanced quite nicely and aggressively by Phare Bio. Phare Bio is in discussions with the Department of Defense and BARDA, and actually an interesting feature of halicin is that it acts like a flash bomb in the gut, meaning that when delivered orally to the gut, it only acts briefly and very quickly, and in a fairly narrow-spectrum manner as well, so that it can go after pathogens while sparing the commensals. One of the challenges our US military faces, and one that many militaries face, is gut issues when soldiers are first deployed to a new location, which can disable the soldiers for three to four weeks. And so, there's a lot of excitement that halicin might be effective as a treatment to help prevent gut dysbiosis resulting from new deployments.

Eric Topol (12:27):

Oh wow. That's another application that I would never have thought of. Interesting. So you then moved on to this really big report in Nature, which I think is now involving a transformer model, as I recall, so you can explain the difference. And you made a discovery, again from a massive number of potential compounds, of agents against methicillin-resistant Staph aureus that were very potent in vivo. So how did you make this big jump? This is a whole new structural class of antibiotics.

Jim Collins (13:11):

Yeah, so we made this jump in an effort led by Felix Wong, who's a really talented postdoc in my lab, and we got intrigued by the extent to which we could expand the utility of AI in biology and medicine. As you can appreciate, many of our colleagues are underwhelmed by the black box nature of many AI models. By black box I mean that when you train your model, you then largely use it as a filter, where you'll provide the model with some input and look at the output. The output is what's of interest to you, but you don't really understand, in most cases, what guided the model to make the prediction you look at, and that can be very unsatisfactory for biologists interested in mechanism. It can be very unsatisfactory for physicians interested in understanding the underlying disease mechanism.


It can be unsatisfactory for biotech and drug discoverers who want to understand how drugs act and what maybe underlies meaningful structural features. So with Felix, we decided it'd be interesting if you could open up the box. Could you look inside the model to see what was being learned? We were able to open it up. In this case, actually, we primarily focused on graph neural nets. We now have a new piece we're just about to submit on transformers, but in this case, we could open up the model and look to see what the rationales were: the chemical substructures the model was pointing to in each compound that were leading to the high prediction that it could make for a good antibiotic. These rationales we then used as hooks. I should notably say that we were able to identify the rationales from these large collections using algorithms that were developed by DeepMind as part of their AlphaGo program.


So AlphaGo was developed by DeepMind as a deep learning platform to play and win Go, the ancient Asian board game, and we used similar approaches, called Monte Carlo Tree Search, that allowed us to identify these rationales, which we effectively then used as organizing hooks on screens. You can envision or appreciate that most screens give you one-offs: this molecule does what you want. In silico screens are similarly designed. With these rationales, we could use them as organizing hooks to say, ah, these compounds that are identified as making for very good antibiotics all have the same substructure, and thus they are likely in the same class and act through a similar mechanism. This led us to identify five novel classes, one of which we highlighted in this piece, that acts very effectively against MRSA, the methicillin-resistant Staph aureus you alluded to, which is probably the most famous of the antibiotic-resistant pathogens, one that even those outside infectious disease are quite familiar with. It bedevils athletes. NFL players are often hit with MRSA, whether from scraping their limbs on AstroTurf or from surgeries to, say, correct something in their knee. This new class had great efficacy in animal models, again acting through a new mechanism.
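The "organizing hooks" idea can be illustrated with a small grouping sketch. It assumes the rationale for each hit has already been extracted (in the work described, via a Monte Carlo Tree Search procedure that is not reproduced here); the hit names, rationale identifiers, scores, and the minimum group size are all hypothetical.

```python
from collections import defaultdict

# Hypothetical screening hits: (compound, rationale substructure id, predicted score).
hits = [
    ("hit_1", "rationale_A", 0.93),
    ("hit_2", "rationale_B", 0.88),
    ("hit_3", "rationale_A", 0.91),
    ("hit_4", "rationale_C", 0.85),
    ("hit_5", "rationale_A", 0.90),
    ("hit_6", "rationale_B", 0.87),
]


def group_by_rationale(hits):
    """Collect hits that share the same rationale substructure."""
    groups = defaultdict(list)
    for compound, rationale, _score in hits:
        groups[rationale].append(compound)
    return dict(groups)


def candidate_classes(groups, min_members=2):
    """Return rationales shared by several hits, largest group first."""
    shared = {r: members for r, members in groups.items() if len(members) >= min_members}
    return sorted(shared.items(), key=lambda item: len(item[1]), reverse=True)


groups = group_by_rationale(hits)
classes = candidate_classes(groups)
print(classes)
```

A rationale shared by several high-scoring hits suggests a putative structural class rather than a one-off, which is the distinction drawn above.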

Eric Topol (16:12):

Will you bring that forward like halicin through this same entity?

Jim Collins (16:17):

Yes. We've now provided the molecules to Phare Bio and they're digging in to see which of these might be the most exciting and interesting to advance clinically.

Eric Topol (16:26):

I mean, it's amazing because this area is so neglected. Maybe you can help explain, since we're talking about existential threats as we get more and more resistant antibiotics and the biopharma industry is basically not into this and it relies on the work that you've been doing perhaps or other groups, I don't know of any that are doing more than you. I mean, it's incredible to me. Is it just because of the financial aspects that there's no business in the life science industry?

Jim Collins (17:03):

It's an interesting challenge. I've thought about it. I really haven't come up with a great solution yet, but I think you've got multiple factors at play. One is that I think all of us, every one of your listeners, has lost someone to a bacterial infection, but in most cases you don't realize you lost them to a bacterial infection. It might be that your elderly relative went into the hospital with a condition but acquired a hospital-based infection and died subsequently from that, and it happened quite quickly. In other cases, again, it's secondary. Notably, during the pandemic, one out of seven individuals hospitalized for Covid had a bacterial infection, and 50% of those who died had a bacterial co-infection. And notably, going back to the Spanish flu of over a hundred years ago, it was as deadly as it was because we didn't have antibiotics, and most of the folks who died had a bacterial co-infection.


So you have this in the backdrop. You then have that nobody's kind of gotten behind it, so we don't have any major foundation addressing antibiotic resistance. There are no charity walks, there are no charity runs, there is no month, there is no color, there are no ribbons, there is no celebrity behind it. It's just not known, so it hasn't captured the public's imagination. Then you couple with that this backdrop of the broken market where, as I shared, it's really expensive to develop a new antibiotic, but if you develop a new antibiotic, the tendency now will be to shelve it until it's desperately needed. So even the young companies that had developed and gotten an antibiotic through to approval often went bankrupt, because the market couldn't provide them with revenue to go after the next one or sustain their efforts. And so you have pharma and biotech jumping out. I think we need a two-pronged effort going forward. I do think we need nation states to come forward and get behind this, and I think we increasingly need philanthropists to come forward and go after it. I share your term of existential threat. I think if you speak with most educated individuals, antibiotic resistance, or broadly antimicrobial resistance, will be on everyone's existential threat list, but notably, of that list, it's the cheapest one that can be solved.

Eric Topol (19:09):

Well, you're showing that you've got the most extraordinary candidates that have been found in decades. So that says a lot right there.

Jim Collins (19:18):

Important step, yeah. So I think we've got additional innovation needed in the models to address this, and until we have that addressed, the interesting discoveries we and others are making will not get to patients. So we need to have that additional next step to close this gap.

Eric Topol (19:32):

Now, obviously this has relied on AI and the progress that's occurring in AI to enable some of your work. I am fascinated by the use of AlphaGo. Most times we hear about using AlphaFold2, but you actually used AlphaGo, the original DeepMind game work. But there also was the progress from deep neural networks to transformer models, and your ability now to basically exemplify what can be achieved in drug discovery using the progress in multimodal AI. Is this something that is making a difference for you and your group?

Jim Collins (20:13):

It is, it’s huge. I think it's very early in terms of the introduction of these new tools extensively within drug discovery. Machine learning has been used for over two decades, both supervised learning and unsupervised learning. Now we're seeing groups coming in with deep learning efforts. It's largely data-driven, and in fact, with the exception of sequences, most of drug discovery is not yet in the big data phase, but it's beginning to change. It's truly been transformative for us. We've used graph neural nets primarily for our discovery efforts. We're now beginning to incorporate language models as multimodal models along with the graph neural nets, as well as to see to what extent pre-trained language models, for example MoLFormer from IBM, which was trained on PubChem and the ZINC database, could be fine-tuned with small amounts of training data, screening data from a resistant organism.


Third, and I made an indirect allusion already, we've been looking at using transformers and genetic algorithms, an older form of AI tech, for the design of novel antibiotics. We've been looking to see, using fragments as a starting base, whether trained models can build out novel antibiotics that are designed de novo. One of the big challenges in that space is how do you synthesize these molecules? You have both the challenge of whether you can come up with a small number of steps that enable you to synthesize them, and second, whether you could find somebody to synthesize them. Each of those remains a very big challenge. My faculty colleague here at MIT, Connor Coley, is easily one of the world leaders in using AI to calculate the synthesizability of a molecule, but we still have gaps in that we don't have the community resources to make most of what we come up with.

Eric Topol (21:58):

Well, one of the features of large language models that David Baker at the Protein Design Institute exploited is its ability to hallucinate and come up with proteins that don't exist. Can you do the same thing in your design of antibiotic candidate molecules in a way that is not worrying about the synthesis, but just basically the hallucinatory behavior of large language models?

Jim Collins (22:28):

It's interesting, so yes, and David's work is marvelous and we're big fans and longtime friends of his work. Yes, we've been driving these models truly to do de novo synthesis. So based on what has been learned, can you put together molecules that one's never seen before? We're doing it quite successfully. It becomes interesting with hallucination in that it's usually more a matter of these models making stuff up, and in ours it's really more directing the hallucinations, right? Really looking to see whether we can harness the imagination of the models in order to move them forward in very creative design manners.

Eric Topol (23:08):

Yeah, I mean, I think most people have a negative connotation of hallucinations, but these are the smart variety, potentially. In many ways you could say there's so much crowded interest in the drug discovery AI world, but what you're doing now seems to be setting the pace in many respects for others to follow, with such remarkable advances in a short time. By the way, we'll link to that TED talk you gave in April 2020, where in seven minutes you went over what you were doing. Who would've known in 2020 where you'd be three or four years later? And that talk was about what you were going to do over the next seven years, with seven new classes of antibiotics. Now, before we wrap up, it isn't just that you and your team are doing AI antibiotic discovery, compressing into months what has taken decades, but you are also a father figure in the field of synthetic biology. I wonder if you, before we wrap up, can explain not only what synthetic biology is, since a lot of people don't really know what that means, but how it dovetails with your efforts in what we've been discussing?

Jim Collins (24:33):

Yeah, thanks. So synthetic biology is a relatively new field that's bringing together engineers with biologists to use engineering principles to model, design, and build synthetic gene networks and other molecular components that can be used to rewire and reprogram living cells and cell-free systems, endowing them with novel functions for a variety of applications. These circuits, these programmable cells, are impacting broad swaths of the economy, from food and water to sustainability and bioenergy to human health. Our focus is primarily human health, and we've been advancing the idea that you can reprogram bacteria to detect and treat bacterial infections. We've shown you can use this to go after cholera, and we've shown you can use this to prevent antibiotic-induced gut dysbiosis. We've also used synthetic biology to create whole new classes of diagnostics. For example, paper-based ones using RNA sensors for Ebola, for Zika and for Covid.


How it dovetails with what we talked about is that I think there's a great opportunity now to turn to AI to expand synthetic biology, both expanding the number of parts we have to re-engineer living systems and better inferring design principles that can be used to reprogram and rewire living systems. We're beginning to advance. We're not yet at the SynBio AI project phase, but there are very early efforts. David's dominating the protein space, and we and others are beginning now to move into the RNA space. So to what extent can we create large libraries of RNA components and train language-based models and structure-based models that can predict RNA structure and, more critically, predict RNA function? As you know from your marvelous work and what's happening, it's the exciting age of RNA, of getting after RNA therapeutics, be it mRNA or CRISPR related, and we still need to get better at our ability to design those therapeutics with certain functions in mind. We think AI is going to help get us there faster.

Eric Topol (26:34):

Well, speaking of that, there was a paper this week in Cell by McCafferty and colleagues, and one of the sentences that struck me, we are standing on the cusp of a new era of biology, where the integration of multimodal structural datasets with multiscale physics-based simulation will enable the development of visible, virtual cells. This is yet another lineage or direction of where we're headed with AI, but this fusion of the advances that are occurring right now in biology with AI that extend in many different directions, it's so exciting and you are basically nailing it. I mean, you're putting points on the board, Jim, and I just have to say, I'm blown away by what you've been accomplishing in a time space that's so incredibly compressed.

Jim Collins (27:40):

Oh, well thanks. Well, you think back to the early days of molecular biology, and physicists like Francis Crick and Max Delbrück played huge pioneering roles, and then in the second wave in the eighties or so, you had other physicists like Walter Gilbert playing big roles. I do think physicists and computer scientists are starting now to play big roles in this next phase, where we need tools like AI in order to really grapple with and harness the complexity of both the biology and the chemistry that underlie living cells. They can kind of expand our intuitions both to understand and to really control these systems for good going forward.

Eric Topol (28:15):

Well, you're doing it, and we'll be cheering for the success of these drugs that you've come up with in the clinical trials as they go forward, because they look so remarkably promising. You even highlighted ways that I wouldn't have envisioned where they could make a difference, so we'll follow your work, you and your colleagues, with great interest. Thanks so much for joining.

Jim Collins (28:37):

Eric, thanks for having me. Enjoyed our conversation.


Thanks for listening to Ground Truths. Please share if you found this podcast informative.
