Friday, September 22, 2023

A novel approach that accelerates drug discovery is introduced by the new model.


Drug-target interaction (DTI) benchmarks display highly variable levels of coverage. Coverage is defined as the proportion of drugs or targets for which there is a data point (positive or negative) in that dataset. High-coverage and low-coverage benchmarks tend to reward different types of model performance. (a) In this cartoon example of a low-coverage dataset, drug candidates cover the full diversity of the space, and no two drugs are highly similar. A successful model can learn a rough estimate of the fitness landscape, but must accurately model a large portion of drug space to generalize to all candidates. (b) In high-coverage datasets, drugs tend to target a specific protein family. A successful model therefore does not need to generalize nearly as widely, but it must capture more subtle differences in drug fitness to achieve high specificity and discriminate among similar drugs. (c) In a review of common existing DTI benchmark datasets, we found widely varying coverage, from datasets with nearly zero coverage (each drug or target is represented only a few times) to nearly complete coverage (all drug-by-target pairs are known in the data). Credit: Proceedings of the National Academy of Sciences (2023). DOI: 10.1073/pnas.2220778120
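
To make the coverage definition in the caption concrete, here is a minimal Python sketch under one possible reading of it: overall coverage is the share of all drug-by-target pairs that have a data point, and per-drug coverage is the share of targets tested against that drug. The function name, the tuple layout, and the toy datasets are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def coverage(observations):
    """Return (overall, per_drug) coverage for (drug, target, label) tuples.

    A pair counts as covered whether its label is positive or negative.
    This is one reading of the caption's definition, not the paper's code.
    """
    observations = list(observations)
    drugs = {d for d, _, _ in observations}
    targets = {t for _, t, _ in observations}
    pairs = {(d, t) for d, t, _ in observations}
    if not pairs:
        return 0.0, {}
    seen = defaultdict(set)
    for d, t in pairs:
        seen[d].add(t)
    per_drug = {d: len(seen[d]) / len(targets) for d in drugs}
    overall = len(pairs) / (len(drugs) * len(targets))
    return overall, per_drug

# Hypothetical low-coverage set: each drug was tested against one target only.
low = [("d1", "t1", 1), ("d2", "t2", 0), ("d3", "t3", 1)]
# Hypothetical high-coverage set: every drug was tested against every target.
high = [(d, t, 0) for d in ("d1", "d2", "d3") for t in ("t1", "t2", "t3")]

low_overall, low_per_drug = coverage(low)      # 3 of 9 pairs observed
high_overall, high_per_drug = coverage(high)   # 9 of 9 pairs observed
```

In the toy data, the low-coverage set fills one third of the drug-by-target grid and the high-coverage set fills all of it, which mirrors the two extremes described in panel (c).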

Huge libraries of drug compounds may hold potential treatments for a variety of diseases, such as cancer or heart disease. Ideally, scientists would like to experimentally test each of these compounds against all possible targets, but doing this kind of screen is time consuming.

In recent years, researchers have begun using computational methods to screen those libraries in hopes of speeding up drug discovery. However, many of these methods are also very time consuming, with most calculating the 3D structure of each target protein from its amino acid sequence, and then using those structures to predict which drug molecules it will interact with.

Researchers at MIT and Tufts University have now created an alternative computational approach based on a type of artificial intelligence algorithm known as a large language model. These models—one well-known example is ChatGPT—can parse huge amounts of text and figure out which words (or, in this case, amino acids) are most likely to appear together. The new model, known as ConPLex, can match target proteins to potential drug molecules without the need for an extensive computational step to calculate the molecules’ structures.

With this method, researchers can screen more than 100 million compounds in a single day — far more than any model in existence.

“This work addresses the need for efficient and accurate in silico screening of potential drug candidates, and the scalability of the model allows large-scale screens to assess off-target effects, drug repurposing, and the effect of mutations on drug binding,” says Bonnie Berger, the Simons Professor of Mathematics, head of the Computation and Biology group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and a senior author of the new study.

Lenore Cowen, a professor of computer science at Tufts University, is also a senior author of the paper, which appears this week in Proceedings of the National Academy of Sciences. Rohit Singh, a CSAIL research scientist, and Samuel Sledzieski, a graduate student at MIT, are the paper’s lead authors, and Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT and Harvard, is also an author. In addition to the paper, the researchers have made their model available online for other scientists to use.


In recent years, computational scientists have made significant progress in developing models that can predict the structures of proteins based on their amino acid sequences. However, using these models to predict how a large library of potential drugs will interact with, for example, an oncogenic protein has proven challenging, mainly because calculating the 3D structures of proteins requires significant amounts of time and computing power.

An additional hurdle is that these types of models do not have a good track record of eliminating compounds known as decoys, which look very similar to a successful drug but do not interact well with the target.

“One of the long-standing challenges in this field has been that these approaches are fragile, in the sense that if I give the model a drug or a small molecule that looks almost like the real thing, but is slightly different in a subtle way, the model may still predict that they will interact, even though it should not,” Singh says.

Researchers have designed models that can overcome this kind of fragility, but they are usually designed for only one class of drug molecules, and they are not well-suited for large screens because the calculations are time-consuming.

The MIT team decided to take an alternative approach, based on a protein model first developed in 2019. Working with a database of more than 20,000 proteins, the language model encodes this information into meaningful numerical representations of each amino acid sequence that capture the associations between sequence and structure.

“Using these language models, even proteins with very different sequences but potentially similar structures or similar functions can be represented in a similar way in this language space, and we are able to take advantage of that to make our predictions,” he says.

In their new study, the researchers applied the protein model to the task of figuring out which protein sequences will interact with specific drug molecules. Both the proteins and the drugs have numerical representations, which a neural network transforms into a common, shared space. The researchers trained the network on known drug-protein interactions, allowing it to learn to associate specific features of the proteins with drug-binding ability, without having to calculate the 3D structure of any of the molecules.
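
The idea of mapping proteins and drugs into one shared space can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single untrained linear layer stands in for the neural network, the feature vectors are made-up stand-ins for a protein language-model embedding and a drug descriptor, and cosine similarity is assumed as the interaction score. None of these names or choices are confirmed by the article.

```python
import math
import random

def project(vec, weights):
    """Apply a linear map; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

rng = random.Random(0)          # fixed seed so the sketch is repeatable
SHARED_DIM = 2                  # hypothetical size of the common space

protein_features = [0.2, -1.0, 0.5, 0.3]   # stand-in for a sequence embedding
drug_features = [1.0, 0.0, 1.0]            # stand-in for a molecular descriptor

# Random, untrained weights; in practice these would be learned from
# known drug-protein interactions.
w_protein = [[rng.uniform(-1, 1) for _ in protein_features]
             for _ in range(SHARED_DIM)]
w_drug = [[rng.uniform(-1, 1) for _ in drug_features]
          for _ in range(SHARED_DIM)]

z_protein = project(protein_features, w_protein)
z_drug = project(drug_features, w_drug)
score = cosine(z_protein, z_drug)   # higher means "closer" in the shared space
```

The point of the design is that the two inputs can have different sizes and come from different models, yet after projection they are directly comparable, so no 3D structure is needed at prediction time.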

“With such a high-quality numerical representation, the model can short-circuit the atomic representation entirely, and from those numbers predict whether or not this drug will bind,” Singh says. “The advantage of this is that you avoid having to go through the atomic representation, but the numbers still contain all the information you need.”

Another advantage of this approach is that it takes into account the flexibility of protein structures, which can be ‘wobbly’ and take on slightly different shapes when interacting with a drug molecule.

High affinity

To reduce the possibility of their model being fooled by decoy drug molecules, the researchers also incorporated a training phase based on the concept of contrastive learning. Under this approach, the researchers give the model examples of “real” drugs and imposters and teach it to distinguish between them.
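
One common way to express contrastive learning is a triplet margin loss, which asks that a target sit closer to its real drug than to a decoy by at least some margin. The Python sketch below uses that formulation as an assumption; the article does not state which loss the researchers used, and the coordinates and margin are invented for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points in the shared space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

# Hypothetical embeddings: a target, its true drug, and two decoys.
target = [0.0, 0.0]
true_drug = [0.1, 0.0]
decoy_far = [3.0, 0.0]    # already well separated from the target
decoy_near = [0.2, 0.0]   # looks almost like the real drug

loss_far = triplet_loss(target, true_drug, decoy_far)    # 0.1 - 3.0 + 1.0 < 0, so 0.0
loss_near = triplet_loss(target, true_drug, decoy_near)  # 0.1 - 0.2 + 1.0 = 0.9
```

Only the near decoy produces a loss, so training effort concentrates on exactly the look-alike molecules that would otherwise fool the model.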

The researchers then tested their model by screening a library of about 4,700 candidate drug molecules for their ability to bind to a group of 51 enzymes known as protein kinases.
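
A screen of this shape amounts to scoring every drug-by-target pair and keeping the best results: 4,700 molecules against 51 kinases is 4,700 × 51 = 239,700 pairs. The Python sketch below shows that pattern on a toy library, assuming embeddings are already computed and that a dot product serves as the score. The names, vectors, and scoring choice are hypothetical, not the researchers' pipeline.

```python
import heapq

def top_hits(drug_embeddings, target_embeddings, k):
    """Score all drug-target pairs and return the k highest-scoring ones."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = (
        (dot(d_vec, t_vec), drug, target)
        for drug, d_vec in drug_embeddings.items()
        for target, t_vec in target_embeddings.items()
    )
    # nlargest keeps only k items in memory, which matters for big libraries.
    return heapq.nlargest(k, scored)

# Toy library with made-up embeddings in a 2D shared space.
drugs = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 1.0], "drug_c": [0.5, 0.5]}
targets = {"kinase_1": [0.9, 0.1], "kinase_2": [0.1, 0.9]}

hits = top_hits(drugs, targets, k=2)
hit_pairs = {(drug, target) for _, drug, target in hits}
pairs_in_study = 4700 * 51   # size of the screen described above
```

Because each pair needs only one cheap vector operation, the cost grows with the number of pairs rather than with any per-molecule structure calculation.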

From the top-ranked results, the researchers selected 19 drug-protein pairs to test experimentally. The experiments revealed that, of the 19 hits, 12 had strong binding affinity (in the nanomolar range), whereas nearly all of the many other possible drug-protein pairs would have no affinity. Four of these pairs bound with extremely high, sub-nanomolar affinity (so strong that even a tiny drug concentration, on the order of parts per billion, will inhibit the protein).

While the researchers focused primarily on screening small-molecule drugs in this study, they are now working to apply this approach to other types of drugs, such as therapeutic antibodies. This type of modeling may also be useful for running toxicity screens for potential drug compounds, to ensure that there are no unwanted side effects before testing them in animal models.

“Part of the reason drug discovery is so expensive is that it has such high failure rates. If we can reduce those failure rates by saying upfront that this drug is unlikely to work, that could go a long way in lowering the cost of drug discovery,” Singh says.

Eytan Ruppin, chief of the Cancer Data Science Laboratory at the National Cancer Institute, who was not involved in the study, suggests ways the approach could be extended. “For example, incorporating structural information into the latent space or exploring molecular methods for generating decoys may further improve predictions,” he says.

More information:
Rohit Singh et al., Contrastive learning in protein language space predicts interactions between drugs and protein targets, Proceedings of the National Academy of Sciences (2023). DOI: 10.1073/pnas.2220778120

Provided by the Massachusetts Institute of Technology

This story is republished with permission from MIT News (web.mit.edu/newsoffice/), a popular site covering news related to research, innovation, and teaching at MIT.

Citation: New Paradigm Offers Way to Accelerate Drug Discovery (2023, June 8), retrieved June 8, 2023 from https://phys.org/news/2023-06-drug-discovery.html

