The application of artificial intelligence (AI) to biotechnology has made it possible to create wholly original proteins from scratch.
AI is increasingly being applied in complex fields such as biotechnology, where models can now design proteins after learning from raw sequence data, opening up new possibilities for the field.
Scientists from the University of California, San Francisco (UCSF) have tested an AI tool called ProGen that can generate artificial enzymes from scratch.
According to the study, published in Nature Biotechnology, some of these artificial enzymes performed as well in laboratory experiments as those found in nature, even though their amino acid sequences deviated significantly from those of any known natural protein.
The experiment shows that natural language processing, although designed to read and generate text, can learn at least some of the underlying principles of biology. ProGen is an AI program built by Salesforce Research that uses next-token prediction to assemble amino acid sequences into artificial proteins.
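To make "next-token prediction" concrete, here is a minimal sketch in Python. It is not ProGen, which is a large neural language model. This toy replaces the neural network with simple counts of which amino acid follows which, and the training sequences, the start and end markers, and the function names are all made up for illustration. The loop is the part that carries over: pick the next residue from a learned probability distribution, append it, and repeat.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
START, END = "^", "$"  # hypothetical start/end markers, not part of any real model

def train_bigram_counts(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_len=50):
    """Sample one sequence, one residue at a time, from the learned counts."""
    prev, out = START, []
    while len(out) < max_len:
        options = counts[prev]
        tokens = list(options)
        weights = [options[t] for t in tokens]
        nxt = rng.choices(tokens, weights=weights, k=1)[0]
        if nxt == END:  # the model chose to stop
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Made-up toy sequences, purely illustrative (not real lysozymes).
toy_training_set = ["MKALIVLGL", "MKTLLVAGA", "MRALIVAGL"]
model = train_bigram_counts(toy_training_set)
print(generate(model, random.Random(0)))
```

A real language model conditions each choice on the whole sequence generated so far rather than on the previous residue alone, which is what lets it capture long-range patterns.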
“The artificial designs function significantly better than designs that were influenced by the evolutionary process,” said James Fraser, Ph.D., professor of bioengineering and therapeutic sciences at the UCSF School of Pharmacy and an author of the study, published Jan. 26 in Nature Biotechnology (Madani et al., 2023).
“The language model is learning components of evolution, but it’s different than the typical evolutionary process,” Fraser said. “We can now adjust the creation of these traits for specific results: a thermostable, acidic, or protein-insensitive enzyme.”
To build the model, the research team fed the amino acid sequences of 280 million distinct types of proteins to a machine learning algorithm and let it digest the data for two weeks. Then, the model was fine-tuned by priming it with 56,000 sequences from five lysozyme families and contextual information about these proteins.
The model rapidly created a million sequences, and the study team chose 100 of them to test, based on how closely they matched actual protein sequences and how naturalistic the AI proteins’ amino acid “grammar” and “semantics” were.
From this first group of 100 proteins, which were tested in vitro by Tierra Biosciences, the team chose five artificial proteins to test in cells and compared their activity to that of hen egg white lysozyme (HEWL), an enzyme found in the whites of chicken eggs.
Two of the artificial enzymes broke down bacterial cell walls, although their sequences were only 18% identical to each other. The two sequences were 90% and 70% identical, respectively, to any known protein.
A single mutation in a natural protein can stop it from functioning. Yet in a second round of testing, the scientists found that AI-generated enzymes showed activity even when only 31.4% of their sequence resembled any known natural protein.
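For readers unfamiliar with figures like "18% identical", the sketch below shows one common way such a percentage can be computed. It assumes the two sequences have already been aligned, with "-" marking gaps, and it counts only the columns where both sequences have a residue. Other conventions for the denominator exist, and the article does not say which one the study used. The function name and the example strings are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of matching residues between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped. This is one
    convention among several; it is an assumption, not the study's method.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: nothing to compare
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Made-up aligned fragments: 5 of the 7 gap-free columns match.
print(round(percent_identity("MKVLA-GT", "MRVLAAGS"), 1))
```

In practice the alignment step itself is done with dedicated tools, and the identity figure depends on how that alignment is produced.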
From the raw sequence data alone, the AI learned how the enzymes should be shaped. The atomic structures of the artificial proteins were measured using X-ray crystallography and appeared just as they should, despite the fact that the sequences were unlike anything seen previously.
Salesforce Research created ProGen in 2020, based on a natural language model that its researchers originally built to generate English-language text.
“When you train sequence-based models with a lot of data, they are very good at learning structure and rules,” said Nikhil Naik, Ph.D., Director of AI Research at Salesforce Research and the paper’s senior author. “They learn what words can go together and how words are put together.”
The design space open to an AI model is almost limitless. Lysozyme, for example, is a relatively small protein of about 300 amino acids. Since each position can hold any of 20 different amino acids, there are 20^300 possible ways to put them together.
Given this vast number of possibilities, it is remarkable that the model can generate functional enzymes so easily.
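To give a sense of scale, the number of possible sequences is 20 raised to the power of 300: 20 choices at each of 300 positions. The short Python sketch below computes it exactly, using Python's built-in arbitrary-precision integers. The function name is made up for illustration.

```python
def count_sequences(length, alphabet_size=20):
    """Number of distinct sequences of a given length over the alphabet."""
    return alphabet_size ** length

total = count_sequences(300)
digits = len(str(total))
print(f"20^300 is a number with {digits} decimal digits")
```

The result has 391 digits, or roughly 2 followed by 390 zeros, so the space is finite but far too large to search exhaustively.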
Ali Madani, Ph.D., founder of Profluent Bio, a former researcher at Salesforce Research, and the paper’s first author, said that the ability to make functional proteins from scratch shows that we are entering a new era of protein design. “This is a new, flexible tool for protein engineers, and we’re excited to see how it can be used in medicine.”
AI technology has opened up the possibility of generating wholly original proteins from scratch. While this technology promises many potential benefits for biotechnology, there also needs to be increased focus on regulation and oversight in order to ensure safe and responsible use of the technology.