June 25th, 2024

ESM3, esmGFP, and EvolutionaryScale

EvolutionaryScale introduces ESM3, a language model simulating 500 million years of evolution. ESM3 designs proteins with atomic precision, including esmGFP, a novel fluorescent protein, showcasing its potential for innovative protein engineering.

EvolutionaryScale introduces ESM3, a language model for biology that simulates 500 million years of evolution. ESM3 is a generative model trained on diverse protein data from Earth's natural environments. It operates by reasoning over protein sequence, structure, and function simultaneously, allowing for the generation of new proteins with specific properties. The model's capabilities improve with scale, enabling it to design proteins with atomic-level accuracy and solve challenging tasks in protein engineering. ESM3's ability to generate proteins like esmGFP, a novel fluorescent protein, showcases its potential for creating functional proteins outside the realm of natural evolution. By leveraging machine learning techniques, ESM3 expands the search for protein variants beyond what traditional methods can achieve. The model's unique approach to protein design offers insights into evolutionary processes and the vast potential for programming biology through advanced AI technologies.

Related

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

SceneCraft is an advanced Large Language Model (LLM) Agent converting text to 3D scenes in Blender. It excels in spatial planning, asset arrangement, and scene refinement, surpassing other LLM agents in performance and human feedback.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

Synthesizer for Thought

The article delves into synthesizers evolving as tools for music creation through mathematical understanding of sound, enabling new genres. It explores interfaces for music interaction and proposes innovative language models for text analysis and concept representation, aiming to enhance creative processes.

Are AlphaFold's new results a miracle?

AlphaFold 3 by DeepMind excels in predicting molecule-protein binding, surpassing AutoDock Vina. Concerns about data redundancy, generalization, and molecular interaction understanding prompt scrutiny for drug discovery reliability.

Researchers run high-performing LLM on the energy needed to power a lightbulb

Researchers at UC Santa Cruz developed an energy-efficient method for large language models. By using custom hardware and ternary numbers, they achieved high performance with minimal power consumption, potentially revolutionizing model power efficiency.

6 comments
By @nxobject - 5 months
It looks like "500 million years of evolution" isn't a description (however indirect) of an iterative process, but a metric that measures differences in results:

> But in order for ESM3 to solve its training task of predicting the next masked token the model must learn how evolution moves through the space of potential proteins. In this sense, ESM3 can be thought of as an evolutionary simulator. A traditional evolutionary analysis of the ancestry of esmGFP is paradoxical as the protein was created outside natural processes, but still we can draw insight from the tools of evolutionary biology on the amount of time it would take for a protein to diverge from its closest sequence neighbor through natural evolution. We find naturally occurring GFPs with similar levels of sequence identity are separated by hundreds of millions of years of evolution. Using an analysis similar to one that might be performed on a new protein found in the natural world, we estimate that esmGFP represents an equivalent of over 500 million years of natural evolution performed by an evolutionary simulator.

By @vessenes - 5 months
This is just so cool. I wonder if they’ll release the large (98B) model or gatekeep it in some way. Something that can generate a novel fluorescing sequence is amazingly cool; I bet there’s a lot of interesting work to do from here on model tuning, preference training, and the biology research side.
By @ranjanav22 - 5 months
This is incredibly exciting! It's such a relief from all the dystopian views about how our planet will crumble. I guess if people have unknowingly created problems for themselves, they are also capable of solving them! (Talking about plastics here.)
By @mrtesthah - 5 months
This is the sort of thing that could optimize proteins in multiple species across an entire ecosystem to accelerate adaptation to climate change. I am not an expert in this area but it seems so promising.
By @mariuswiggert - 5 months
Great to share this with the world!