June 24th, 2024

Are AlphaFold's new results a miracle?

AlphaFold 3 by DeepMind excels in predicting molecule-protein binding, surpassing AutoDock Vina. Concerns about data redundancy, generalization, and molecular interaction understanding prompt scrutiny for drug discovery reliability.

AlphaFold 3, developed by DeepMind, has shown promising results in predicting how drug-like molecules bind to target proteins, outperforming AutoDock Vina. The new model can predict binding even without the 3D structure of the target protein. However, skepticism arises due to concerns about data redundancy in scientific datasets and the model's ability to generalize rather than memorize. The analysis reveals high sequence identity between proteins in the training and test datasets, raising questions about the model's learning capabilities. Additionally, issues with producing overlapping atoms in predictions suggest potential limitations in understanding molecular interactions. The need for AlphaFold 3 to demonstrate non-obvious insights beyond data memorization is highlighted to ensure its reliability in drug discovery applications. Further research and scrutiny are essential to determine the true extent of AlphaFold 3's capabilities and limitations in the field of molecular docking.

Related

Video annotator: a framework for efficiently building video classifiers

The Netflix Technology Blog presents the Video Annotator (VA) framework for efficient video classifier creation. VA integrates vision-language models, active learning, and user validation, outperforming baseline methods with an 8.3 point Average Precision improvement.

Generating audio for video

Google DeepMind introduces V2A technology for video soundtracks, enhancing silent videos with synchronized audio. The system allows users to guide sound creation, aligning audio closely with visuals for realistic outputs. Ongoing research addresses challenges like maintaining audio quality and improving lip synchronization. DeepMind prioritizes responsible AI development, incorporating diverse perspectives and planning safety assessments before wider public access.

Testing Generative AI for Circuit Board Design

A study tested Large Language Models (LLMs) like GPT-4o, Claude 3 Opus, and Gemini 1.5 for circuit board design tasks. Results showed varied performance, with Claude 3 Opus excelling in specific questions, while others struggled with complexity. Gemini 1.5 showed promise in parsing datasheet information accurately. The study emphasized the potential and limitations of using AI models in circuit board design.

Lessons About the Human Mind from Artificial Intelligence

In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

8 comments
By @nuz - 7 months
I kind of respect DeepMind for simply keeping their nose to the ground and doing good work like this without overhyping it too much. It's more of the under-promise, over-deliver engineering style than what some of their competitors tend to do.
By @dekhn - 7 months
No, it's not a miracle; everything it does works because the information to make those predictions is a collection of latent variables and DM found good ways to convert from sequence space into an embedding that approximates those latent variables.

From what I can tell it still depends heavily on having a good sequence and structure template (or templates). It tells us little to nothing about the specific details of the folding process. To me the only part that seems miraculous is that it seems like we can predict novel structures (previously unknown conformations) using small fragments of templates rather than entire protein domains.

By @flobosg - 7 months
> Different proteins can also be related to each other. Even when the sequence similarity between two proteins is low, because of evolutionary pressures, this similarity tends to be concentrated where it matters, which is the binding site.

It’s a small nitpick, but I think that the author actually meant “sequence identity” here, because his statement would make much more sense then. Sequence similarity is physicochemical in nature, and tends to be concentrated, in addition to functionally relevant sites (usually ligand-binding residues, as he mentioned), at key structural regions of the protein such as the hydrophobic core (where a high frequency of similarly hydrophobic residues is expected).

This is one of the reasons why proteins from the same family can share highly similar structures while having very low sequence identity, with highly conserved motifs (where the sequence identity is concentrated) taking care of the functionality.
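To make the identity/similarity distinction above concrete, here is a minimal Python sketch. It assumes the two sequences are already aligned to equal length (with `-` for gaps), and it uses a hypothetical four-class grouping of residues by physicochemical character; real tools use substitution matrices and other groupings, so treat the class table and the function name `identity_and_similarity` as illustrative assumptions, not a standard.

```python
# Hypothetical grouping of the 20 standard amino acids by physicochemical
# class. This is one of many possible schemes, chosen only for illustration.
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def identity_and_similarity(a, b):
    """Return (identity, similarity) fractions for two pre-aligned sequences.

    Identity counts columns with the exact same residue; similarity counts
    columns whose residues fall in the same physicochemical class.
    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0, 0.0
    identical = sum(x == y for x, y in cols)
    similar = sum(CLASS_OF[x] == CLASS_OF[y] for x, y in cols)
    return identical / len(cols), similar / len(cols)
```

For example, `identity_and_similarity("ALKVDE", "ILRVEE")` gives an identity of 0.5 but a similarity of 1.0: half the positions differ in residue, yet every substitution stays within the same class.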

By @epups - 7 months
The author is making the point that AlphaFold 3 is not so impressive: it is simply regurgitating its training set, and it's not so good for inference.

I think his central point is fair and interesting. The train/test split is apparently legit, as they used structures released before 2021 for training and the rest for testing. However, there was no real check for duplicates, and the success rate might be inflated by a bunch of "me too", low-hanging-fruit structures that are very slight variations on what we know.

However, I'm not sure I agree with his skepticism. LLMs suffer from the exact same problem (getting one to write a Snake game in any language is trivial, but it is almost certainly regurgitating), yet they can be useful as well. I mean, if for various reasons people are publishing very similar structures out there, there's certainly value in speeding up or reducing that work considerably.
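The concern about a date-based split with no duplicate check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact cutoff date, the entry fields (`id`, `released`, `seq`), the 0.9 threshold, and the function names are all hypothetical, and `difflib.SequenceMatcher` is a crude stand-in for a real alignment-based sequence identity.

```python
from datetime import date
from difflib import SequenceMatcher

# Assumed cutoff: the comment only says "before 2021", not an exact date.
CUTOFF = date(2021, 1, 1)


def crude_similarity(a, b):
    """Rough 0..1 string similarity; NOT a proper alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def split_and_flag(entries, threshold=0.9):
    """Split entries by release date, then flag test entries that closely
    resemble some training entry.

    entries: list of dicts with hypothetical keys 'id', 'released', 'seq'.
    Returns (train, test, flagged), where flagged is a list of
    (test_id, best_similarity) for test entries at or above the threshold.
    """
    train = [e for e in entries if e["released"] < CUTOFF]
    test = [e for e in entries if e["released"] >= CUTOFF]
    flagged = []
    for t in test:
        best = max(
            (crude_similarity(t["seq"], tr["seq"]) for tr in train),
            default=0.0,
        )
        if best >= threshold:
            flagged.append((t["id"], best))
    return train, test, flagged
```

Reporting success rates separately for flagged and unflagged test entries is one way to see how much of a headline number comes from near-duplicates.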

By @seeknotfind - 7 months
Why overlapping molecules would indicate memorizing or overfitting is beyond me. Imagine a mechanic designing linkages. They may collide, but if they could pass through each other, they could work. Then they might reconfigure them. Similarly, overlapping molecules could be a step along the way to understanding if the algorithm is focused on binding structures rather than global physical structures.
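For readers wondering what "overlapping atoms" means in practice, a simple steric clash check can be sketched as follows. This is a minimal illustration, not the method used in the article or by AlphaFold 3: the atom tuple layout, the 0.4 Å tolerance, and the function name `find_clashes` are assumptions, and the radii are Bondi van der Waals values for a handful of elements.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in angstroms for a few common elements.
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(atoms, tolerance=0.4):
    """Return intermolecular atom pairs that sit implausibly close together.

    atoms: list of (element, (x, y, z), molecule_label) tuples (assumed layout).
    A pair clashes when its distance is below the sum of the two van der Waals
    radii minus `tolerance` (an assumed slack, since real contacts such as
    hydrogen bonds can be somewhat closer than the plain radius sum).
    """
    clashes = []
    for (e1, p1, m1), (e2, p2, m2) in combinations(atoms, 2):
        if m1 == m2:
            continue  # skip pairs within one molecule; bonded atoms are close
        d = math.dist(p1, p2)
        if d < VDW[e1] + VDW[e2] - tolerance:
            clashes.append((e1, e2, round(d, 2)))
    return clashes
```

A prediction with many such pairs between ligand and protein is physically impossible as drawn, whatever one concludes from that about memorization.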
By @maremmano - 7 months
I'm totally clueless about the topic (protein folding), but this stuff is very interesting. From the article, it seems that AlphaFold 3 is just a biochem version of GPT or what? From what I've heard, the older AlphaFolds had some special tricks for protein prediction. Am I missing something?