August 12th, 2024

Creating the largest protein-protein interaction dataset in the world

A-Alpha Bio, founded in 2017, is creating a large protein-protein interaction dataset using its AlphaSeq method, which employs genetically engineered yeast cells to measure interactions, enhancing drug development and biotechnology.

A-Alpha Bio, a biotech startup founded in 2017 by David Younger and Randolph Lopez, aims to create the largest protein-protein interaction (PPI) dataset globally through its innovative method called AlphaSeq. Traditional methods for studying PPIs have been limited by small datasets, with the largest existing dataset containing only 3,176 proteins. AlphaSeq leverages the natural mating process of yeast cells, genetically engineering them to display proteins on their surfaces. By mixing two types of yeast cells, each displaying different proteins, the likelihood of mating becomes a function of the strength of the protein interactions. This process is tracked using DNA barcodes, allowing researchers to quantify interactions based on the frequency of barcode pairings. AlphaSeq has demonstrated the ability to generate thousands of PPI affinity values in a single run, significantly outpacing existing methods. The startup's approach is positioned as a more scalable and accurate alternative to traditional techniques like yeast two-hybrid and yeast display, which face limitations in accuracy and scalability. A-Alpha Bio's advancements could enhance the understanding of protein interactions, which are crucial for drug development and other biotechnological applications.
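
The barcode-counting idea described above can be illustrated with a minimal sketch. It assumes each sequenced mated cell yields one pair of barcodes (one per parent strain) and that the raw pairing frequency is used directly as a relative interaction signal. The function name `pairing_frequencies`, the barcode labels, and the read counts are all hypothetical; the summary does not describe A-Alpha Bio's actual analysis pipeline, which presumably involves normalization and calibration steps not shown here.

```python
from collections import Counter


def pairing_frequencies(reads):
    """Return the fraction of reads observed for each barcode pair.

    `reads` is an iterable of (barcode_a, barcode_b) tuples, one per
    sequenced mated cell. This is an illustrative assumption, not the
    published AlphaSeq data format.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Made-up example: two proteins per strain, ten mated cells sequenced.
reads = (
    [("A1", "B1")] * 6
    + [("A1", "B2")] * 1
    + [("A2", "B1")] * 1
    + [("A2", "B2")] * 2
)

freqs = pairing_frequencies(reads)

# Pairs that mated more often rank higher, i.e. show a stronger signal.
ranked = sorted(freqs, key=freqs.get, reverse=True)
for pair in ranked:
    print(pair, round(freqs[pair], 2))
```

In this toy data the A1/B1 pair accounts for most of the mated cells, so it would be read as the strongest interaction. Turning such frequencies into affinity values would need reference interactions of known strength, which this sketch leaves out.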

- A-Alpha Bio is developing the largest protein-protein interaction dataset using its AlphaSeq method.

- AlphaSeq utilizes genetically engineered yeast cells to measure protein interactions through mating efficiency.

- The method can generate thousands of PPI affinity values in a single experimental run.

- A-Alpha Bio's approach offers advantages over traditional methods like yeast two-hybrid and yeast display in terms of scalability and accuracy.

- The startup's innovations could significantly impact drug development and biotechnological research.

Related

Are AlphaFold's new results a miracle?

AlphaFold 3 by DeepMind excels in predicting molecule-protein binding, surpassing AutoDock Vina. Concerns about data redundancy, generalization, and molecular interaction understanding prompt scrutiny for drug discovery reliability.

How AI Revolutionized Protein Science, but Didn't End It

Artificial intelligence, exemplified by AlphaFold2 and AlphaFold3, revolutionized protein science by accurately predicting protein structures. Despite advancements, AI complements rather than replaces biological experiments, highlighting the complexity of simulating protein dynamics.

AI Revolutionized Protein Science, but Didn't End It

Artificial intelligence, exemplified by AlphaFold2 and its successor AlphaFold3, revolutionized protein science by predicting structures accurately. AI complements but doesn't replace traditional methods, emphasizing collaboration for deeper insights.

Ex-Meta scientists debut gigantic AI protein design model

EvolutionaryScale introduces ESM3, a powerful AI protein design model trained on billions of sequences. The company has secured $142 million in funding for drug development and addresses concerns about AI-designed proteins. Researchers anticipate its impact.

Accurate structure prediction of biomolecular interactions with AlphaFold 3

AlphaFold 3 introduces a diffusion-based architecture that enhances biomolecular interaction predictions, outperforming existing tools and improving data efficiency, with potential implications for drug discovery and structural biology.

4 comments
By @hirenj - 8 months
The PTM situation is a bit worse actually. First, of all the PTMs, high mannose N-glycans can be recapitulated (with the right knockouts). It’s the complex/hybrid that are completely missing.

Second, the O-glycans are completely different from those in humans. Unless you’re looking at alpha-DG and a handful of other proteins, you’re going to get the wrong glycosylation. This is a problem for two reasons: a) the alpha-Mannose does completely different things to the protein backbone compared to alpha-GalNAc, or probably alpha-Fucose etc., and b) those yeast PMT enzymes don’t seem to care where they throw sugars, so they’re probably going to glycosylate something that shouldn’t be glycosylated.

This is to say nothing about the different suite of PC-processing enzymes and zymogen activation in yeast too.

So here’s my free solution: on all mated cells, do a ConA enrichment and identify where there is O-glycosylation (by mass spectrometry). If it’s on your target protein, drop the data?

But otherwise, if you are interested in yeast interactions, this looks like a cool technique!

By @jszymborski - 8 months
This is my field (I'm a PhD student who writes PPI inference models).

I skimmed the article and the start-up's website and I'm a bit confused.

PPI inference is not binding affinity prediction is not binding site prediction, despite being related tasks.

There are billions of PPI pairs in public datasets; there is much less binding affinity data, and even less binding site data.

(Side-note: if you're hiring PPI / deep learning / comp. bio people, send me an email at the address in my bio.)

By @celltalk - 8 months
My gut says protein-protein interactions aren’t that useful given how these interactions scale, and on top of that you have post-translational modifications, SNVs etc. It’s a very very hard problem to solve.

Instead, we can focus on gene-gene interactions and go bottom up. There, we don’t need new wetlab techniques that need to be validated; we can measure mRNA instead. Plus, an average single cell contains 40 million proteins, yet the number of mRNA molecules is orders of magnitude lower and can be sequenced with high precision.

If you open the KEGG database in graph mode, for instance, you will see one of the largest manually curated datasets ever. Yet it is still tiny! If you imagine A, T, G, C as an alphabet and genes as special tokens, all we know as humanity is a couple of words… and it’s sad.

I think LLMs might be our best bet for this. Given a few words, they might uncover “new words” we have never thought about. I kinda tried this… but the methodology is still shaky.

https://celvox.co/blog/TCC/index.html

By @michelb - 8 months
I think I've lost count of how many companies are currently building this. I'm not in this field, but are they all very different or just trying to be the first to win?