Biohub releases a world model of protein biology

The accompanying preprint is interesting: https://www.biorxiv.org/content/10.64898/2026.06.03.729735v1

Modeling protein-protein binding is still a massively unsolved problem, mainly because we don't really have the data. AlphaFold2 was great but didn't actually 'solve' protein folding, as all input data is from single-'state' X-ray crystallography of the proteins, not 'really' how these proteins behave in the wild. So it's still very, very hard to predict what binds to what, which of course is a multi-billion-dollar industry.

I work in a pharma-field and I wish we could easily design molecular binders. We still spend millions every year finding targets that could 'smuggle' our drugs into cells.

Some other players in this field are Boltz Lab and Isomorphic Labs (the AlphaFold Google spinoff led by Hassabis). None of them can predict anything complex or 'big'; everything is peptide-level. OP's work is another step towards something better.

The most interesting part of the preprint is that they find no matches for their designed binders in the worldwide protein database. An open question with protein designers is whether they just regurgitate training material, which is far easier to test with English-language models.

It's interesting that there are almost no comments on this. This feels like one of the most exciting and impactful fields of the coming years. I worked with a cracked researcher who was generating molecules a couple of years ago. She spent most of her time fighting CUDA bugs and trying to install packages. I wonder if the ecosystem has matured by now. There are people studying cells to see what enters and what exits, and engineering how to stop, for example, resources feeding a bad cell. Possibilities feel endless. I am a little worried about side effects, since bio is way more chaotic than silicon, but hopefully AI will help with that level of chaos too.

we interviewed Alex Rives, cofounder of EvoScale and Head of Science at BioHub - here https://www.latent.space/p/esmfold2

also 3 paper coauthors walked thru it with us: https://youtu.be/4g1bURdKN0Q

all this is part of the new AI for Science effort we are spinning up at Latent Space - all guidance and support would be greatly appreciated as this is a much harder domain to cover than software

It is nice work; however, domain-specific fine-tuning will always give higher prediction accuracy. Another thing worth noting is the sequence length used for training (usually cut to 1024/2048), which is a game changer if left uncut.

I had a bit of fun myself fine-tuning ESM2 on domain-specific bacterial data (because it gives a better score) and comparing it to another, self-built model; the self-built one beat it with 25% higher accuracy. Then, for the 3D structure, I coded a 3D protein visualizer (hypergraph) with a file-upload option that visualizes the result instantly. A 2-day job :)

A similar work is Foundry (https://github.com/RosettaCommons/foundry). While both of them are good, the main issue is that they are not accurate enough at the atom level. There is a good chance that a predicted or designed active site is slightly different from the real structure solved by X-ray, NMR or cryo-electron microscopy. A side chain or two may turn the other way, which changes how the interaction is interpreted. So the tools are good and convenient now. But the design or prediction is often hit-and-miss.

Our mission is to cure or prevent all disease

Okay, now you have my attention.

What's the deal on the company behind it? “Biohub is a 501(c)(3) biomedical research organization...” Nonprofit. Nifty!

This all sounds great, but as we have recently seen with, say, OpenAI, there is nonprofit and then there is nonprofit. Anyone know which Biohub is?

Incredible, but also scary if you think about what it may be lowering the barrier of entry to…

> This model is released under the MIT License.

Huh, appears to be actually open source, that's a pleasant surprise. Usually these academic models have some weird license attached to them.

  a scientific engine for prediction, design, and discovery that can map proteins across the tree of life, predict their structures, and design new protein binders that function in laboratory experiments.

So, my issue with this is that, just like in a lot of the other areas of bio, we're not able to explore outside the semantics of what is "known." Even the simpler task of just doing proper assembly is plagued by this. De novo assembly of an alien/novel organism mixed with samples from other alien organisms would be impossible with what we can do today. Even with things that we're familiar with, we struggle with metagenomic assembly.

This was posted on a Saturday night (in the US). A story posted at lunch time on a Tuesday is going to get 100x or maybe 10,000x more views than a story posted on a Saturday night.

It's not that HN readers lack intellectual curiosity or have some character flaw or narrow worldview, it's just that few people are reading and commenting between the late hours of Saturday and early morning of Sunday. It's 6 am Sunday in California as I post this.

I'm sure people will take this the wrong way, but a lot of the people who are on HN and who orbit technology circles in SF are really just not actually intellectually curious people.

They might like to think they are, they might try to pretend they are, but when pushed they're simply not.

Look at all of the groupthink that is perpetuated nonstop while they also proclaim they're creating, investing in, etc. so many unique ideas. Yet year after year it's the same thing in a different color.

What they actually are is interested in money and prestige. So give it a little time and they'll learn enough about biology to try and get some validation from their peers with comments. If money actually pours into bio that is.

Quite frankly, most people on HN are software devs with a wider interest in the world. HN’ers usually-ish comment when they have something insightful to say, even if the insight is just a humble one.

But I dare to guess that most HN’ers did high school bio and that’s it, so it’s harder to even give a small thoughtful comment on it, so they refrain.

Case in point, I wouldn’t have commented either. But I feel at home here and notice some behavioral patterns. And compared to fellow devs, I'm generally more attuned to behavioral patterns because I studied psychology.

But that’s just my take.

> very hard to predict what binds to what, which of course is a multi-billion-dollar industry.

Do you need to predict when AP-MS is so cheap?

Mapping interaction interfaces is challenging and is where there is attention. I don’t think we’re going to get complexes as a commercial focus outside of receptors with known quaternary structure. The first issue, as you allude to, is the absence of training data, which itself highlights the relative commercial unimportance of such an endeavor.

> None of them can predict anything complex or 'big', everything is peptide-level.

Is this related to the current peptide boom?

Biohub is funded by Zuck and his wife - the full name is "Chan Zuckerberg Biohub"

https://en.wikipedia.org/wiki/Chan_Zuckerberg_Biohub

But that is also a problem with structures derived from the methods you list. None of them are 100% equivalent to in vivo structures.

I think this is it. I'm a general software engineer and would like to switch to being in this field but something like this is just way over my head. It sounds like a great innovation but I'm lacking the domain knowledge to fully understand it. I've spent several months going through online courses to basically get to a Biology 101 understanding. Getting to the level of understanding something like this seems like it would be a multi-year effort and I don't really know what's the best way to proceed.

> Do you need to predict when AP-MS is so cheap?

Yes, because the expensive part is making the thing.

I'd go even further: what happens in biology is antithetical to the way software people think.

The HN/YC crowd generally has software brain: https://www.theverge.com/podcast/917029/software-brain-ai-ba..., "when you see the whole world as a series of databases that can be controlled with the structured language of software code". Biology doesn't work like that most of the time, it's squishy and weird and unpredictable, and the models we have of biology (including genomics!) are faulty at best, misleading at worst. I've supervised PhD-students and it takes some time for people's brains to be comfortable with that squishiness, that random behaviour, that 'putting A into the system only rarely produces B and we don't really know why but we do it anyway' view of the world. Software engineers struggle, even abhor that kind of world, which is why you rarely see them being interested in it; and if they work in it, outcomes are sometimes amazing and Nobel Prize worthy, more often nonsense that silently disappears.

> Biology doesn't work like that most of the time, it's squishy and weird and unpredictable, and the models we have of biology (including genomics!) are faulty at best, misleading at worst.

interesting. i came to tech from a molecular biology background and my impression was the opposite. biology is predictable most of the time, but sometimes random and squishy. the trick is that we’re trying to learn why things work predictably and what causes the variations, and that why/how unknown is what is most uncomfortable for people outside of the disciplines.

i’m not fully disagreeing with you because it sounds like you have experiences that inform your perspective. i find it interesting because my own experiences bring me in from the inverse perspective.

Biologists have a superiority complex about the “complexity” and “singular difficulty” of their field, born out of a need to justify the vast deficiencies of their field’s progress compared to others. It's an elaborate coping mechanism in which the people in other fields that make enviable progress (e.g. software, CS, etc.), sighted enough to have recognized and avoided the decrepitude of the biological sciences, are in fact the ignorant ones who “struggle”, “incapable of grasping” the way that biologists think. It's an inversion designed to obscure the harsh truth that these outsiders in fact see quite clearly the way that biologists think, and it is the reason they have so diligently avoided their field.

No biologist stays an essentialist for long, that is for sure.

The world of uncertainty and the idea that we might not be able to understand everything or control it as much as we'd like.

It seems to me a lot of the modern "tech-bro culture" is trying to control the future and reduce uncertainty: stop death, merge with the robotic superintelligence, colonize Mars to escape Earth's inevitable decay, etc.

I'm still waiting for the startups claiming to reduce entropy or solve the false vacuum decay

The deficiencies of biology's progress as a field? The decrepitudes of biological sciences? Do we live in the same timeline?

You're simply wrong. I say this as a computer scientist who ended up studying and working in bioinformatics for a period of time.

The reason I don't now? It's that people don't understand biology enough to understand the currently untapped potential and definitely not the advances that have happened. So they allocate money to yet another todo app, food delivery app, crypto wallet, or yet another finetune of a model to talk like a caveman.

REDWOOD CITY, Calif., May 27, 2026 – Biohub today announced the release of a world model of protein biology: a scientific engine for prediction, design, and discovery that can map proteins across the tree of life, predict their structures, and design new protein binders that function in laboratory experiments.

Proteins are the machinery of life. Nearly every function of the human body depends on them. They are among the most important targets in medicine, yet designing functional, stable proteins that work as intended in the body is an immense scientific and technical challenge.

ESMFold2 predicted 3D structure of a protein complex with annotated features including pore-loop selectivity filter, BTB/T1 tetramerization domain, voltage-sensing peripheral helix, and cytosolic juxtamembrane segments

ESMC provides a foundation for modeling the sequence, structure, and function of proteins. ESMFold2 predicts the structure of proteins and biomolecular complexes with state-of-the-art accuracy and speed. Features derived from model representations capture fundamental principles of structure and function, forming a compositional grammar for protein biology.

Today, Biohub is making available to researchers everywhere an open discovery engine for protein structure prediction, design, and biological discovery built around three releases: ESMC, ESMFold2, and ESM Atlas:

The core scientific hypothesis of ESM is that training a language model across the sequences of all life will cause it to internalize the fundamental properties that govern protein biology — the rules underlying how proteins fold, interact, and function across all of life. At its foundation is ESMC, a state-of-the-art language model that represents proteins, trained on approximately 2.8 billion sequences drawn from across all of life.
ESMFold2 is the design engine built to transform ESMC’s sequence representations into atomically-resolved 3D structure of biomolecular complexes. In experiments described in a preprint, researchers used ESMFold2 to design protein binders against five targets central to cancer and immunology — a computational search completed in days, rather than several months or years. The lab-validated binders exhibited high affinity, specificity, and stability — properties critical for clinical utility — and showed minimal similarity to sequences in public databases, suggesting the model is producing de novo solutions, rather than retrieving known binders.
ESM Atlas makes ESMC’s representations navigable across 6.8 billion protein sequences and 1.1 billion predicted structures — the largest application of AI to protein biology to date. It organizes proteins by relationships the model has learned, surfacing connections that existing databases have not captured, including evolutionary links between gene-editing enzymes spread across distant branches of life. Much of that biology has never been annotated. For researchers working on diseases where the biology is poorly understood, it makes uncharacterized biology searchable.

All three are freely available to the global scientific community at Biohub Platform.

“Designing the interactions between proteins is a fundamental problem in biochemistry, and critical for the design of medicines. What we’ve shown is that these models have learned such a high-fidelity world model of biology that you can design protein interfaces computationally, take them into the laboratory, and they function as predicted.”

— Alex Rives, Head of Science, Biohub

ESMFold2: A faster path from protein biology to binder design

ESMFold2 is an open, state-of-the-art structure prediction model that translates knowledge of patterns across evolution encoded in ESMC into precise, atomic-resolution 3D models of proteins and their interactions. It leads across standard protein folding benchmarks at predicting protein-protein and antibody-antigen interactions.

DockQ benchmarks comparing ESMFold2, AlphaFold 3, and Protenix-v1 on antibody-antigen and protein-protein structure prediction

ESMFold2 achieves state-of-the-art accuracy in structure prediction, both for general protein-protein interactions and for the challenging and therapeutically relevant task of antibody-antigen prediction. From ESMC representations alone, ESMFold2 is more successful at predicting the true binding pose of antibody-antigen complexes than AlphaFold 3. When provided with the same evolutionary information (MSA) as AlphaFold, ESMFold2 is the strongest predictor on both benchmarks. Bottom: structure prediction models can benefit from a larger computational budget. When we let models make multiple predictions and score them based on their own confidence estimates, ESMFold2 consistently improves with more compute.
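
The "multiple predictions scored by their own confidence" idea in the caption above can be illustrated with a minimal best-of-N sketch. This is not ESMFold2's API: `sample_prediction` is a hypothetical stand-in that draws a random confidence value per seed, and the `Prediction` fields are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Prediction:
    seed: int
    confidence: float  # the model's own confidence estimate; higher is better


def sample_prediction(seed: int) -> Prediction:
    """Hypothetical stand-in for one structure prediction run.

    A real model would return coordinates plus a confidence score;
    here the confidence is just a seeded random number.
    """
    rng = random.Random(seed)
    return Prediction(seed=seed, confidence=rng.random())


def best_of_n(n: int) -> Prediction:
    """Run n predictions and keep the one with the highest self-reported confidence."""
    candidates = [sample_prediction(seed) for seed in range(n)]
    return max(candidates, key=lambda p: p.confidence)


if __name__ == "__main__":
    for n in (1, 4, 16):
        best = best_of_n(n)
        print(f"n={n:2d} best seed={best.seed} confidence={best.confidence:.3f}")
```

Because a larger budget only adds candidates to the pool, the selected confidence never decreases as `n` grows. Whether the selected structure is also more accurate depends on how well the confidence estimate tracks true accuracy, which is what the benchmark in the figure measures.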

Antibody-based therapies have become a cornerstone of modern medicine, accounting for roughly one quarter of all new FDA drug approvals, spanning cancers, autoimmune diseases, and conditions that once had few treatment options. Finding a viable therapeutic candidate depends on identifying molecules that bind tightly and specifically to a disease target; a single preclinical binder candidate typically takes three to four years to develop. ESMFold2, which predicts the structural configurations most likely to achieve high affinity for a given target, can move much of the initial search into computation, producing experimentally testable designs in days.

Biohub researchers used the model to design protein binders against five targets at the center of cancer and immunology research — EGFR and PDGFRβ (implicated in tumor growth), PD-L1 and CTLA-4 (immune checkpoints that cancer cells exploit to evade detection), and CD45 (a regulator of immune cell signaling). Designs achieved hit rates of 36–88% for compact minibinders and 15–29% for antibody-derived formats, with confirmed binding in laboratory experiments. For PD-L1, designed binders restored T cell signaling in laboratory tests, blocking the same pathway that approved checkpoint therapies target.

ESMFold2 changes the accuracy and speed of early therapeutic binder discovery, transforming the initial search from largely empirical screening into computation-guided design that takes hours or days.

“Biohub was built on the belief that open science accelerates discovery. Making these tools freely available means researchers everywhere can move faster toward personalized cures that work for individual patients, because they target the specific biology driving their disease.”

—Dr. Priscilla Chan, Biohub Co-Founder

A shared, open scientific ecosystem built on a world model of protein biology

The world model of protein biology is trained on the evolutionary record of life itself, billions of protein sequences spanning the full breadth of life, including bacteria in deep soil, organisms in extreme environments, and the more than 20,000 types of proteins found in the human body. Its training objective is simple: predict the amino acids that evolution selects. Because evolution tends to preserve proteins that are fit for purpose, the patterns preserved across billions of years of data implicitly encode the physical rules governing protein function. What this work shows is that from this training, a world model emerges — one that has internalized those rules deeply enough to generate functional proteins from scratch.
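
As a rough illustration of "predict the amino acids that evolution selects", the sketch below builds one training example under the assumption that the objective is masked-token prediction, as in earlier ESM language models; the release itself does not spell out the procedure. The mask symbol, the 15% mask rate, and all function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "#"  # hypothetical mask symbol, chosen so it cannot clash with a residue


def mask_sequence(seq: str, mask_rate: float = 0.15, seed: int = 0):
    """Hide a random subset of residues and return (masked sequence, targets).

    targets maps each masked position to the residue the model must recover.
    The 15% rate is an assumption, not a documented setting.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    targets = {}
    for i in positions:
        targets[i] = seq[i]
        masked[i] = MASK
    return "".join(masked), targets


def accuracy(predictions: dict, targets: dict) -> float:
    """Fraction of masked positions where the predicted residue matches the original."""
    correct = sum(predictions.get(i) == aa for i, aa in targets.items())
    return correct / len(targets)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    masked, targets = mask_sequence(seq)
    print(masked)
    # A trivial baseline that guesses a random residue at every masked position.
    rng = random.Random(1)
    guesses = {i: rng.choice(AMINO_ACIDS) for i in targets}
    print(f"random-guess accuracy: {accuracy(guesses, targets):.2f}")
```

A trained model replaces the random guesser: it sees only the masked sequence and is rewarded for recovering the hidden residues, so doing well requires capturing which residues are compatible with their context.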

Biohub’s mission is to cure and prevent disease. We believe the path to that goal is understanding biology at its deepest level — and making the tools of that understanding available to every scientist. Together, ESMFold2, ESMC, and ESM Atlas constitute a state-of-the-art, openly available ecosystem for protein structure prediction and design — a shared foundation for any researcher working on fundamental biology or the development of new therapeutics.

###

About Biohub

Biohub is a 501(c)(3) biomedical research organization building the first large-scale initiative to combine frontier AI and frontier biology to solve disease. With its compute capacity, AI research and engineering, and state-of-the-art technology for measuring, imaging, and programming biology, Biohub is enabling scientists worldwide to use AI-powered biology to study how cells operate and organize as systems — ultimately understanding why disease happens and how to cure or prevent it. Learn more at biohub.org.

Press Contact
press@biohub.org
