Dispatches from Lisbon

Tiles, tiles, as far as the eye can see. Conquerors on horseback storming into the breach; proud merchant ships cresting ocean waves; pious monks and shepherds tending to their flocks; Christ bearing the cross to Calvary—in intricate tones of blue and white on tin-glazed ceramic tilework. Vedi Napoli e poi muori, the Sage of Weimar once wrote—to see Naples and die. But had he been to Lisbon?

The azulejos of the city’s numerous magnificent monasteries are far from the only thing for the weary PhD student to admire. Lisbon has no shortage of imposing bridges and striking towers, historically fraught monuments and charming art galleries. Crumbling old castles and revitalised industrial quarters butt up against the Airbnbs-and-expats district, somewhere between property speculation and the sea. An endearing flock of Magellanic penguins paddles away an afternoon in their enclosure at the local aquarium (which is excellent), and an alarming proliferation of custard-based pastries invites one to indulge.

Still, one is there to work. Two, actually. Isaac to present his research on geometric reasoning in transformers, and Ody to teach a training course on using OPIG’s tools and databases for antibody design and developability prediction. And to attend the conference, of course, which runs from 08:30 to 19:00 for three packed days. Talks are to be given, among many others, by protein folders and protein diffusers; by Fc engineers and anti-venom developers; by trainers of protein language models. Later in the week, Charlotte is to deliver a keynote on modelling dynamic structures for antibody design. An interesting few days lie ahead.

The Merchants of Lisbon

As far as conference acronyms go, “PEGS” is perhaps more infelicitous than most. APES might be better, though even this would suggest undue thematic balance. In fact, the Protein & Antibody Engineering Summit is almost exclusively focussed on antibodies.

The conference runs twice yearly for three days in up to eight concurrent tracks, alongside short courses and training seminars, poster sessions, panel discussions, and, not to forget, lunch. Topics range widely, from bispecifics to oncology, with a clear experimental focus overall. The venue is sprawling: after the first day we know better than to hop tracks all too frequently.

The several thousand attendees skew heavily towards industry, but the talks provide a healthy mix. In the ML track, which we both mainly attend, some 40% are delivered by academics. The remainder is equal parts startups with coruscating, results-only slides, and big pharmaceutical players with ML-enabled pipelines. Big smiles, broad strokes, approved by legal.

The other big game in town is sales, with row upon row of industry booths separating the auditoria from the poster stands. Few appear to be hiring, but everyone has something to sell you. In fact, the pitching begins before we even arrive. After signing up for the conference app a few days in advance, we receive effusive emails promising to address all our viscosity workflow needs. The brochure is sitting in our pidges when we get back. We try to let them down gently.

Digging In

A few talks catch our eye in particular:

Ab:Ag Co-Folding with OpenFold 3 and Boltz

Vinay Swamy from the AlQuraishi Lab discusses antibody-antigen complex prediction with OpenFold 3 & Co: no model beats AlphaFold; repeated prediction helps—AlphaFold more so than its competitors. If we could just rank the better predictions ahead of all others, that would help even more, he says; they’re in there somewhere, just buried. One promising workaround comes from Stephanie Linker at Merck: don’t re-predict randomly. Use Boltz; pick ‘contact’ residues; scan them through the antigen. Force diversity and double your success rate. To forty-fiveish percent, but still.
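The contact-scanning idea can be sketched in a few lines. This is our own toy illustration, not Merck’s actual workflow: `predict` stands in for a constrained co-folding call (e.g. Boltz run with a contact condition), and the window parameters are arbitrary. The point is simply that sweeping putative contact residues across the antigen forces each prediction to explore a different pose, rather than re-sampling at random.

```python
# Hypothetical sketch of contact-conditioned resampling: tile the antigen
# into candidate contact windows, run one constrained prediction per
# window, and rank the results by model confidence.

def contact_windows(antigen_length, window=8, stride=8):
    """Tile the antigen sequence into candidate contact windows."""
    return [list(range(start, min(start + window, antigen_length)))
            for start in range(0, antigen_length, stride)]

def scan_contacts(antigen_length, predict, window=8, stride=8):
    """Run one prediction per contact window and rank by confidence.

    `predict` is a placeholder for a constrained co-folding call; here it
    is any function mapping a residue window to a (confidence, structure)
    pair. Returns predictions sorted best-first.
    """
    results = [predict(w) for w in contact_windows(antigen_length, window, stride)]
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Because every run is pinned to a different patch of the antigen surface, diversity comes for free; the ranking step then only has to surface the good poses, which, per Swamy, is where the real difficulty lies.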

Replicating AlphaFold 3 is no joke either, Swamy says. No two attempted reproductions are alike. Distillation is compute-intensive, which doesn’t help, but in places the model is just under-documented. Small details matter: Who knew that masking unresolved residues at train-time affects the sterics of small molecules at inference? Trial and error takes time when effects only surface mid-way through training; and training costs $200,000 a pop.

Scaling Protein Language Models

Over at Profluent, they’ve been scaling PLMs, and have just released E1, their latest encoder-cum-preprint. While models like ESM2 tap out at three billion parameters, Ali Madani’s team have pushed to forty-six. Scale, it turns out, begets interesting behaviour:

Though perplexity and sequence recovery improve steadily with model size, zero-shot property correlation plateaus at a few billion parameters. Extra bulk, however, helps models take direction, with larger ones easier to align to a given task—say, predicting stability—be that through fine-tuning or in-context learning. Perhaps most intriguingly, models pre-trained on MSAs improve unsupervised contact prediction even when run in single-sequence mode. Might they have, beyond evolutionary patterns alone, learned a higher abstraction of what sequences represent?

Programmable Constant Regions

Edward Irvine from the Reddy Lab asks us to spare a thought for the Fc. The wildtype binds a laundry list of receptors, each with distinct downstream effects, but for better immunomodulation, we’d like to pick and choose.

Their approach is to screen a massive Fc variant library for receptor binding, and use the data to train binary sequence classifiers for each receptor. These provide the RL rewards to post-train a language model that generates Fc sequences conditioned on receptor type(s). Their FcGPT generates successful designs for most binding profiles tested (76%), at a respectable 11% mean success rate for each. The preprint ought to be an interesting read.
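The reward construction described above can be sketched as follows. To be clear, this is our illustration under stated assumptions, not the Reddy Lab’s actual pipeline: the receptor names and the matching rule are ours. The idea is that one trained binary classifier per receptor is folded into a single scalar reward that scores a generated Fc sequence against a desired binding profile (1 = should bind, 0 = should not).

```python
# Toy sketch: combine per-receptor binary classifiers into a single RL
# reward for a conditional sequence generator. Receptor names and
# classifiers below are illustrative stand-ins.

def profile_reward(sequence, classifiers, target_profile):
    """Reward = fraction of receptors whose predicted binding matches the target.

    `classifiers` maps receptor name -> function(sequence) -> P(bind);
    `target_profile` maps receptor name -> desired label (0 or 1).
    """
    matches = sum(
        1 for receptor, want in target_profile.items()
        if round(classifiers[receptor](sequence)) == want
    )
    return matches / len(target_profile)
```

A generator post-trained against such a reward is pushed not merely towards binders, but towards sequences that hit the requested profile while avoiding the receptors marked 0, which is precisely the pick-and-choose immunomodulation the talk was after.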

VHH Antivenoms

Meanwhile, Andreas Laustsen from DTU is concerned that snake antivenoms are still produced by venomising horses and harvesting their blood. While generally effective—ethics aside—this does raise a number of practical problems; not least that many antivenoms are single-use, as patients end up with anti-horse antibodies. (These are not, to be clear, the antibodies of anti-horses.)

Laustsen’s lab are blending five llama antibodies into a VHH cocktail, to serve as a broad-spectrum antivenom. Such a thing would ease distribution and storage, and help patients bitten by snakes who failed to introduce themselves first. It might also just work, since mixtures of largely similar proteins make up most common snake venoms. In a promising start, the approach proved effective against 17/18 venoms in mice, though efficacy in humans and ease of manufacture remain open questions for now.

Stepping Back

Perhaps predictably, broader insights emerge only once we venture beyond the ML track. At our first panel discussion, offhand remarks give us pause. Gradually, dimly, we begin to perceive a disconnect.

Binder Design is not the Grand Challenge.

First, we notice that sentiments on generative design seem to range from skeptical to circumspect—nothing like the hype around de novo protein binders we’re so familiar with. In our orbit of in silico modelling, “antibodies binding any target” is the rallying cry of an entire field: a tantalising challenge, with a plausibly tractable solution, imminently poised to “unlock” drug design.

But finding binders isn’t really that hard; an experimentalist can do it in a few rounds of screening. Just give them a week or two with some phages or mice. As one panelist drily remarks: “I have never seen a drug fail because it doesn’t bind.”

Really, the hard part doesn’t come until after. Months upon months making the thing affine and developable; whole quarters making sure it’s specific; yet more time spent on formulation; and that’s before all the clinical trials. Drug design has no single bottleneck: the entire thing is bottlenecks all the way. It’s multi-objective, that’s what makes it hard—not any one single thing, like binding. No single grand challenge exists, a sad truth we sometimes forget.

Not to mention that safety, efficacy, off-target effects aren’t properties of an antibody alone, but rather those of the whole patient system. “Getting molecules into patients is biology, not physics”, a panelist remarks, and designing binders is likely trivial by comparison. And, arguably, far less impactful. If a grand challenge did exist, this wouldn’t be it.

The Real Value Lies Out of Distribution.

As the talks go on, more patterns emerge. Inasmuch as industry seems excited about ML, it’s mainly for faster turnaround; for augmenting, not replacing, existing pipelines. ‘Active learning’ is the buzzword du jour; the focus on smarter screening—not necessarily less.

It also becomes clear that people want to get creative. They want novel drugs, not just the same ones faster. The most impactful biologic of the past 30 years, after all, has been no antibody at all: semaglutide is a peptide, a more recent modality. Much gushing over novel targets, exotic constructs, and unnatural residues follows. If ML brought the impossible within reach, now that would be impressive; marginal improvements over screening results, even swiftly delivered, rather less so.

The trouble is that both these tasks are inherently tricky. Screening of any kind induces distribution shift by design, and novel drugs we necessarily know very little about. These are not settings in which ML usually shines. If you want to add real value, learn to extrapolate.

Have You Considered Antigen Engineering?

If screening is to remain key, thinking more like an experimentalist might help. We find ourselves chatting to a particularly friendly one, over lunch. The fact is that antibodies are all alike, we hear: well-studied, well-understood, and generally well-behaved. Each antigen, however, is an antigen in its own way—and you need a bespoke assay for every single one. Developing that can be a real pain, especially if it’s membrane-stabilised, or only pathogenic in certain conformations. Couldn’t AI help with that instead? Engineer antigens for expression, or stabilise them in solution, for that matter? What about virtual screening for off-target effects? Far more sweat and tears are shed on that than on finding binders.

For ML to have a real impact on the discovery pipeline, it must address the steps that are actually slow and expensive. That could be picomolar binder design on the first try (or targeting tricky epitopes), but other things might just have more of an impact. Binder design may well be a technical prerequisite for some, but it is not sufficient to really move the needle.

OPIG is Everywhere.

Most striking, perhaps, in talk after talk, is just how often familiar names come up. As the place to go to construct a dataset; as reference points against which to compare; as load-bearing components of industrial pipelines. Right across academia and industry, OPIG’s tools and databases are widely employed. More than anything, it is humbling to see.

May people continue to find our work useful.


The day after the conference, the sky is overcast, but bright. Smoky beige and greyish streaks hang low over the Tagus, drifting slowly on a sea of white. Below, the estuary lies calm and placid, fading into the Atlantic somewhere out of sight. The air is mild and humid. In England, this might pass for summer.

On an impulse, we scale the bell tower of the Igreja de Graça. Obligingly, the city unrolls its hills beneath our feet, draped in their mantles of little alleyways and rooftops. Scattered spires punctuate the landscape; little blots of colour bustle about below. Beyond, in the distance, slender bridges arc across the horizon. A container ship or two idle on the sea.

We stare for a while, point out this and that. There are infinite particulars, but it’s the perspective that lingers.

On our way down, blue-tainted saints and sinners gaze down from the walls. We stop, for a moment, to admire the tilework; something catches our eye. A pastoral scene of a princely procession, threading its way through bucolic hills. Off by the wayside, somewhere in a corner, two little (o)piglets look on pensively.
