
I recently attended the Learning Meaningful Representations of Life (LMRL) workshop at ICLR 2025. The goal of LMRL is to highlight machine learning methods which extract meaningful or useful properties from unstructured biological data, with an eye towards building a virtual cell. I presented my paper which demonstrates how standard Transformers can learn to meaningfully represent 3D coordinates when trained on protein structures. Each paper submitted to LMRL had to include a “meaningfulness statement” – a short description of how the work presents a meaningful representation.
What makes a representation meaningful?
After attending the workshop, I realized that the word “meaningful” means different things to different people. I picture meaningful representations as those which are interpretable, generalizable, and internally consistent. For instance, Alex Rives, one of the creators of ESM, gave a great talk on different iterations of ESM and the emergence of structure in protein language models. To me, this is an amazing example of machine learning models learning something deep about the natural world.
However, “meaningful” can also be thought of as synonymous with “useful”. In practice, many useful ML projects tend to be lighter on the analysis of representations in order to focus on elements like experimental validation. While this is undoubtedly important, it often tells us little about how or what the models are learning. Most real-world success in biological ML is driven by better data, which often has nothing to do with how that data is represented.
“Life” is studied in drastically different ways
A goal of the workshop was to explore ways to harmonize representations of different biological data modalities towards building a virtual cell model. The invited speakers reflected this goal: there were talks on protein language models, epidemiological statistics, biomedical imaging, flow cytometry, and more. Progress towards a virtual cell model was an ambitious goal for a one-day workshop, and I found it difficult to link the individual methods together. I enjoyed learning about other disciplines, but I also found myself wishing there had been more time to chat with the other protein researchers.
A virtual cell?
During the panel discussion, one speaker described the virtual cell as a useful north star for our field. I think I agree, although we are so far away that it is hard to even imagine what a virtual cell might look like. For instance, we still struggle to reliably (or quickly) predict static protein structures for membrane proteins or antibody-antigen complexes, even when we already know that an antibody is a binder. Integrating data like cellular imaging or even metabolic networks will require structure models that are orders of magnitude faster and more accurate than what we have today. Hopefully, as we continue to imagine what a virtual cell might be able to do, it will pull researchers to create radically new methods to fill these gaps.