By Markus Gershater, co-founder and chief scientific officer, Synthace
The buzz surrounding generative AI has reached fever pitch. It’s a technology so unique and fascinating that the breadth of possibility has done that rare thing of exciting both consumer and business imaginations at the same time.
It’s not hard to see why.
In the life sciences, AI has made waves so large that many are making entirely credible claims that the continued development of these tools will completely transform the world of biological research. And I agree. Ever the skeptical scientist, however, I am inclined to ask what the limits are to any new methodology or technology, no matter the surrounding enthusiasm.
We have, in recent years, seen promising results emerge from existing high-quality datasets that were collected in a systematic way (e.g. genome sequences and protein structures). There are two main misconceptions about today's generative AI, however: that it can only work on high-quality data, or that it needs lots of it. In truth, even small amounts of low-quality data show promise. That's why AI could eventually prove valuable for research at almost every scale.
But what of the data we don’t have? I don’t mean the data we haven’t gotten around to collecting yet, but rather the data that we can’t currently collect at all: our blind spots. The question I have been asking myself, more and more, is this: what data do we have about experimentation itself?
The impact of AI on the power of experiments
Before we answer that question, let's first imagine a future where AI is deeply integrated into the process of experimentation. An AI that combines data from the literature with all of your institutional experimental data and metadata, and that is integrated with fully automated laboratories, could have a significant impact on the power and possibility of our experiments.
There are two questions that AI-enabled experimentation could conceivably answer. The first: what's the best experiment for me to be running next? The second: what's the best way to improve the experiments I'm already running, whether for accuracy, speed, cost, complexity, insight, or some combination of these?
AI is incredibly adept at noticing patterns that humans can't see. Let's say we first trained an AI on the data of all available scientific literature (something that Daniel Goodwin speculates may soon happen to all 88 million papers, roughly 100TB, in SciHub). Then, let's say we also trained it on every available data point about the experiments you've already run, and are running, within your organization. What would it see that we could not? Could it see a missed opportunity to run an experiment in an area we'd never considered? Could it suggest how your experiments could be an order of magnitude more insightful at lower cost? Could it predict the probability of success in an experiment you're planning?
The possibilities are exciting. There's just one problem: where are you getting those "available data points" about the experiments that run in your lab? Experiments as we understand them today are inherently analog in nature. As a concept, they predominantly live in the mind of the scientist, who must keep everything coherent and on track. The rest is scattered around the lab. There are designs in notebooks and Word documents. Calculations on paper and in spreadsheets. Device instructions in scripting software. Sample information in inventory systems. Data in databases. Sometimes even rough notes, written down and forgotten.
Would a digital experiment be possible?
Experiments: analog vs digital
Experiments can be either manual or automated. To move faster, many companies look to automation. But automation is difficult to get off the ground in a lab, and even when you do, it's usually limited to discrete tasks. Liquid handling "orchestration" bills itself as more sophisticated, but neither automation nor orchestration has any connection with the intent, design, or context of the broader experiment.
In short, lab automation knows the how, but not the why.
While point solutions like these have long been considered among the best ways to digitize a lab, the approach feels limiting in the era of AI. AI could absolutely improve the efficiency of automation scripts, or find ways to improve certain automation actions, but that is likely as far as things will go without a different underlying approach.
The different approach I believe will emerge in 2023? Digital experiments.
No longer limited to a purely analog domain, the digital experiment would be designed and planned in one place. It would be created and edited from anywhere in the world and replicated in any lab. It would integrate with equipment to run powerful multifactorial workflows, then automatically gather data and metadata for analysis. It would capture and connect the context and intent of each experiment, helping others understand them with accuracy. It would be a powerhouse for collaboration, acting as a common resource for everyone who needs to understand an experiment—including artificial intelligence.
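To make this concrete, here is a minimal sketch of what a digital experiment might look like as a data structure. All names and fields here are hypothetical illustrations, not any real product's schema: the point is simply that intent, design, protocol, metadata, and results live together in one machine-readable record.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical sketch: field names are illustrative, not a real schema.
@dataclass
class DigitalExperiment:
    """One shared record of an experiment: intent, design, execution, data."""
    intent: str                                   # why the experiment is being run
    design: dict                                  # factors and their levels
    protocol: list                                # ordered steps a human or robot can execute
    metadata: dict = field(default_factory=dict)  # instruments, reagents, operators
    results: dict = field(default_factory=dict)   # measurements, keyed by condition

    def conditions(self) -> list:
        """Enumerate every combination of factor levels (a full-factorial design)."""
        names = list(self.design)
        return [dict(zip(names, combo)) for combo in product(*self.design.values())]

exp = DigitalExperiment(
    intent="Maximise protein yield",
    design={"temperature_C": [25, 30], "inducer_mM": [0.1, 1.0]},
    protocol=["inoculate", "induce", "incubate", "measure OD600"],
)
print(len(exp.conditions()))  # 2 temperatures x 2 inducer levels = 4 conditions
```

Because a record like this carries the experiment's intent and design alongside its data, it is exactly the kind of structured input that both collaborators and an AI could reason over.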
Digital experiments would enable AI to identify complex patterns that cut across the intent, the execution, and the data of an experiment. They would enable an AI to make powerful predictions that could help scientists improve the power of their experiments or generate better insights from their experimental activities.
At their core, digital experiments would be a powerful, unified blueprint for biological experimentation in life science R&D. When they emerge, they will begin to unlock the full power of AI and cloud computing in lab-based experimentation. Even more than this, they will become a way for scientists, technicians, engineers, data scientists, bioinformaticians, and leadership to work from a shared model in a way that was previously impossible.
Experiments have always been the foundation of our biological understanding. If we can bring them into the digital domain, their potential will be transformative.