In drug discoverers’ jargon, large collections of molecules are known as chemical spaces. Whereas chemical spaces only contained a few million molecules some years ago, they now hold tens of billions of molecules. In addition, previously “virtual” molecules can now be purchased and produced on demand. And, while the process of searching chemical spaces used to be slow and tedious, and involved a great computational effort, today, even laptops can search through huge spaces in a short amount of time.
The vastness of the chemical space provides many possibilities for drug developers. Previously, drug discoverers searched for in-stock compounds in collections containing millions of molecules, known as molecule libraries. Today, advanced algorithms, coupled with established chemistry and the idea of building molecules on the fly, have opened a whole new avenue: huge chemical spaces containing billions of molecules, or even orders of magnitude more, are now accessible within minutes through simple computer searches.
However, it is not only about speed and size. Beyond searching through mere in-stock compounds, drug discoverers are hunting for novel intellectual property (IP) that can be purchased and made on-demand in a short timeframe ‒ a drastic time and money saver for the entire industry.
The changing landscape of chemical space search tools
In recent years, the biopharma industry has experienced a drastic change in the way drug discovery is performed. “Software makers today are facing a new type of non-expert users, untrained in computational chemistry,” explains Marcus Gastreich, Senior Director of Application Science at BioSolveIT. “This translates to the necessity of coding new molecule search applications that are much easier to use than traditional ones.”
Moreover, as time is becoming more and more crucial in the drug discovery field, software tools have to be designed to work as fast as possible. “Anything that saves time not only saves money but gives us a faster avenue towards developing important drugs,” Gastreich says. “Reducing computational efforts can already be seen as a solution to this challenge. On top of that, there is also a challenge of working with as little disk space as possible, while tackling chemical spaces as large as possible. For instance, recent research has revealed that a chemical space of 166 billion molecules, written out in traditional form on a hard disk, consumes 400 GB of disk space, in zipped format! Imagine a standard computer trying to handle that.”
This is where BioSolveIT comes in. The company has realized that the drug discovery landscape suffers from a scarcity of molecular library search tools that are fast, easy to use, and visual. Its software, infiniSee, combines all three characteristics and can be easily handled by both experts and non-experts. Furthermore, infiniSee generates computer proposals that are synthetically accessible by design. This means that the molecules the computer proposes are not only virtual but can also be produced.
Huge chemical spaces, fast searches, and purchasable molecules
“There is no use for a molecule that your computer can beautifully draw on your screen if it can’t be produced,” Gastreich emphasizes. “So imagine software that allows you to immediately purchase the molecule your computer has come up with. In collaboration with reliable compound vendors, such as Enamine and WuXi LabNetwork, we have bridged the gap between molecules that might be accessible and molecules that can actually be produced. Our users receive a computer proposal through infiniSee and can get quotes for it and order the molecules from these organizations, which then synthesize the molecules on-demand from giant chemical spaces.”
Rather than computing one molecule after another in a process called enumeration, infiniSee can assemble molecular solutions while it searches. This means that new molecules are assembled spontaneously while the computer navigates through a chemical space. “Similar to LEGO® bricks that you plug into each other to create new things, we navigate through a multidimensional hypercube of molecular possibilities,” Gastreich says. “The chemical space is spun up by the different reactions that work, allowing us to only touch upon those molecules that are producible or even purchasable in this giant chemical space. Basically, we avoid the costly enumeration and we deliver what should be synthesizable.”
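The contrast between enumeration and assembling molecules during the search can be illustrated with a toy model. The sketch below is not infiniSee's algorithm: it assumes a hypothetical space made of one reaction with two slots, building-block names invented for illustration, and a score that simply adds up per building block. Under those assumptions, the space can be sized, sampled, and searched without ever listing all of its products.

```python
from math import prod

# Hypothetical toy space: one two-component reaction, each slot holding
# interchangeable building blocks (all names are made up for illustration).
SLOTS = [
    ["acid_A", "acid_B", "acid_C"],
    ["amine_X", "amine_Y"],
]


def space_size(slots):
    """Number of products the space encodes, computed without listing them."""
    return prod(len(blocks) for blocks in slots)


def product_at(slots, index):
    """Decode the index-th product on demand (mixed-radix decoding)."""
    if not 0 <= index < space_size(slots):
        raise IndexError(index)
    picked = []
    for blocks in reversed(slots):
        index, pos = divmod(index, len(blocks))
        picked.append(blocks[pos])
    return tuple(reversed(picked))


def best_product(slots, score):
    """Pick the best block per slot.

    Assumption: the total score is the sum of per-block scores, so each
    slot can be optimized independently of the others.
    """
    return tuple(max(blocks, key=score) for blocks in slots)
```

With three slots of 100,000 building blocks each, such a space would encode 10¹⁵ products, yet a per-slot search of this kind would score only 300,000 building blocks. Real similarity measures are not simply additive, so this only conveys why working on building blocks is so much cheaper than enumeration.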
Using a simple drag-and-drop application, users can input a molecule of their choice ‒ one that might be patented and for which an unpatented counterpart is needed, or one that might be used to find new molecules from the same chemical space. As output, the user receives a list of molecules that are similar but come from a huge chemical universe. infiniSee allows the user to click on a result molecule and see exactly how the two molecules differ or why the computer considers certain parts to act similarly.
This chemical space search tool combines visualization, ease of use, and speed
“infiniSee is very visual and the algorithm has an immense speed,” Gastreich explains. “Unlike our competitors, infiniSee uses a tree-based descriptor, called ‘Feature Trees’, to describe the molecules in a computer-readable form. The similarities between an input molecule and the output molecule are then presented graphically. The nice thing about Feature Trees is that the output is a beautiful, colorful alignment that shows exactly which parts of the molecules are similar and which are dissimilar.”
Through its collaboration with an increasing number of leading companies in the field, BioSolveIT has enabled its users to search for molecules in a great variety of chemical spaces. While traditional enumeration-based tools can handle a maximum size of roughly 10¹⁰ molecules, there is a need for even bigger spaces. As a comparison: the number of stars in the Milky Way has recently been estimated at around 10⁹.
Boehringer Ingelheim coded their own chemical space BICLAIM with a size of 10¹¹, Pfizer developed a chemical space called PGVL with an initial size of 10¹⁴, Evotec’s EVOspace has a size of 10¹⁶, AstraZeneca’s AZ Space contains 10¹⁷ molecules, and Merck’s MASSIV chemical space reaches a size of 10²⁰. All of the molecules within these huge chemical spaces have a very high likelihood of being producible. Moreover, the first active compounds have already been published.
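To put these sizes in perspective, the storage figure quoted earlier (166 billion molecules in 400 GB, zipped) can be extrapolated to the spaces named above. This is a rough back-of-the-envelope sketch, not a measurement: it assumes decimal gigabytes and that storage grows linearly with the number of molecules.

```python
# Figures quoted in the article: 166 billion molecules, 400 GB zipped.
# Assumption: decimal gigabytes (1 GB = 10**9 bytes).
REFERENCE_MOLECULES = 166 * 10**9
REFERENCE_BYTES = 400 * 10**9

# Space sizes as stated in the article, given as exponents of ten.
SPACE_EXPONENTS = {
    "BICLAIM": 11,
    "PGVL": 14,
    "EVOspace": 16,
    "AZ Space": 17,
    "MASSIV": 20,
}


def bytes_per_molecule():
    """Average zipped storage per molecule implied by the quoted figures."""
    return REFERENCE_BYTES / REFERENCE_MOLECULES


def enumerated_bytes(exponent):
    """Extrapolated zipped size of a space with 10**exponent molecules.

    Assumption: storage scales linearly with the number of molecules.
    """
    return bytes_per_molecule() * 10**exponent


for name, exponent in SPACE_EXPONENTS.items():
    terabytes = enumerated_bytes(exponent) / 10**12
    print(f"{name}: about {terabytes:.3g} TB if fully enumerated")
```

The quoted figures work out to roughly 2.4 bytes per molecule, so even the smallest space listed here would no longer fit in the 400 GB mentioned above if it were written out molecule by molecule.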
Working on interactivity in 3D and synergies with artificial intelligence
BioSolveIT is currently pursuing two different areas in the development of software. Due to the growing need for fast and visual drug discovery software, the company has developed a tool called SeeSAR, which allows drug designers to work with molecules interactively, for example, by docking ligands into the active sites of proteins to optimize hits. Moreover, BioSolveIT also sees great potential in opening even larger spaces to drug discoverers and is currently working with providers of new molecular entities.
“We are looking into moving beyond mere 2D searching in chemical spaces and would like to provide 3D docking of purchasable or highly tangible molecules in active sites of proteins,” Gastreich explains. “We also have interesting collaborations currently going on that allow the searching of DNA encoded libraries (DELs) with our technology.”
Additionally, AI-driven companies are using BioSolveIT’s technology to move from virtual molecules to purchasable molecules. While AI-based methods for generating new chemical matter exist, “they are also limited in the number of molecules that are actually purchasable”, says Gastreich. By merging AI-generated outputs with existing chemical spaces, one could obtain synthetically producible results.
“Overall, it is all about the step from virtual to purchasable,” Gastreich says. “It is really about what the computer has proposed and what I can have on my lab bench in the next few weeks. I think our technology can bridge that gap between purely virtual to purchasable. And this will be the big money and time saver for the biopharma industry. Merck, for instance, has already saved 90% of associated costs across a dozen projects.”
Collaborating with another undisclosed big pharma company, BioSolveIT has embarked on the creation of a space exceeding 10²⁵ tangible compounds. It is an enormous step, but it is definitely not the end of the story. The entire relevant chemical space has been estimated to offer 10⁶³ molecules to cure diseases.
Images via Shutterstock.com and BioSolveIT
Author: Larissa Warneck, Science Journalist at Labiotech.eu