Is Cloud Computing the Answer to Genomics’ Big Data Problem?

07/12/2021 - 8 minutes

The success of the genomics industry has led to the generation of huge amounts of DNA sequencing data. Cloud computing could offer a solution for scientists who currently lack the tools to make full use of this data. 

Over the last decade, genomics has become the backbone of drug discovery. It has allowed scientists to develop more targeted therapies, boosting the chances of successful clinical trials. In 2020, 39% of FDA-approved drugs were personalized therapies, a trend that has been sustained for the past four years. 

The ever-increasing use of genomics in the realm of drug discovery and personalized treatments can be traced back to a drastic reduction in DNA sequencing costs. The first sequenced genome, part of the Human Genome Project, cost €2.4B and took around 13 years to complete. Fast forward to today, and you can get your genome sequenced in less than a day for under €900.

It is estimated that over 100 million genomes will have been sequenced by 2025 as part of genomic projects. Efforts from both big pharma and national population genomics initiatives are already garnering immense quantities of data that are only likely to increase over time. With the right analysis and interpretation, this information could push precision medicine into a new golden age.

Are we ready to deal with enormous quantities of data?

Just one human genome sequence produces approximately 200 gigabytes of raw data. If we manage to sequence 100 million genomes by 2025 we will have accumulated over 20 billion gigabytes of raw data. Such a massive amount of data can partially be managed through data compression technologies — companies such as UK-based Petagene focus on reducing the size and therefore the storage costs of genomic data.

However, that doesn’t solve the whole problem. DNA sequencing is futile unless each genome is thoroughly analyzed to achieve meaningful scientific insights. Genomics data analysis normally generates an additional 100 gigabytes of data per genome and requires massive computing power, which can be economically unfeasible for many companies and institutions. 
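
To put the figures above side by side, here is a minimal back-of-the-envelope sketch in Python. The per-genome sizes and the genome count are the article's estimates; the variable names and the use of decimal units (1 exabyte = 1 billion gigabytes) are assumptions made for illustration.

```python
# Figures quoted in the article (estimates, not measurements).
GB_PER_GENOME_RAW = 200        # raw sequencing data per genome
GB_PER_GENOME_ANALYSIS = 100   # additional data produced by analysis, per genome
GENOMES = 100_000_000          # genomes expected to be sequenced by 2025

# Assumption: decimal units, so 1 exabyte (EB) = 10^9 gigabytes (GB).
GB_PER_EB = 1_000_000_000


def total_gigabytes(genomes: int, gb_per_genome: int) -> int:
    """Total volume in gigabytes for a number of genomes."""
    return genomes * gb_per_genome


raw_gb = total_gigabytes(GENOMES, GB_PER_GENOME_RAW)
analysis_gb = total_gigabytes(GENOMES, GB_PER_GENOME_ANALYSIS)
combined_gb = raw_gb + analysis_gb

print(f"Raw data:      {raw_gb / GB_PER_EB:.0f} EB")
print(f"Analysis data: {analysis_gb / GB_PER_EB:.0f} EB")
print(f"Combined:      {combined_gb / GB_PER_EB:.0f} EB")
```

Under these estimates, analysis output adds roughly half as much again on top of the raw data, before any compression is applied.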

In addition, specialized and high-level hardware, such as graphics processing units, requires constant upgrades to remain performant. Furthermore, as most high-performance computers have different configurations, ranging from technical specs to required software, the reproducibility of genomics analyses across different infrastructures is not a trivial feat.

Cloud computing: a data solution for small companies

Cloud computing has emerged as a viable way to analyze large datasets fast without having to maintain and upgrade servers. Simply put, cloud computing is a pay-as-you-go model allowing users to rent computational power and storage. 

According to cloud computing provider Univa, more than 90% of organizations requiring high-performance computing capacity have moved, or are looking into moving, to the cloud. Although this is not specific to companies in the life sciences, Univa’s CEO Gary Tyreman suggests that pharmaceutical companies are ahead of the market in terms of adoption.

The cloud offers flexibility, an alluring characteristic for small life science companies that may not have the capital on hand to commit to large upfront expenses for IT infrastructure. Because users pay only for what they use, there is no risk that money will be wasted on idle computational resources. As a consequence, many opt to test their product in the cloud first, and if the numbers look profitable, they can then invest in an in-house computing solution.

Flexibility also extends to storage: data can be downloaded directly to the cloud and removed once the analyses are finished while ensuring data protection. 

Will Jones, CTO of UK-based startup Sano Genetics, believes the cloud is the future of drug discovery. The company offers consumer genetic tests and carries out large data analyses for researchers using its services in the cloud. In a partnership between Sano Genetics and another Cambridge-based biotech company, Jones’s team used the cloud to complete the study at a tenth of the cost and in a fraction of the time it would have taken with alternative solutions.

Besides economic efficiency, Jones says that moving operations to the cloud has provided Sano Genetics with an additional security layer, as the leading cloud providers have developed best practices and tools to ensure data protection. 

Why isn’t cloud computing mainstream in genomics?

Despite all of the positives of cloud computing, we haven’t seen a global adoption of the cloud in the genomics sector yet.

An example of a company adopting the technology is Medley Genomics, a US-based startup using genomics to improve diagnosis and treatment of complex diseases such as cancer. In 2019, the company moved all of its operations to the cloud in a partnership with London-based Lifebit.

Having spent more than 25 years at the interface between genomics and medicine, Patrice Milos, CEO and co-founder of Medley Genomics, recognizes that the uptake of cloud computing has been slow in the field of drug discovery, as the cloud has several limitations that are preventing its widespread adoption. 

For starters, long-term cloud storage is more expensive than in-house storage; cloud solutions charge per gigabyte per month, whereas with in-house computers, once you’ve upgraded your storage disk, you have no additional costs. The same goes for computing costs: while the cloud offers flexibility, Tyreman says that in many scenarios the computation cost of a single analysis is five times higher in the cloud than with an in-house solution. However, as cloud technologies continue to progress and the market becomes increasingly competitive among providers, the ongoing ‘cloud war’ will likely bring prices down.
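
The trade-off between a higher per-analysis price and idle in-house hardware can be made concrete with a rough break-even sketch. The only figure taken from the article is the fivefold cost ratio attributed to Tyreman. Everything else is an assumption for illustration: that the in-house cost is entirely fixed, that the fivefold comparison holds when the in-house machines are fully used, and the function names themselves.

```python
def in_house_cost_per_analysis(cost_at_full_use: float, utilization: float) -> float:
    """Effective in-house cost of one analysis.

    Assumption: in-house costs are fixed, so a machine that is busy only a
    fraction of the time spreads the same spend over fewer analyses.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return cost_at_full_use / utilization


def break_even_utilization(cloud_multiplier: float = 5.0) -> float:
    """Utilization below which the cloud is cheaper per analysis."""
    return 1.0 / cloud_multiplier


def cloud_is_cheaper(utilization: float, cloud_multiplier: float = 5.0) -> bool:
    """Compare the two options, with in-house cost at full use normalized to 1."""
    return cloud_multiplier < in_house_cost_per_analysis(1.0, utilization)


print(f"Break-even utilization: {break_even_utilization():.0%}")
print("Cloud cheaper at 10% utilization:", cloud_is_cheaper(0.10))
print("Cloud cheaper at 50% utilization:", cloud_is_cheaper(0.50))
```

Under these simplified assumptions, a team whose hardware would sit idle more than about 80% of the time pays less per analysis in the cloud, which is consistent with the appeal to small companies described earlier. Real pricing also includes storage, data transfer and staff costs, none of which are modeled here.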

Furthermore, in the world of drug discovery, privacy and data safety are paramount. While cloud providers have developed protocols to ensure the data is safe, some risks still exist, for example, when moving the data. Therefore, large pharmaceutical companies prefer internal solutions to minimize these risks. 

According to Milos, privacy remains the main obstacle for pharmaceutical companies to fully embrace the cloud, while the cost to move operations away from in-house computers is no longer a barrier. While risks will always exist to a certain extent, Milos highlighted that the cloud allows seamless collaboration and reproducibility, both of which are essential for research and drug discovery.

“The amount of data that organizations are now dealing with has become absolutely unmanageable with traditional technologies, and is too big to even think about moving,” explained Maria Chatzou Dunford, Lifebit’s CEO and co-founder.

Federation is a concept from computing now used in the field of genomics. It allows separate computers in different networks to work together to perform secure analysis without having to expose private data to others, effectively removing any potential security issues. 
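
The idea can be illustrated with a toy sketch: each site computes a small summary of its own data, and only those summaries are sent out and combined. This is a simplified illustration and not a description of Lifebit's or any other vendor's implementation; the class, function names and sample data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """The only information that leaves a site: two aggregate counts."""
    carriers: int
    participants: int


def summarize_locally(carrier_flags: list[int]) -> SiteSummary:
    """Runs inside a site's own network.

    Each entry is 1 if a participant carries the variant of interest, else 0.
    The per-participant records are never returned.
    """
    return SiteSummary(carriers=sum(carrier_flags), participants=len(carrier_flags))


def combine(summaries: list[SiteSummary]) -> float:
    """Runs at the coordinator, which only ever sees aggregate counts."""
    total_carriers = sum(s.carriers for s in summaries)
    total_participants = sum(s.participants for s in summaries)
    if total_participants == 0:
        raise ValueError("no participants across sites")
    return total_carriers / total_participants


# Hypothetical data held by two separate institutions.
site_a = [1, 0, 0, 1]
site_b = [0, 0, 1, 0, 0, 0]

carrier_frequency = combine([summarize_locally(site_a), summarize_locally(site_b)])
print(f"Carrier frequency across sites: {carrier_frequency:.2f}")
```

The combined result matches what a pooled analysis would give, yet no individual record is moved. Production systems typically layer access control, auditing and further privacy safeguards on top of this basic pattern, since aggregates from very small groups can still reveal information.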

“When data is moved, you increase the chances of having it be intercepted by third parties, essentially putting it at significant risk. Data federation is the only way around this — unnecessary data storage and duplication costs, and painstakingly slow data transfers become a thing of the past,” Dunford said.

Getting ready for the genomics revolution

Cloud computing is a booming business. Within the life sciences sector, several players are offering specialized services. An example is London-based Lifebit, whose technology allows users to run any bioinformatics analyses through any cloud provider with a user-friendly interface — effectively democratizing bioinformatics for all researchers, regardless of skill set. 

It’s no secret that genomics is key to enabling personalized medicine and advancing drug discovery. We are now seeing a genomics revolution where we have an unprecedented amount of data ready to be analyzed. Analyzing big data requires massive computation power, effectively becoming an entry barrier for most small organizations. Cloud computing provides an alternative to scale analyses, while at the same time facilitating reproducibility and collaboration.

While the cost and security limitations of cloud computing are preventing companies from fully embracing the cloud, these drawbacks are technical and are expected to be resolved within the next few years. 

Many believe that the benefits of the cloud heavily outweigh its limitations. With major tech giants competing to offer the best cloud solutions, we could expect a drastic reduction in costs. Meanwhile, leading genomics organizations are developing new tools and technologies to protect genomic data.

Taken as a whole, it is likely that the cloud will be increasingly important in accelerating drug discovery and personalized medicine. According to Univa’s Tyreman, it will take around 10–15 years to see the accelerated transition from HPC to cloud, as large organizations are often conservative in embracing novel approaches. 

“Distributed big data is the number one overwhelming challenge for life sciences today, the major obstacle impeding progress for precision medicine,” Chatzou Dunford concluded.

“The cloud and associated technologies are already powering intelligent data-driven insights, accelerating research, discovery and novel therapies. I have no doubt we are on the cusp of a genomics revolution.”


Filippo Abbondanza is a PhD candidate in Human Genomics at the University of St Andrews in the UK. While doing his PhD, he is doing an internship at Lifebit and is working as a marketing assistant at Global Biotech Revolution, a not-for-profit company growing the next generation of biotech leaders. When not working, he posts news on LinkedIn and Twitter.

 

Cover illustration by Elena Resko. Images via Lifebit and Shutterstock. This article was originally published in January 2020 and has since been updated to reflect recent developments in the field. 
