Is Cloud Computing the Answer to Genomics’ Big Data Problem?

15/01/2020 - 9 minutes

The success of the genomics industry has led to the generation of huge amounts of sequence data. If put to good use, this information has the potential to revolutionize medicine, but the expense of the high-powered computers needed to analyze it is making full exploitation of the data difficult. Could cloud computing be the answer?

Over the last decade, genomics has become the backbone of drug discovery. It has allowed scientists to develop more targeted therapies, boosting the chances of successful clinical trials. In 2018 alone, over 40% of FDA-approved drugs had the capacity for being personalized to patients, largely based on genomics data. As the percentage has doubled over the past four years, this trend is unlikely to slow down anytime soon. 

The ever-increasing use of genomics in the realm of drug discovery and personalized treatments can be traced back to two significant developments over the past decade: plunging sequencing costs and, consequently, an explosion of data. 

As sequencing technologies are constantly evolving and being optimized, the cost of sequencing a genome has plummeted. The first sequenced genome, part of the Human Genome Project, cost €2.4B and took around 13 years to complete. Fast forward to today, and you can get your genome sequenced in less than a day for under €900.

According to the Global Alliance for Genomics and Health, more than 100 million genomes will have been sequenced in a healthcare setting by 2025. Most of these genomes will be sequenced as part of large-scale genomic projects stemming from both big pharma and national population genomics initiatives. These efforts are already garnering immense quantities of data that are only likely to increase over time. With the right analysis and interpretation, this information could push precision medicine into a new golden age. 

Are we ready to deal with enormous quantities of data?

Genomics is now considered a legitimate big data field: just one whole human genome sequence produces approximately 200 gigabytes of raw data. If we manage to sequence 100M genomes by 2025, we will have accumulated roughly 20B gigabytes of raw data. This massive amount of data can partially be managed through data compression technologies, offered by companies such as Petagene, but that doesn’t solve the whole problem.

What’s more, sequencing is futile unless each genome is thoroughly analyzed to achieve meaningful scientific insights. Genomics data analysis normally generates an additional 100 gigabytes of data per genome for downstream analysis, and requires massive computing power supported by large computer clusters – a feat that is economically unfeasible for the majority of companies and institutions. 
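As a rough check on the scale described above, the arithmetic can be sketched in a few lines of Python. The per-genome figures (200 gigabytes raw, 100 gigabytes of analysis output) and the 100M genome count come from the text; the function names and the decimal gigabyte-to-exabyte conversion are assumptions made for illustration.

```python
# Figures quoted in the article.
GB_PER_GENOME_RAW = 200          # approximate raw data per whole human genome
GB_PER_GENOME_ANALYSIS = 100     # additional data generated by downstream analysis
GENOMES_BY_2025 = 100_000_000    # Global Alliance for Genomics and Health projection


def total_gigabytes(genomes: int, gb_per_genome: int) -> int:
    """Total data volume in gigabytes for a number of genomes."""
    return genomes * gb_per_genome


def gigabytes_to_exabytes(gigabytes: int) -> float:
    """Convert gigabytes to exabytes, assuming decimal units (1 EB = 1e9 GB)."""
    return gigabytes / 1_000_000_000


raw_gb = total_gigabytes(GENOMES_BY_2025, GB_PER_GENOME_RAW)
analysis_gb = total_gigabytes(GENOMES_BY_2025, GB_PER_GENOME_ANALYSIS)

print(f"Raw data:      {raw_gb:,} GB ({gigabytes_to_exabytes(raw_gb):.0f} EB)")
print(f"Analysis data: {analysis_gb:,} GB ({gigabytes_to_exabytes(analysis_gb):.0f} EB)")
print(f"Combined:      {gigabytes_to_exabytes(raw_gb + analysis_gb):.0f} EB")
```

Under these assumptions, raw data alone reaches about 20 exabytes, and analysis output adds roughly half as much again, before any compression is applied.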

Researchers working with large genomics datasets have been searching for other solutions, because relying solely on such high-performance computers (HPC) for data analysis is economically out of the question for many. Large servers require exorbitant amounts of capital upfront and incur significant maintenance overheads. On top of that, specialized high-end hardware, such as graphics processing units, requires constant upgrades to remain performant. 

Furthermore, as most HPCs have different configurations, ranging from technical specs to required software, the reproducibility of genomics analyses across different infrastructures is not a trivial feat. 

Cloud computing: a data solution for small companies

Cloud computing has emerged as a viable way to analyze large datasets fast without having to worry about maintaining and upgrading servers. Simply put, cloud computing is a pay-as-you-go model that allows you to rent computational power and storage, and it’s pervasive across many different sectors. 

According to Univa – an industry leader in workload scheduling in the cloud and HPC – more than 90% of organizations requiring high performance computing capacity have moved, or are looking into moving, to the cloud. Although this is not specific to companies in the life sciences, Gary Tyreman – Univa’s CEO – suggests that pharmaceutical companies are ahead of the market in terms of adoption.

The cloud offers flexibility, an alluring characteristic for small life science companies that may not have the capital on-hand to commit to large upfront expenses for IT infrastructure: HPC costs can make or break any company. As a consequence, many opt to test their product in the cloud first, and if numbers look profitable, they can then invest in an in-house HPC solution. 

The inherent ‘elasticity’ of cloud resources enables companies to scale their computational resources in relation to the amount of genomic data that they need to analyze. Unlike with in-house HPCs, this means that there is no risk money will be wasted on idle computational resources. 

Elasticity also extends to storage: data can be downloaded directly to the cloud and removed once the analyses are finished, with many protocols and best practices in place to ensure data protection. Cloud resources are allocated in virtualized slices called ‘instances’. Each instance’s hardware and software are pre-configured according to the user’s demand, ensuring reproducibility. 

Will Jones, CTO of Sano Genetics, a startup based in Cambridge, UK, offering consumer genetic tests with support for study recruitment, believes the cloud is the future of drug discovery. The company carries out large data analyses for researchers using its services in the cloud.

In a partnership between Sano Genetics and another Cambridge-based biotech, Jones’s team used the cloud to complete the study at a tenth of the cost and in a fraction of the time it would have taken with alternative solutions.

Besides economic efficiency, Jones says that moving operations to the cloud has provided Sano Genetics with an additional security layer, as the leading cloud providers have developed best practices and tools to ensure data protection. 

Why isn’t cloud computing more mainstream in genomics?

Despite all of the positives of cloud computing, we haven’t seen a global adoption of the cloud in the genomics sector yet.

Medley Genomics — a US-based startup using genomics to improve diagnosis and treatment of complex heterogeneous diseases, such as cancer — moved all company operations to the cloud in 2019 in a partnership with London-based Lifebit. 

Having spent more than 25 years at the interface between genomics and medicine, Patrice Milos, CEO and co-founder of Medley Genomics, recognized that cloud uptake has been slow in the field of drug discovery, as the cloud has several limitations that are preventing its widespread adoption. 

For starters, long-term cloud storage is more expensive than the HPC counterpart: cloud solutions charge per month per gigabyte, whereas with HPC, once you’ve upgraded your storage disk, you have no additional costs. The same goes for computing costs: while the cloud offers elasticity, Univa’s CEO Tyreman says that in many scenarios a single analysis costs five times more to compute in the cloud than on an HPC solution. However, as cloud technologies continue to progress and the market becomes increasingly competitive among providers, the ongoing ‘cloud war’ will likely bring prices down. 
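The trade-off described above can be made concrete with a minimal cost sketch. Every price below is a hypothetical placeholder rather than a real provider quote, the function names are invented for illustration, and the model deliberately ignores HPC maintenance and upgrade costs mentioned earlier in the article. The five-fold compute multiplier is the figure attributed to Tyreman, which he applies to "many scenarios" rather than all of them.

```python
def cloud_storage_cost(gigabytes: float, months: float, price_per_gb_month: float) -> float:
    """Cumulative cloud storage bill: charged per gigabyte, per month."""
    return gigabytes * months * price_per_gb_month


def storage_breakeven_months(upfront_disk_cost: float, gigabytes: float,
                             price_per_gb_month: float) -> float:
    """Months after which cumulative cloud storage fees exceed a one-off disk purchase."""
    monthly_fee = gigabytes * price_per_gb_month
    if monthly_fee <= 0:
        raise ValueError("monthly cloud fee must be positive")
    return upfront_disk_cost / monthly_fee


def cloud_compute_cost(hpc_cost_per_analysis: float, analyses: int,
                       cloud_multiplier: float = 5.0) -> float:
    """Cloud compute bill, assuming each analysis costs `cloud_multiplier` times the HPC cost."""
    return hpc_cost_per_analysis * cloud_multiplier * analyses


# Hypothetical scenario: 1,000 genomes at 300 GB each (200 GB raw + 100 GB analysis).
stored_gb = 1_000 * 300
price = 0.02            # hypothetical price per GB per month
disk_purchase = 60_000  # hypothetical one-off cost of equivalent in-house storage

months = storage_breakeven_months(disk_purchase, stored_gb, price)
print(f"Cloud storage overtakes the disk purchase after about {months:.1f} months")
print(f"Cloud storage bill over 24 months: {cloud_storage_cost(stored_gb, 24, price):,.0f}")
```

A model like this only captures the storage side of long-term retention; for short-lived projects where data is deleted after analysis, the per-month billing works in the cloud's favor.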

Furthermore, in the world of drug discovery, privacy and data safety are paramount. While cloud providers have developed protocols to ensure the data is safe, some risks still exist, for example, when moving the data. Therefore, large pharmaceutical companies prefer internal solutions to minimize these risks. 

According to Milos, privacy remains the main obstacle for pharmaceutical companies to fully embrace the cloud, while the cost to move operations away from HPCs is no longer a barrier. While risks will always exist to a certain extent, Milos highlighted that the cloud allows seamless collaboration and reproducibility, both of which are essential for research and drug discovery.

Current players in the cloud genomics space

Cloud computing is a booming business and 86% of cloud customers rely on three main providers: AWS (Amazon), Azure (Microsoft) and Google Cloud. Although the three giants currently control the market, many other providers exist, offering more specialized commercial and academic services.

Emerging companies are now leveraging the technology offered by cloud providers to offer bioinformatics solutions in the cloud, such as London-based Lifebit, whose technology allows users to run any bioinformatics analyses through any cloud provider with a user-friendly interface – effectively democratizing bioinformatics for all researchers, regardless of skill set. 

Federation is a concept from computing now used in the field of genomics. It allows separate computers in different networks to work together to perform secure analysis without having to expose private data to others, effectively removing any potential security issues. 

“The amount of data organizations are now dealing with has become absolutely unmanageable with traditional technologies, and is too big to even think about moving,” explained Maria Chatzou Dunford, Lifebit’s CEO and co-founder.

“When data is moved, you increase the chances of having it be intercepted by third-parties, essentially putting it at significant risk. Data federation is the only way around this – unnecessary data storage and duplication costs, and painstakingly slow data transfers become a thing of the past.”
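The federation idea described above can be illustrated with a toy sketch: each site computes a summary on its own data, and only that summary travels to the coordinator. This is not Lifebit's implementation or any real federation protocol. The `Site` class, the allele-frequency example and all names are assumptions chosen for illustration, and a real system would add authentication, encryption and safeguards against re-identification from aggregates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """A hypothetical data holder whose raw genotypes never leave its network."""
    name: str
    genotypes: List[int]  # alternate-allele count per individual at one variant: 0, 1 or 2

    def local_summary(self) -> Tuple[int, int]:
        """Return (alternate allele count, total allele count) computed locally."""
        return sum(self.genotypes), 2 * len(self.genotypes)


def federated_allele_frequency(sites: List[Site]) -> float:
    """Combine per-site summaries into a pooled allele frequency.

    Only the two aggregate counts from each site are seen by the coordinator.
    """
    summaries = [site.local_summary() for site in sites]
    alt_alleles = sum(alt for alt, _ in summaries)
    total_alleles = sum(total for _, total in summaries)
    if total_alleles == 0:
        raise ValueError("no genotypes available at any site")
    return alt_alleles / total_alleles


sites = [
    Site("hospital_a", [0, 1, 2, 1]),  # made-up genotypes
    Site("biobank_b", [0, 0, 1]),      # made-up genotypes
]
print(f"Pooled alternate allele frequency: {federated_allele_frequency(sites):.3f}")
```

The design point is that the computation moves to the data: the coordinator never needs a copy of the individual-level records, which avoids the transfer and duplication costs mentioned in the quote.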

Getting ready for the genomics revolution

It’s no secret that genomics is key to enabling personalized medicine and advancing drug discovery. We are now seeing a genomics revolution where we have an unprecedented amount of data ready to be analyzed. 

The challenge now is: are we ready for it? To be analyzed, big data requires massive computation power, effectively becoming an entry barrier for most small organizations. Cloud computing provides an alternative way to scale analyses while, at the same time, facilitating reproducibility and collaboration.

While the cost and security limitations of cloud computing are preventing companies from fully embracing the cloud, these drawbacks are technical and are expected to be resolved within the next few years. 

Many believe that the benefits of the cloud heavily outweigh its limitations. With major tech giants competing to offer the best cloud solutions – a market valued at $340 billion by 2024 – we might be able to expect a drastic reduction in costs. While some privacy concerns may still exist, leading genomics organizations are developing new tools and technologies to protect genomic data. 

Taken as a whole, it is likely that the cloud will be increasingly important in accelerating drug discovery and personalized medicine. According to Univa’s Tyreman, it will take around 10–15 years to see the accelerated transition from HPC to cloud, as large organizations are often conservative in embracing novel approaches. 

“Distributed big data is the number one overwhelming challenge for life sciences today, the major obstacle impeding progress for precision medicine,” Chatzou Dunford concluded.

“The cloud and associated technologies are already powering intelligent data-driven insights, accelerating research, discovery and novel therapies. I have no doubt we are on the cusp of a genomics revolution.”

Filippo Abbondanza is a PhD candidate in Human Genomics at the University of St Andrews in the UK. While doing his PhD, he is doing an internship at Lifebit and is working as marketing assistant at Global Biotech Revolution, a not-for-profit company growing the next generation of biotech leaders. When not working, he posts news on LinkedIn and Twitter.


Images via E. Resko, Lifebit and Shutterstock
