The big data revolution is all around us, from politics to power plants, manufacturing to multimedia content, Wall Street to Main Street. Small wonder that big data also is driving radical change in health care and therapeutic development.
"We live and die by big data," said Nick Conley, CEO and co-founder of Epibiome Inc., a precision microbiome engineering company using sequencing and bioinformatics to identify pathogenic bacteria and microbiome changes that may contribute to disease development.
A paper by researchers from Georgetown University Medical Center published in 2014 in Expert Review of Clinical Pharmacology made a compelling case that the "era of the '-omics'" had arrived in health care delivery and biomedical discovery, describing the digital revolution in health care as "now." In the ensuing two years, the pace of that disruption has only accelerated.
The number of drug development partnerships focused on analytics, though not yet rivaling traditional collaborations, is on the rise. Last month, software start-up Bioz Inc. introduced what it called "the world's first search engine for life science experimentation," designed to speed the translation of scientific research into drug discovery and development.
The Bioz cloud platform's software architecture taps advances in natural language processing and machine learning – two of the many techniques to analyze big data – to mine and structure hundreds of millions of pages of scientific papers, placing summarized scientific findings at the fingertips of users. The platform – already in use by more than 30,000 researchers across more than 1,000 universities and biopharma companies, according to the company – can be employed to help scientists select products, plan studies and speed experimentation.
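Bioz has not published its pipeline, but the general shape of that kind of literature mining can be sketched in a few lines. The toy example below (with an invented reagent dictionary and made-up paper snippets) scores products by how many papers mention them, the crudest version of a literature-derived product ranking:

```python
# Hypothetical sketch: rank reagents by how often they appear in a corpus of
# method text, a crude stand-in for the literature mining Bioz describes.
# The paper snippets and reagent dictionary are invented examples.
from collections import Counter
import re

REAGENTS = {"taq polymerase", "lipofectamine 2000", "trizol"}  # toy dictionary

papers = [
    "Total RNA was extracted with Trizol and reverse transcribed.",
    "Cells were transfected using Lipofectamine 2000 per protocol.",
    "PCR was run with Taq polymerase; RNA was isolated with Trizol.",
]

counts = Counter()
for text in papers:
    lowered = text.lower()
    for reagent in REAGENTS:
        # Count each paper at most once per reagent (presence, not frequency)
        if re.search(re.escape(reagent), lowered):
            counts[reagent] += 1

# Rank reagents by literature support, the simplest possible "product score"
for reagent, n in counts.most_common():
    print(f"{reagent}: cited in {n} of {len(papers)} papers")
```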
Last month, Biogen Inc. selected cloud technology developed by Medidata Solutions Inc. to support its clinical trials. The agreement enables the Cambridge, Mass.-based biopharma to deploy Medidata Rave – a platform for capturing, managing and reporting patient data – and Medidata TSDV, designed to improve the efficiency of data verification, across its development portfolio.
In June, Median Technologies SA, of Sophia Antipolis, France, inked a global deal to install its imaging biomarker phenotyping system, known as IBIOPS, on the Microsoft Azure cloud computing platform to improve the processing and analysis of medical images. In disclosing the deal, Median officials said that precision medicine is all about transforming the use of diagnostic and biological data to pinpoint and deliver "preventive, targeted and effective" care. Although the Microsoft partnership, at its heart, was more closely aligned with diagnostics than therapeutics, the extraction of disease biomarkers from medical images also will serve to inform clinicians about treatment response. Ultimately, the technology could be used to assess surrogate endpoints during cancer drug development, according to the companies.
Microsoft also has a deal with Wuxi Nextcode, the genomic information unit of Wuxi Apptec, to place its genomics platform on Azure and facilitate the sharing of large-scale genomic data among researchers and clinicians. Placing genome sequences in Azure allows teams of scientists to link and query tens of thousands of genomes from institutions around the world to search for undiscovered links between genes and diseases.
Ability to interrogate data called 'mission critical'
Big data analytics also is permeating the regulatory side of drug development. In June, Molecular Health granted the FDA a license to its SafetyMAP software, which is designed to improve the detection and molecular analysis of drug-induced adverse events for marketed drugs and to predict safety issues in drug candidates. The agency evaluated the technology during a five-year research collaboration with the Cambridge, Mass.-based company.
In describing its need for such a system as part of its Request for Quotation, the FDA pointed out that "drug safety prediction and the evaluation of postmarketing signals depend on the ability to find scientific data that can confirm relationships among drugs, drug targets, toxicity mechanism, patient susceptibility and clinical response. This requires the ability to interrogate a wide variety of data sources, including the [FDA Adverse Event Reporting System] database, MEDLINE, gene and protein databases, FDA drug product labels, patents and other document repositories to uncover hidden relationships between scientific findings and adverse events."
The agency called this capability "mission critical for assessing the importance of possible safety issues pre- and post-approval."
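Molecular Health has not detailed SafetyMAP's internals, but a standard first pass over spontaneous-report data such as the FAERS database is disproportionality analysis. The sketch below computes a proportional reporting ratio, a common pharmacovigilance signal statistic, from invented report counts:

```python
# A standard disproportionality measure used on spontaneous-report data such
# as FAERS: the proportional reporting ratio (PRR). Counts below are invented.
def prr(a, b, c, d):
    """a: reports of drug X with event E; b: drug X, other events;
    c: other drugs with event E; d: other drugs, other events."""
    rate_drug = a / (a + b)      # event rate among drug X reports
    rate_others = c / (c + d)    # event rate among all other reports
    return rate_drug / rate_others

# Toy example: 40 of 1,000 reports for drug X mention hepatotoxicity,
# versus 200 of 100,000 reports for all other drugs.
signal = prr(a=40, b=960, c=200, d=99_800)
print(f"PRR = {signal:.1f}")  # values well above ~2 are conventionally flagged
```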
Those recent examples represent a drop in the big data bucket, but they offer an important backdrop for a new generation of biopharmas also looking to big data to propel their technologies and platforms, whether for in-house development or collaboration with partners.
The approaches are playing out across a striking array of endeavors. For example, San Francisco-based Verge Genomics Inc., formed last year by two scientists from the David Geffen School of Medicine at the University of California, Los Angeles, is seeking to identify and design drugs for neurodegenerative diseases by using network algorithms to map the dozens or hundreds of genes that cause a given disease and then pinpointing the most promising drugs to target them. The company maintains that its platform to investigate whether FDA-approved drugs may have applications in neurodegenerative disease operates at one-thousandth the cost of conventional hunt-and-peck methods. (See BioWorld Today, Oct. 29, 2015.)
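Verge has not disclosed its algorithms, but the flavor of network-based drug prioritization can be illustrated with a toy protein-interaction graph: score each drug by how close its targets sit to the disease gene module. The genes, edges and drug names below are invented for illustration:

```python
# Toy network-proximity scoring: rank drugs by the average shortest-path
# distance from their targets to a disease gene module. All data invented;
# Verge's actual algorithms are not public.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("SOD1", "TARDBP"), ("TARDBP", "FUS"), ("FUS", "HDAC6"),
    ("HDAC6", "TUBA1A"), ("SOD1", "MAPT"), ("MAPT", "GSK3B"),
])

disease_genes = {"SOD1", "TARDBP", "FUS"}          # hypothetical module
drug_targets = {"drug_A": {"HDAC6"}, "drug_B": {"GSK3B"}}

def proximity(targets):
    # Mean over targets of the distance to the nearest disease gene
    dists = []
    for t in targets:
        d = min(nx.shortest_path_length(g, t, dg) for dg in disease_genes)
        dists.append(d)
    return sum(dists) / len(dists)

for drug, targets in sorted(drug_targets.items(), key=lambda kv: proximity(kv[1])):
    print(f"{drug}: mean distance {proximity(targets):.1f}")
```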
Initially, the company is seeking to collaborate with pharma partners to examine shelved compounds, starting with an alliance to investigate whether an undisclosed biopharma's assets could be applied to treat bipolar disorder. Eventually, Verge wants to move into full-blown drug development, conducting proof-of-concept trials and studying combinations involving FDA-approved and/or novel compounds.
Twoxar Inc., a 2014 start-up, is exploiting big data and proprietary algorithms to uncover new drugs without using a wet lab, potentially shaving years off development timetables and slashing associated R&D costs. The computational drug discovery firm uses its cloud-based platform, called Duma, to draw insights from a variety of independent data sources. The platform can interrogate public and private datasets to identify and prioritize drug candidates against specific disease targets. The method is designed to predict whether a drug will be successful in preclinical and, ultimately, clinical studies. (See BioWorld Today, Jan. 25, 2016.)
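Duma's scoring is proprietary; as a loose illustration of multi-source prioritization, the sketch below averages each candidate's rank across three invented evidence sources, so that a candidate must look good in several independent datasets to rise to the top:

```python
# Toy evidence aggregation across independent data sources, the general shape
# of what a multi-source prioritization platform does. The sources, scores
# and candidate names are all invented.
gene_expression = {"cand_1": 0.9, "cand_2": 0.4, "cand_3": 0.7}
binding_affinity = {"cand_1": 0.6, "cand_2": 0.8, "cand_3": 0.5}
literature = {"cand_1": 0.7, "cand_2": 0.3, "cand_3": 0.9}

def aggregate(*sources):
    # Average each candidate's rank across sources; lower is better
    ranks = {}
    for source in sources:
        ordered = sorted(source, key=source.get, reverse=True)
        for rank, cand in enumerate(ordered, start=1):
            ranks.setdefault(cand, []).append(rank)
    return {c: sum(r) / len(r) for c, r in ranks.items()}

scores = aggregate(gene_expression, binding_affinity, literature)
for cand, mean_rank in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{cand}: mean rank {mean_rank:.1f}")
```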
Meanwhile, Dundee University spinout Exscientia Ltd. last year showed that its automated medicinal chemistry technology cut the time and cost to design and optimize a drug by 75 percent while improving the quality of the resulting molecule. In 12 months, Exscientia delivered to partner Sumitomo Dainippon Pharmaceutical Co. Ltd., of Osaka, Japan, a pipeline-ready, bispecific, dual agonist compound that selectively activates G protein-coupled receptors from two distinct families. (See BioWorld Today, Sept. 2, 2015.)
Starting from product concept, Exscientia synthesized fewer than 400 compounds to shape a molecule that matched Sumitomo's development criteria, compared to the several thousand compounds typically synthesized in a standard project. The partners worked closely together, with Sumitomo chemists rapidly synthesizing and assaying compounds so that Exscientia could continually refine its algorithms and evolve the drug design. Exscientia also has alliances with Eli Lilly and Co. and Johnson & Johnson unit Janssen Pharmaceutical Co.
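Exscientia's design software is not public, but the loop the partners describe (propose a small batch, synthesize and assay it, feed the results back into the next design round) has a simple skeleton. Everything below, including the stand-in assay, is invented for illustration:

```python
# Skeleton of the design-make-test loop the article describes: the algorithm
# proposes a small batch, partner chemists synthesize and assay it, and the
# results feed back into the next design round. Everything here is a stand-in.
import random
random.seed(0)

def hidden_activity(x):
    # Stands in for the wet-lab assay: a noisy measurement of true activity
    return -(x - 0.7) ** 2 + random.gauss(0, 0.01)

best, history = None, []
for cycle in range(10):          # each cycle ~ one synthesize-and-assay round
    # Propose candidates near the best design seen so far (crude exploitation)
    center = best[0] if best else 0.5
    batch = [min(1, max(0, center + random.gauss(0, 0.15))) for _ in range(10)]
    results = [(x, hidden_activity(x)) for x in batch]   # "assay" the batch
    history.extend(results)
    best = max(history, key=lambda r: r[1])              # "refit": keep best
print(f"best design ~{best[0]:.2f} after {len(history)} compounds")
```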
Biopharma 'rapidly coming up to speed' in appreciating big data
X-Chem Inc., founded in 2010 around a small-molecule discovery platform, is another company that has embraced the potential of big data. Last month, the Waltham, Mass.-based company expanded a global drug discovery collaboration with Bayer AG, initiated in 2012, to encompass the full breadth of therapeutic areas and target classes in the R&D pipeline at the Leverkusen, Germany-based pharma. (See BioWorld Today, July 13, 2016.)
The sweetened arrangement came after swift progress during a pilot deal between the companies in which Bayer licensed an early stage drug discovery program against an undisclosed epigenetic drug target and a second drug discovery program against a cardiovascular drug target.
And the Bayer deal wasn't a fluke. X-Chem has licensed more than 20 programs, according to Rick Wagner, co-founder and CEO. Just last week, the company inked another agreement with Abbvie Inc., of North Chicago, to collaborate on multiple drug targets in oncology and immunology.
X-Chem's Dex drug discovery platform is based on a library, currently numbering more than 120 billion compounds, generated by iteratively combining and synthesizing small molecules tethered to DNA tags that record the synthetic history of each. The library is screened as a mixture, using affinity-based binding to a target of interest. Molecules that bind to the target can be fished out, while the rest are washed away. X-Chem then sequences the DNA tags to identify which molecules were enriched by binding to the target.
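The readout of such a screen boils down to counting DNA tags before and after selection. The toy calculation below, using invented counts, normalizes each tag to library size and reports its fold enrichment; X-Chem's production analysis is undoubtedly more sophisticated:

```python
# Toy enrichment calculation for a DNA-encoded library screen: compare each
# tag's sequencing count after affinity selection to its count in the naive
# library. All counts are invented.
naive = {"tag_001": 5_000, "tag_002": 4_800, "tag_003": 5_100}
selected = {"tag_001": 90, "tag_002": 4, "tag_003": 6}

naive_total = sum(naive.values())
sel_total = sum(selected.values())

for tag in naive:
    # Normalize to library size, then take the fold change
    fold = (selected[tag] / sel_total) / (naive[tag] / naive_total)
    print(f"{tag}: {fold:.1f}x enriched")
```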
Needless to say, big data underpins the discovery engine.
"We have one of the largest screening libraries of any company, and we have screened well over 200 separate targets internally, so the amount of data that we have classified is vast," Wagner told BioWorld Insight.
Although elegant informatics tools position biopharmas to exploit the power of big data, success depends on how the information is used, or "how we interrogate the database," as Wagner put it. "We see some fundamental advances coming out of that effort as we move forward in terms of using the database to machine learn and make predictions about new molecules that could be discovered."
Biopharma "is very rapidly coming up to speed in understanding how to utilize big data," he added.
Epibiome's Conley agreed.
"Being an engineering company, which is how we think of ourselves, requires the ability to measure something," he explained, starting with a metric that shows, definitively, whether successive changes improve a process or take it in the wrong direction.
"That metric, for us, is the bacterial community," Conley said.
Without right tools, 'just stabbing in the dark'
Epibiome integrated its expertise in microbiology, phage biology and next-gen sequencing into a single platform designed to develop phage therapies that can modify the microbiome in a targeted manner, enabling the deletion of one or more bacterial strains without affecting non-pathogenic strains. The South San Francisco-based firm is a graduate of the inaugural class of Illumina Accelerator, the genomic ecosystem created by the San Diego-based company to foster entrepreneurship in the genomics industry.
"Next-generation sequencing allows us to do profiling in a manner in which we can measure – interrogate – the bacterial community and prove that we're actually making specific deletions by applying our phages," Conley told BioWorld Insight.
Ten years ago – maybe even five – the type of computing power needed to run those processes was prohibitively expensive for a small biopharma; "yet without these tools, you're really just stabbing in the dark," he said.
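The sequencing check Conley describes can be pictured as a before-and-after comparison of read counts: the targeted strain should collapse while the rest of the community holds steady. The counts below are invented for illustration:

```python
# Toy before/after comparison of 16S-style read counts, illustrating the kind
# of sequencing-based check Conley describes: did the phage knock down the
# target strain without disturbing the rest? All counts are invented.
before = {"E. coli (target)": 42_000, "Lactobacillus": 30_000, "Bacteroides": 28_000}
after = {"E. coli (target)": 900, "Lactobacillus": 31_500, "Bacteroides": 27_600}

def relative_abundance(counts):
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

ra_before, ra_after = relative_abundance(before), relative_abundance(after)
for taxon in before:
    print(f"{taxon}: {ra_before[taxon]:.1%} -> {ra_after[taxon]:.1%}")
```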
A similar process, to a different end, is underway at Gigagen Inc. The preclinical biotherapeutics company combines a proprietary microfluidics platform, bioinformatics, next-gen sequencing and genetic engineering to capture the genetic make-up of the human immune repertoire and characterize B and T cells at the rate of millions per hour while simultaneously identifying their antigen and protein binding sites. The South San Francisco-based company maintains that its drug discovery platform eclipses conventional methods by several orders of magnitude.
To date, Gigagen has replicated the immune systems of 52 donors. The endgame is to seek FDA approval for injectable, personalized immunotherapy based on the most robust immune systems in nature.
"Finding one cell at a time in a test tube, like most places do right now, is not a very efficient form of drug discovery," Carter Keller, Gigagen's chief operating officer, told BioWorld Insight. "What we've developed is a platform that allows you to capture the entire immune system, recreate it and then give that whole immune system to a different patient or mine that immune system to learn what's causing an immune response."
Healthy adults have about 50 million antibodies in their blood, Keller said, and with Gigagen's ability to capture about 3 million B cells per hour "we can capture all of the antibodies out of a person's blood in a matter of days."
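Keller's arithmetic roughly checks out, as a back-of-envelope calculation shows: capture alone takes well under a day, and sample preparation and sequencing plausibly stretch the total to days:

```python
# Back-of-envelope check of Keller's figures: capturing on the order of
# 50 million cells at ~3 million B cells per hour takes roughly 17 hours,
# consistent with "a matter of days" once prep and sequencing are added.
antibodies = 50_000_000
capture_rate_per_hour = 3_000_000
print(f"{antibodies / capture_rate_per_hour:.1f} hours of capture time")
```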
That process is the easy part. Only a tiny fraction of the cells is programmed to fight a given disease, Keller pointed out. Finding those and marshaling their power is a longer-term effort. Initially, the company is focused on creating recombinant gamma-globulin, or rIVIG, to treat primary immune deficiencies – an effort Gigagen has validated preclinically.
Big data 'a great equalizer'
Even further upstream, Vium Inc. is pursuing another approach that's dependent on the manipulation of big data. In June, the San Mateo, Calif.-based company emerged after raising approximately $33 million during three years in stealth as Mousera Inc. During that time, co-founders Timothy Robertson, CEO, and Joe Betts-Lacroix, chief technology officer, were honing their living informatics platform for conducting preclinical in vivo drug research.
Vium applies life sciences, digital technology and large-scale interpretive technology to living systems – rodents – to help researchers make better decisions about prioritizing compounds for further testing. The company's Digital Vivarium features a fully automated physical and digital infrastructure containing rodent cages with intelligent sensors and a high-definition video camera network. Information is captured and transmitted to the Vium Cloud, where the data are stored, processed and analyzed. Scientists can design and conduct studies through an online research suite, where they can monitor animal behavior and health 24/7, from anywhere in the world. Video and auditable records enable longer-term analyses and study reproduction.
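Vium's analytics are not public, but one simple thing continuous sensor data enables is flagging when an animal drifts away from its own baseline. The sketch below, using invented respiratory-rate readings, flags any reading more than three standard deviations outside a rolling window:

```python
# Toy continuous-monitoring check of the sort a sensor-instrumented vivarium
# enables: flag hours where an animal's respiratory rate drifts well outside
# its own rolling baseline. Readings are invented; Vium's analytics are not
# public.
from statistics import mean, stdev

readings = [98, 101, 99, 102, 100, 97, 103, 100, 138, 141, 99, 100]  # breaths/min

WINDOW = 6
for i in range(WINDOW, len(readings)):
    baseline = readings[i - WINDOW:i]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma and abs(readings[i] - mu) > 3 * sigma:
        print(f"hour {i}: {readings[i]} breaths/min deviates from baseline "
              f"({mu:.0f} +/- {sigma:.0f}); flag for review")
```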
The system was designed to minimize animal handling and reduce human error during preclinical testing – a process that had not changed significantly even as advances were occurring in biological models of disease. Building and experimenting with the Digital Vivarium gave Vium's founders an appreciation for the real impact of interactions between people and animals as well as the shortcomings of periodic observations rather than continuous monitoring. But automating the process was dependent on their ability to collect, sort and analyze millions of pieces of data.
"When you can capture a comprehensive record of everything that's happened in a study, you can go back retrospectively and understand the course of events that led to a particular outcome," Robertson explained.
The Digital Vivarium also enables the types of long-term animal studies that have been impractical because of the cost of monitoring or the inability to control data quality over time, given the subjectivity introduced by manually handling the animals.
"We can monitor the health of the animals in a very low-resources way," Robertson explained, essentially by going online and reviewing data from the battery of sensors in the company's Smart Housing, which contains the rodent cages. Automating the measurement process enables researchers to look at animal studies in a new light, since the Digital Vivarium removes many of the resource-intensive barriers that limited testing to a few compounds at the long end of preclinical development.
"We have collaborations where we're testing 25 drugs in parallel, very rapidly honing in on the most efficacious drugs and getting data about the safety of those drugs at the same time because we can monitor parameters like respiratory rate," Robertson told BioWorld Insight.
Big pharma might not be buying into big data in a big way – not yet, at least – but companies are paying attention, he added.
"We're getting a large demand to apply the technology, and it's coming from across the board," Robertson said. "Big pharma is very interested. They tend to move more slowly, but we're actually seeing some companies move quickly. We're also getting a lot of interest from smaller biopharma companies because this is a great equalizer for them."
A common theme among companies deploying big data for drug discovery and development is the need for diversity within the analytical team. Vium's staff includes in vivo researchers, data scientists, informatics systems specialists and consumer electronics experts. The Epibiome team includes a large animal veterinarian and scientists encompassing "all of the major disciplines," Conley said, citing chemistry, physics, biology and medicine.
"What I see in many of these start-ups is one founder and four grad school colleagues who worked in the same lab and have completely overlapping skill sets," Conley observed. "We've benefited tremendously from having completely orthogonal skill sets, in many ways. It's challenging, because we all speak a different language, but when it comes to solving problems it's a beautiful thing to apply these different tool kits."