New data systems arise in service of precision oncology
by Andy Koopmans
National Cancer Institute scientists recently released to the public the largest-ever database of sequenced cancer-related genetic variations. Drawn from the NCI-60 cell line collection, the database contains gene variants from cell lines representing nine different cancer types: breast, ovary, prostate, colon, lung, kidney, brain, leukemia and melanoma.
Making this database available to clinicians and investigators represents a major step toward the development of targeted, precision oncology and is expected to help speed the discovery of new drugs and better match patients with therapies.
Fred Hutchinson Cancer Research Center is also currently at work on its own landmark database undertaking in the service of precision oncology: the Hutch Integrated Data Repository and Archive, or HIDRA.
HIDRA is a collaborative effort across all Fred Hutchinson/University of Washington Cancer Consortium partner institutions to collect and integrate all pertinent data about clinical subjects into a robust, secure and extensible database. The goal is to enable physicians and investigators to learn from every patient served by any of the Cancer Consortium member institutions and to integrate that knowledge back into clinical care and research to benefit patients.
“Historically we treated cancers by the part of the body affected but now we’re classifying cancer molecularly,” said Paul Fearn, Biomedical Informatics lead at Fred Hutch. “There could be several cancers related to one genetic variation. Now, imagine being able to look across every single instance of an expressed gene in every single person who’s come through the door. That would be the holy grail of precision oncology. That’s what HIDRA is intended to be.”
HIDRA will be able to match patients to care pathways and clinical trials that offer the best treatment options, allow investigators to ask previously difficult-to-answer questions and make unexpected connections from seemingly trivial case notes that might otherwise be overlooked.
Additionally, HIDRA is intended to help automate the collection of long-term follow-up care data to track patients’ health once they leave treatment. “The idea is that nobody and no important information falls through the cracks,” Fearn said.
HIDRA’s four-tiered plan
HIDRA is a four-tiered project, with its goal-specific tiers rolling out over a five-year timeline.
- Tier 1, begun in early FY ’13, will create a secure and extensible data “pipeline” to collect, move and store approximately 150,000 historical Cancer Consortium clinical records in their various formats (for example, forms, doctors’ notes, biosampling data), with records from an estimated 5,000 to 6,000 new patients to be added each year. The data collected will include patient stories and clinical histories, related samples and biospecimen data, information regarding trials and studies in which the patient has participated, and genomic assay data.
- Tier 2, currently in the design phase, will implement natural language processing algorithms to extract important data elements from electronic medical records and compile them into uniform, searchable data while minimizing human effort and potential error or variability.
- Tier 3 will tackle enterprise-wide tracking of consent and study information so that patient data confidentiality is maintained when integrating assays and other data from research protocols.
- Tier 4 will make it possible to index and search across both clinical and research data using HIDRA and to access any related data (such as all instances of a cancer-causing genetic variation) from across the entire enterprise.
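To make the Tier 2 and Tier 4 goals concrete, here is a minimal sketch of turning free-text notes into uniform, searchable records. It is an illustration only, not HIDRA's actual design: simple pattern matching stands in for real natural language processing, and the patient IDs, sample notes, gene list and the names `ExtractedRecord` and `extract` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical patterns for illustration; real clinical NLP is far more involved.
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I)\b", re.IGNORECASE)
GENE_RE = re.compile(r"\b(BRCA1|BRCA2|KRAS|EGFR|TP53)\b", re.IGNORECASE)


@dataclass
class ExtractedRecord:
    """One uniform record built from a free-text note."""
    patient_id: str
    stage: Optional[str]
    genes: List[str]


def extract(patient_id: str, note: str) -> ExtractedRecord:
    """Pull a cancer stage and any listed gene names out of a note."""
    stage_match = STAGE_RE.search(note)
    stage = stage_match.group(1).upper() if stage_match else None
    # Normalize case and drop duplicates so records are comparable.
    genes = sorted({g.upper() for g in GENE_RE.findall(note)})
    return ExtractedRecord(patient_id, stage, genes)


# Made-up sample notes, not real patient data.
notes = {
    "p001": "Pt presents with stage III colon cancer. KRAS mutation "
            "detected; kras confirmed on repeat assay.",
    "p002": "Follow-up visit, no staging recorded. EGFR and TP53 "
            "variants noted.",
}

records = [extract(pid, text) for pid, text in notes.items()]

# A Tier 4-style query: every patient whose notes mention a given gene.
kras_patients = [r.patient_id for r in records if "KRAS" in r.genes]
```

Once every note is reduced to the same fields, a question such as "which patients have a KRAS variant?" becomes a simple filter over the records rather than a manual chart review.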
Unique and powerful
HIDRA is intended to be more than just a data warehouse. Several features will make it a unique and powerful resource for Cancer Consortium investigators.
- HIDRA will be based on open-source, freely licensed software that allows the entire biomedical community to adopt and improve it. This means that it will be affordable to everyone who wants to use it, including those who study rare cancers and may not have the funding for proprietary or custom software.
- HIDRA will be secure. The current scope of the project is limited to Cancer Consortium partner data, and it will be built to withstand audit in compliance with the Federal Information Security Management Act.
- HIDRA will save time and money. Use of natural language processing will allow HIDRA to automate the cumbersome tasks of manual data abstraction and processing of medical records, which represents a large cost savings, as the majority of clinical research budgets typically go toward these processes when done manually.
- HIDRA will automate the tracking of patients for long-term follow-up after they receive treatment, a task that can be prohibitively expensive and labor-intensive when done manually. For example, periodic emails can be sent to former patients pointing them to a web form where they can enter data about their current state of health and outcomes after cancer treatment. This information can then be used to inform current treatments for others and to maintain a better clinical relationship with those patients.
“We aim to be the top cancer center in the country in terms of our ability to leverage all of the data generated in clinics and laboratories across the Cancer Consortium, and I think with this team and strategy, we can get there,” Fearn said.