Projects

BioMedical Evidence Graph (BMEG)

The BioMedical Evidence Graph (BMEG) integrates different types of biomedical data into a unified graph for efficient application of machine learning and discovery algorithms across heterogeneous data types. BMEG will leverage the petabytes of genomics data available for tumor samples from repositories like the National Cancer Institute’s Genomic Data Commons to predict drug sensitivity, patient outcomes, and other clinically relevant phenotypes.                                                           

 

The BMEG data model is instantiated in a scalable graph database optimized for storing and querying graphs containing terabytes of vertices and edges distributed across a multi-machine cluster. This graph is the store of record for the BMEG. It maintains the connections between projects, donors, samples, molecular data, and treatment evidence, and ensures that these entities are associated correctly.
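
As a rough illustration of how such connections can be represented and queried, here is a minimal in-memory sketch. It is not the BMEG implementation or its query API: the class `EvidenceGraph`, the vertex labels, and the edge names (`has_donor`, `has_sample`, `has_molecular_data`) are all assumptions made for this example.

```python
from collections import defaultdict


class EvidenceGraph:
    """Tiny in-memory property graph. All labels and edge names are illustrative."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> (label, properties)
        self.out_edges = defaultdict(list)   # vertex id -> [(edge label, target id)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, edge_label, dst):
        # Reject edges to unknown vertices so every association points at a real entity.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError(f"unknown vertex in edge {src} -> {dst}")
        self.out_edges[src].append((edge_label, dst))

    def traverse(self, start, *edge_labels):
        """Follow a fixed sequence of edge labels from `start`; return the ids reached."""
        frontier = [start]
        for label in edge_labels:
            frontier = [dst
                        for vid in frontier
                        for (lbl, dst) in self.out_edges[vid]
                        if lbl == label]
        return frontier


def build_example():
    """Build a toy graph: one project, one donor, two samples, one molecular record."""
    g = EvidenceGraph()
    g.add_vertex("project:1", "Project")
    g.add_vertex("donor:1", "Donor")
    g.add_vertex("sample:1", "Sample")
    g.add_vertex("sample:2", "Sample")
    g.add_vertex("molecular:1", "MolecularData", kind="expression")
    g.add_edge("project:1", "has_donor", "donor:1")
    g.add_edge("donor:1", "has_sample", "sample:1")
    g.add_edge("donor:1", "has_sample", "sample:2")
    g.add_edge("sample:1", "has_molecular_data", "molecular:1")
    return g
```

Calling `build_example().traverse("project:1", "has_donor", "has_sample")` walks from a project to its samples. A production graph database would distribute the same kind of traversal across a cluster rather than hold everything in one process.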

____________________________________________________________________________________________________

Galaxy

Galaxy is a scientific analysis workbench used by thousands of scientists worldwide to analyze genomic, proteomic, imaging, and other large biomedical datasets. Galaxy’s user-friendly, web-based interface makes it possible for anyone, regardless of their informatics expertise, to create, run, and share large-scale robust and reproducible analyses. Galaxy accelerates biomedical research by bringing together tool developers and end users such as bench scientists and physician-researchers. There are more than 5,000 analysis tools available in Galaxy’s ToolShed, and users run more than 200,000 analyses each month on Galaxy’s main public server. OHSU’s precision cancer medicine programs use Galaxy to run clinical and research genomics analyses as well as machine learning workflows. Galaxy is funded by both NIH and NSF.

[Code]

This is a joint project with the Nekrutenko Lab and the Taylor Lab.

___________________________________________________________________________________________________

Rail

Rail-RNA is software for analysis of RNA sequencing (RNA-seq) data. Its distinguishing features are:

* **Scalability**. Built on MapReduce, the software scales to analyze hundreds of RNA-seq samples at the same time.

* **Reduced redundancy**. The software identifies and eliminates redundant alignment work, making the end-to-end analysis time per sample *decrease* for fixed computer cluster size as the number of samples increases.

* **Integrative analysis**. The software borrows strength across replicates to achieve more accurate splice junction detection, especially in genomic regions with low coverage.

* **Mode agnosticism**. The software integrates its own parallel abstraction layer that allows it to be run in various distributed computing environments, including the Amazon Web Services (AWS) Elastic MapReduce (EMR) service, or any distributed environment supported by Python, including clusters using batch schedulers like PBS or SGE, Message Passing Interface (MPI), or any cluster with a shared filesystem and mutual SSH access. Alternatively, Rail-RNA can be run on a single multi-core computer, without the aid of a batch system or MapReduce implementation.

* **Inexpensive cloud implementation**. An EMR run on more than about 100 samples costs about $1 per sample with spot instances.

* **Secure analysis of dbGaP-protected data on EMR**. See this guide for information on setup.
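
The reduced-redundancy feature listed above can be illustrated with a minimal sketch. This is only one simple way to avoid repeated alignment work and is not Rail-RNA's actual algorithm: the function `align_deduplicated` and the toy aligner are hypothetical names invented for this example. The idea shown is that a read sequence occurring in several samples is aligned once and the result is reused, so the number of alignment calls grows more slowly than the total number of reads.

```python
def align_deduplicated(samples, align):
    """Align each distinct read sequence once and share the result across samples.

    samples: dict mapping sample name -> list of read sequences (strings).
    align:   callable taking one read sequence and returning an alignment result.
    Returns (results_by_sample, number_of_align_calls).
    """
    # Collect the distinct read sequences across all samples.
    unique_reads = {read for reads in samples.values() for read in reads}
    # Do the expensive work exactly once per distinct sequence.
    cache = {read: align(read) for read in unique_reads}
    # Map the cached results back onto each sample's original read order.
    results = {name: [cache[read] for read in reads]
               for name, reads in samples.items()}
    return results, len(cache)


# Toy stand-in for a real aligner: exact substring search in a short reference.
REFERENCE = "TTACGTGGCC"


def toy_align(read):
    return REFERENCE.find(read)   # offset of the first exact match, or -1 if none


EXAMPLE_SAMPLES = {
    "s1": ["ACGT", "GGCC", "ACGT"],
    "s2": ["ACGT", "TTAA"],
}
```

With `EXAMPLE_SAMPLES`, the five reads contain only three distinct sequences, so `align_deduplicated(EXAMPLE_SAMPLES, toy_align)` performs three alignment calls instead of five.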

Together with collaborators at Johns Hopkins University, we have used Rail-RNA to reanalyze over 70,000 human RNA-seq samples so far, including publicly available samples on the Sequence Read Archive (SRA) as well as controlled-access samples from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Expression information across these samples at the gene, exon, and exon-exon junction levels is collected into the resource recount2, which has an accompanying R/Bioconductor package.


__________________________________________________________________________________________________

PRECEPTS

Predictors of Cellular Phenotypes to guide Therapeutic Strategies (PRECEPTS) is a set of related analytical packages that can be used in tandem to identify transcriptional programs downstream of cancer-driving events from coordinated genomic and expression profiles. The six driving events and types we seek to identify and predict are:

* **Recurrent mutations in cancer-driving genes**

* **Mutually exclusive modules of cancer-driving genes**

* **Transcription factor activity**

* **Mutually inhibiting transcription factors**

* **Network enrichment and decoupling**

* **Drug sensitivity**

PRECEPTS is currently in early development.

___________________________________________________________________________________________________

Pathway Commons

Pathway Commons is a collection of publicly available pathway information from multiple organisms. It provides researchers with convenient access to a comprehensive collection of biological pathways from multiple sources represented in a common language for gene and metabolic pathway analysis. Access is via a web portal for query and download. Database providers can share their pathway data via a common repository and avoid duplication of effort and reduce software development costs. Bioinformatics software developers can increase efficiency by sharing pathway analysis software components. Pathways can include biochemical reactions, complex assembly, transport and catalysis events, physical interactions involving proteins, DNA, RNA, small molecules and complexes, gene regulation events and genetic interactions involving genes.

___________________________________________________________________________________________________

Quantitative Image Analysis for multiplex IHC (and cyclic IF)

We developed a multiplexed immunohistochemistry (IHC) technology that allows evaluation of multiple protein biomarkers in a single formalin-fixed, paraffin-embedded (FFPE) tissue section, and demonstrated that immune complexity stratifies response to vaccination therapy in pancreatic ductal adenocarcinoma (PDAC). However, interpreting the serial images output by the multiplex IHC method entails several challenges. This project aims to refine and rigorously validate our technologies and to develop enhanced analytical capabilities that address current limitations.

___________________________________________________________________________________________________

Precision Cancer Medicine Informatics

We are developing data analysis methods and data management software to store, analyze, and integrate clinical, imaging, and molecular data for (1) treating cancer using precision therapies adapted over time; and (2) discovering and understanding mechanisms of resistance in cancer. This initiative brings together and advances many areas, including (a) development of computational analysis workflows to identify key biomarkers such as somatic mutations, gene expression, pathway activity, and tumor composition; (b) using public datasets in genomics, transcriptomics, and biological pathways together with patient data to correlate biomarkers with prognosis and predict therapeutic response; and (c) producing patient reports and interactive visualizations that provide precision therapy recommendations based on consensus amongst methods and enable differential analysis across timepoints. Key software used in this work includes LabKey for data management and visualization, G2P for finding key biological and clinically actionable biomarkers, and Galaxy for analysis workflow creation and execution.

___________________________________________________________________________________________________

Genotype-to-Phenotype Database (G2P)

G2P is an aggregate public clinical cancer knowledge base for storing and searching connections between genomic biomarkers (“genotypes”) and patient diagnosis, prognosis, and response to treatment (“phenotypes”). Key uses of G2P include (a) searching by somatic variant to find drugs known to lead to response or resistance in tumors with the variant; (b) searching by drug to identify the mutations in which it can lead to response; and (c) searching clinical trials to find those associated with particular biomarkers or drugs. G2P combines biomarker-phenotype associations from 9 trusted and curated knowledge bases, including CIViC, OncoKB, PMKB, JAX CKB, and the Cancer Genome Interpreter. Clinical trials data from several sources is also included. Users can perform full-text search on G2P and filter results using a web portal with intuitive visualizations.
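
The first two search patterns, by variant and by drug, can be sketched with a small in-memory index. This is an illustrative assumption, not G2P's schema or API: the `Association` record, the `AssociationIndex` class, and every variant, drug, and source name below are placeholders invented for the example and carry no clinical meaning.

```python
from collections import defaultdict
from typing import NamedTuple


class Association(NamedTuple):
    variant: str   # genomic biomarker ("genotype")
    drug: str
    effect: str    # "response" or "resistance" ("phenotype")
    source: str    # knowledge base the record came from


class AssociationIndex:
    """Index associations twice so both search directions are simple lookups."""

    def __init__(self, associations):
        self.by_variant = defaultdict(list)
        self.by_drug = defaultdict(list)
        for a in associations:
            self.by_variant[a.variant].append(a)
            self.by_drug[a.drug].append(a)

    def drugs_for_variant(self, variant):
        """Return {effect: sorted drug names} for tumors carrying `variant`."""
        grouped = defaultdict(set)
        for a in self.by_variant.get(variant, []):
            grouped[a.effect].add(a.drug)
        return {effect: sorted(drugs) for effect, drugs in grouped.items()}

    def variants_responding_to(self, drug):
        """Return the sorted variants for which `drug` is reported to give response."""
        return sorted({a.variant
                       for a in self.by_drug.get(drug, [])
                       if a.effect == "response"})


# Placeholder records only; the fourth repeats the first from a different source,
# as can happen when several knowledge bases are merged.
EXAMPLE_ASSOCIATIONS = [
    Association("GENE_A p.V1X", "drug_a", "response", "kb_1"),
    Association("GENE_A p.V1X", "drug_b", "resistance", "kb_2"),
    Association("GENE_B p.K2Y", "drug_a", "response", "kb_1"),
    Association("GENE_A p.V1X", "drug_a", "response", "kb_3"),
]
```

Using sets inside the lookups collapses records that several sources report in common, which is one simple way to present merged results without duplicates.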