Posts by Collection

portfolio

publications

talks

Statistical and computational methods for the meta-analysis and resemblance analysis of transcriptomic studies

Advances in high-throughput technologies have generated large amounts of “-omics” data, which have become an integral component of modern biomedical and public health research. Practical statistical and computational methods are needed to meta-analyze and compare “-omics” data from different studies or experiments. In this talk, I will introduce two problem-driven methods and one software suite for the meta-analysis and resemblance analysis of multiple transcriptomic studies. In the first part, we propose a Bayesian hierarchical model for RNA-seq meta-analysis that models the count data directly, integrates information across genes and across studies, and captures differential signals across studies via latent variables. In the second part, motivated by two PNAS papers that reached contradictory conclusions about how well mouse models resemble human studies, we propose a novel method to quantify resemblance across model organisms on a continuous scale and to characterize the pathways in which they most agree or disagree. Finally, I will briefly introduce “MetaOmics”, an R Shiny-based modularized software suite for meta-analyzing multiple transcriptomic studies for seven biological purposes.
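The Bayesian hierarchical model itself is beyond a short sketch, but the core idea of pooling differential-expression evidence for one gene across studies can be illustrated with Fisher's method, a classical p-value combination technique (this is a generic illustration, not the model described in the talk; the p-values below are made up):

```python
import math

def fisher_combined_p(pvals):
    """Combine independent per-study p-values with Fisher's method.

    Under the null, -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom (k = number of studies). For even
    degrees of freedom the survival function has a closed form:
    P(chi2_{2k} > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(pvals)
    stat = -2.0 * sum(math.log(p) for p in pvals)
    half = stat / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Hypothetical per-gene p-values from three studies (illustrative only).
print(fisher_combined_p([0.01, 0.04, 0.03]))  # consistently small -> strong combined evidence
print(fisher_combined_p([0.50, 0.70, 0.20]))  # no consistent signal -> large combined p
```

A gene that is weakly but consistently significant in every study can reach a much smaller combined p-value than any single study provides, which is one motivation for meta-analyzing transcriptomic studies rather than analyzing them separately.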

High-dimensional variable screening: from single study to multiple studies

Advances in technology have generated abundant high-dimensional data from many studies. Owing to their large computational advantage, variable screening methods based on marginal association have become promising alternatives to popular regularization methods. However, existing screening methods are limited to a single study. We consider a general framework for variable screening with multiple related studies and propose a novel two-step screening procedure for high-dimensional regression analysis under this framework. Compared to a one-step procedure, our procedure greatly reduces false negative errors while keeping the false positive rate low. Theoretically, we show that the procedure possesses the sure screening property under weaker assumptions on signal strengths and allows the number of features to grow exponentially with the sample size. After screening, the dimension is greatly reduced, so common regularization methods such as the group lasso can be applied to identify the final set of variables. Under the same framework, we also extend the screening procedure to the Cox proportional hazards model to detect survival-associated biomarkers from multiple studies, while allowing censoring proportions and baseline hazard rates to vary across studies. Simulations and applications to cancer transcriptomic data illustrate the advantages of the proposed methods.
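The flavor of a two-step marginal screening procedure can be sketched as follows. This is a hypothetical simplification, not the exact procedure from the talk: step 1 screens leniently within each study and takes the union (guarding against false negatives), and step 2 ranks the union by an aggregated statistic across studies and keeps a small final set (controlling false positives). All function names and thresholds here are illustrative:

```python
import math
import random

def marginal_corr(x_col, y):
    """Pearson correlation between one feature column and the response."""
    n = len(y)
    mx, my = sum(x_col) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_col, y))
    sxx = math.sqrt(sum((a - mx) ** 2 for a in x_col))
    syy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sxx * syy)

def two_step_screen(studies, d1, d2):
    """Illustrative two-step marginal screening over multiple studies.

    Step 1 (lenient, per study): keep the d1 features with the largest
    |marginal correlation| in each study, then take the union, so that
    weak-but-real signals are unlikely to be lost (fewer false negatives).
    Step 2 (strict, aggregated): over the union, rank features by the
    summed squared correlation across studies and keep only the top d2,
    which keeps the false positive rate low.
    """
    corrs = []  # one list of per-feature correlations per study
    for X, y in studies:
        p = len(X[0])
        corrs.append([marginal_corr([row[j] for row in X], y) for j in range(p)])
    union = set()
    for c in corrs:
        union.update(sorted(range(len(c)), key=lambda j: -abs(c[j]))[:d1])
    agg = {j: sum(c[j] ** 2 for c in corrs) for j in union}
    return sorted(sorted(agg, key=lambda j: -agg[j])[:d2])

# Simulate two related studies where only features 0 and 1 drive the response.
random.seed(0)
def make_study(n, p, signal):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[j] for j in signal) + random.gauss(0, 0.5) for row in X]
    return X, y

studies = [make_study(60, 20, [0, 1]), make_study(60, 20, [0, 1])]
print(two_step_screen(studies, d1=5, d2=2))  # expected to recover the true signals [0, 1]
```

After such a screening step, the surviving handful of features can be passed to a regularized model (e.g. group lasso) for final variable selection.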

Congruence evaluation for model organisms in transcriptomic response

Model organisms are instrumental substitutes for human studies that expedite basic and clinical research. Despite their indispensable role in mechanistic investigation and drug development, how well animal models resemble humans has long been questioned and debated, yet little effort has been made to build an objective, quantitative congruence evaluation system for model organisms. We propose a framework, Congruence Analysis for Model Organisms (CAMO), for transcriptomic response analysis that combines threshold-free differential expression analysis, a quantitative resemblance score that controls for data variability, pathway-centric downstream investigation, and knowledge retrieval by text mining. Instead of a dichotomous genome-wide verdict of “poor/great” mimicry, CAMO helps researchers quantify and visually identify the biological functions that are best or least mimicked by model organisms, providing a foundation for hypothesis generation and subsequent translational decisions.
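A toy version of a cross-organism resemblance score conveys the pathway-centric idea: correlate per-gene effect sizes (e.g. log fold-changes) between the two organisms within one pathway and map the correlation to [0, 1]. This is purely illustrative; CAMO's actual score is threshold-free and explicitly controls for data variability, and all numbers below are made up:

```python
import math

def resemblance_score(human_effects, model_effects):
    """Toy resemblance score for one pathway: Pearson correlation of
    per-gene effect sizes between two organisms, rescaled to [0, 1]
    (1 = perfectly concordant, 0 = perfectly discordant).
    """
    n = len(human_effects)
    mh = sum(human_effects) / n
    mm = sum(model_effects) / n
    cov = sum((h - mh) * (m - mm) for h, m in zip(human_effects, model_effects))
    sh = math.sqrt(sum((h - mh) ** 2 for h in human_effects))
    sm = math.sqrt(sum((m - mm) ** 2 for m in model_effects))
    return (cov / (sh * sm) + 1) / 2

# Hypothetical log fold-changes for five genes in one pathway.
human = [1.2, -0.8, 0.5, 2.0, -1.1]
concordant_mouse = [1.0, -0.6, 0.7, 1.8, -0.9]   # same directions, similar sizes
discordant_mouse = [-1.0, 0.9, -0.4, -1.7, 1.2]  # responses mostly reversed

print(resemblance_score(human, concordant_mouse))   # near 1
print(resemblance_score(human, discordant_mouse))   # near 0
```

Computing such a score pathway by pathway, rather than genome-wide, is what lets one ask *where* a model organism mimics human biology well or poorly instead of settling for a single overall verdict.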

teaching