Novel variable screening methods for omics data integration


Sure screening methods are a family of simple and effective dimension reduction techniques that reduce noise accumulation in variable selection for high-dimensional regression and classification problems. Since the first method was proposed by Fan and Lv (2008), numerous sure screening methods have been developed for various model settings and have shown their advantages in big data analysis, offering scalability and theoretical guarantees. However, none of these methods is directly applicable to dimension reduction and variable selection in omics data integration problems. In this talk, I will introduce two novel variable screening methods recently developed in our group for horizontal and vertical omics data integration. In the first project, we proposed a general framework and a two-step procedure for variable screening when combining the same type of omics data from multiple related studies, and showed that including multiple studies provides more evidence for reducing dimension. In the second project, we developed a fast and robust variable screening method to detect epigenetic regulators of gene expression across the whole genome by combining epigenomic and transcriptomic data, where both the predictor and response spaces are high-dimensional. We used extensive simulations and real data to demonstrate the strengths of our methods compared with existing screening methods.
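
For readers unfamiliar with sure screening, the basic idea behind the original method of Fan and Lv (2008) can be sketched as ranking predictors by their absolute marginal correlation with the response and keeping the top d. The sketch below is only an illustration of that single-study baseline, not of the two methods introduced in this talk; the function name sis_screen, the simulated data, and the cutoff d = n / log(n) are assumptions chosen for the example.

```python
import numpy as np


def sis_screen(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation with y.

    Hypothetical helper for illustration; returns (sorted selected indices, all scores).
    """
    Xc = X - X.mean(axis=0)          # center each predictor column
    yc = y - y.mean()                # center the response
    x_norm = np.linalg.norm(Xc, axis=0)
    x_norm[x_norm == 0] = np.inf     # constant columns receive a score of zero
    y_norm = np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / (x_norm * y_norm)
    top = np.argsort(-scores)[:d]    # indices of the d largest scores
    return np.sort(top), scores


# Simulated example (assumed setup): n samples, p >> n predictors, 3 true signals.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -3.0, 3.0]
y = X @ beta + rng.standard_normal(n)

d = int(n / np.log(n))               # a commonly used cutoff; other choices are possible
selected, scores = sis_screen(X, y, d)
print("number kept:", len(selected))
print("true signals retained:", sorted(set(selected) & {0, 1, 2}))
```

Screening of this kind only reduces p to a manageable d; a second-stage method such as penalized regression would typically be applied to the retained variables.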