- This event has passed.
Seminar on Classification and Clustering for RNA-seq data with variable selection
June 12, 2023 @ 12:00 pm - 1:00 pm
In the first part of the presentation, we develop a negative binomial mixture model with lasso or fused lasso gene regularization to cluster samples (small n) with high-dimensional gene features (large p). A modified EM algorithm and Bayesian information criterion are used for inference and determining tuning parameters. The method is compared with existing methods using extensive simulations and two real transcriptomic applications in rat brain and breast cancer studies. The result shows the superior performance of the proposed count data model in clustering accuracy, feature selection, and biological interpretation in pathways.
In the second part of this presentation, we will discuss a classification model based on negative binomial distribution via generalized linear model framework with double regularization for gene and covariate sparsity to accommodate three key elements: adequate modeling of count data with overdispersion, gene selection and adjustment for covariate effect. The proposed method is evaluated in simulations and two real applications using cervical tumor miRNA-seq data and schizophrenia post-mortem brain tissue RNA-seq data to demonstrate its superior performance in prediction accuracy and feature selection.