Full Title: Construction of Ensembles by Exploiting the Richness of Feature Variables in High-Dimensional Data with Application in Protein Homology
Speaker: Dr. Jabed Tomal, Assistant Professor, Department of Computer and Mathematical Sciences, The University of Toronto Scarborough, Canada
Date/Time: Monday, May 29, 2017, 10 a.m.
Venue: ISRT seminar room
High-dimensional data may contain complementary subsets of useful feature variables that could be valuable in predicting a response. In this work, I have developed a prediction model that exploits the richness of information contained in these complementary subsets. The proposed model, an aggregated collection of logistic regression models (LRMs), is called an ensemble, where each constituent LRM is fitted to a subset of feature variables. An algorithm is developed to cluster the feature variables into subsets such that the variables within a subset work well together in one LRM, while variables in different subsets work better in separate LRMs. Each subset of variables is called a “phalanx”, and the resulting ensemble is called an “ensemble of phalanxes” (EPX). The strength of the ensemble depends on the algorithm’s ability to identify strong and diverse subsets of feature variables.
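
The structure of such an ensemble can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation: the phalanxes are taken as given (the clustering algorithm itself is not reproduced here), each logistic regression is fitted by plain gradient descent, the constituent models are aggregated by averaging their predicted probabilities, and the data are synthetic. The names `fit_logistic`, `fit_ensemble`, and `ensemble_proba` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=500, lr=0.1, l2=1e-3):
    """Fit one logistic regression by gradient descent; returns (weights, intercept)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y          # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ resid / n + l2 * w)    # small ridge penalty for stability
        b -= lr * resid.mean()
    return w, b

def fit_ensemble(X, y, phalanxes):
    """Fit one model per phalanx (a list of column indices). Phalanxes are assumed given."""
    return [(cols, fit_logistic(X[:, cols], y)) for cols in phalanxes]

def ensemble_proba(ensemble, X):
    """Aggregate by averaging predicted probabilities (an assumed aggregation rule)."""
    probs = [sigmoid(X[:, cols] @ w + b) for cols, (w, b) in ensemble]
    return np.mean(probs, axis=0)

# Synthetic data: 6 features, a rare positive class driven by features 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (rng.random(400) < sigmoid(1.5 * X[:, 0] - 1.0 * X[:, 3] - 2.0)).astype(float)

phalanxes = [[0, 1, 2], [3, 4, 5]]              # hypothetical grouping of the features
ensemble = fit_ensemble(X, y, phalanxes)
scores = ensemble_proba(ensemble, X)
```

Each phalanx sees only its own columns, so the constituent models can be fitted independently of one another.
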
Homologous proteins are considered to have a common evolutionary origin, i.e., the bearers of homologous proteins share a common ancestor. To develop an evolutionary sequence of proteins, a scientist needs to predict their biological homogeneity. The proposed ensemble is applied to the protein homology data, obtained from the 2004 KDD Cup competition, and used to predict the biological homogeneity of proteins. In this application, the feature variables are various scores representing structural similarity and amino acid sequence identity of proteins. The underlying assumption for model building is that structural similarity and amino acid sequence identity are predictive of proteins’ biological homogeneity. As homologous proteins are rare, the prediction performance of the ensemble is evaluated by its ability to rank the rare homologous proteins ahead of the non-homologous proteins. While the prediction performance of an EPX is competitive with contemporary state-of-the-art ensembles, a large improvement is achieved by aggregating two diverse EPXs obtained by optimizing two complementary evaluation metrics. Here, the algorithm and the complementary metrics guaranteed increased strength and diversity, respectively, among the ensembles of phalanxes to be aggregated. Importantly, the performance of the aggregate of the two EPXs is robust relative to that of either individual EPX when one EPX is good at detecting close homologs and the other is good at detecting distant homologs. Using parallel computing, the proposed ensemble is also shown to be computationally efficient.
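
The idea of judging a model by how well it ranks rare positives, and of aggregating two complementary ensembles, can be illustrated with a toy example. The abstract does not name the evaluation metrics, so average precision is used here purely as an assumed example of a ranking metric, and averaging the two score vectors is an assumed aggregation rule. The labels and scores are made up for illustration and are not results from the talk.

```python
import numpy as np

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (higher score = earlier rank)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")
    y_sorted = y_true[order]
    hits = np.cumsum(y_sorted)                  # positives found up to each rank
    ranks = np.arange(1, len(y_sorted) + 1)
    is_pos = y_sorted == 1
    return float((hits[is_pos] / ranks[is_pos]).mean())

# Toy data: two positives among six cases.
y = np.array([1, 1, 0, 0, 0, 0])
scores_a = np.array([0.9, 0.2, 0.5, 0.3, 0.1, 0.1])   # ranks the first positive well
scores_b = np.array([0.2, 0.9, 0.3, 0.5, 0.1, 0.1])   # ranks the second positive well
scores_avg = (scores_a + scores_b) / 2                # assumed aggregation: simple average

ap_a = average_precision(y, scores_a)
ap_b = average_precision(y, scores_b)
ap_avg = average_precision(y, scores_avg)
print(ap_a, ap_b, ap_avg)
```

In this contrived case each score vector places one positive first and the other fourth, while the averaged scores place both positives ahead of every negative.
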
Director,
Institute of Statistical Research
and Training (ISRT)
University of Dhaka
Dhaka 1000, Bangladesh