
Full Title: | Construction of Ensembles by Exploiting the Richness of Feature Variables in High-Dimensional Data with Application in Protein Homology |
Speaker: | Dr. Jabed Tomal, Assistant Professor, Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Canada |
Date/Time: | Monday, May 29, 2017, 10 a.m. |
Venue: | ISRT seminar room |
High-dimensional data may contain complementary subsets of useful feature variables that could be valuable in predicting a response. In this work, I have developed a prediction model that exploits the richness of information contained in these complementary subsets of useful feature variables. The proposed model, an aggregated collection of logistic regression models (LRMs), is called an ensemble, where each constituent LRM is fitted to a subset of feature variables. An algorithm is developed to cluster the feature variables into subsets such that variables in the same subset work well together in one LRM, while variables in different subsets work better in separate LRMs. Each subset of variables is called a “phalanx”, and the resulting ensemble is called an “ensemble of phalanxes” (EPX). The strength of the ensemble depends on the algorithm’s ability to identify strong and diverse subsets of feature variables.
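To make the ensemble structure concrete, here is a minimal sketch that assumes the phalanxes are already known. It does not reproduce the speaker's phalanx-forming (clustering) algorithm. The hand-picked phalanxes, the plain gradient-descent logistic fit, the averaging of probabilities across LRMs, the synthetic data, and the helper names `fit_logistic`, `fit_epx` and `predict_epx` are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, cols, lr=0.1, epochs=500):
    """Fit one LRM on the feature columns `cols` by batch gradient descent
    on the mean log-loss. Returns (intercept, weights)."""
    n = len(rows)
    b, w = 0.0, [0.0] * len(cols)
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * len(cols)
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * x[c] for wj, c in zip(w, cols))) - y
            grad_b += err
            for j, c in enumerate(cols):
                grad_w[j] += err * x[c]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def fit_epx(rows, labels, phalanxes):
    """Hypothetical helper: fit one LRM per phalanx (a list of column indices)."""
    return [(cols, fit_logistic(rows, labels, cols)) for cols in phalanxes]


def predict_epx(ensemble, x):
    """Hypothetical helper: average the constituent LRMs' probabilities for row x."""
    probs = []
    for cols, (b, w) in ensemble:
        probs.append(sigmoid(b + sum(wj * x[c] for wj, c in zip(w, cols))))
    return sum(probs) / len(probs)


# Synthetic data (made up): 4 features, response driven by all of them.
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
labels = [1 if sum(x) > 0 else 0 for x in rows]

# Hand-picked phalanxes (hypothetical); a clustering algorithm would choose these.
ensemble = fit_epx(rows, labels, phalanxes=[[0, 1], [2, 3]])
print(predict_epx(ensemble, [2.0, 2.0, 2.0, 2.0]))      # expected above 0.5
print(predict_epx(ensemble, [-2.0, -2.0, -2.0, -2.0]))  # expected below 0.5
```

Each LRM only ever sees the columns of its own phalanx, which is what lets the constituent models differ from one another; averaging probabilities is just one simple way to combine them.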
Homologous proteins are considered to have a common evolutionary origin, i.e., the bearers of homologous proteins share a common ancestor. To develop an evolutionary sequence of proteins, a scientist needs to predict their biological homogeneity. The proposed ensemble is applied to the protein homology data from the 2004 KDD Cup competition and is used to predict the biological homogeneity of proteins. In this application, the feature variables are various scores representing the structural similarity and amino acid sequence identity of proteins. The underlying assumption for model building is that structural similarity and amino acid sequence identity are predictive of proteins’ biological homogeneity. Because homologous proteins are rare, the prediction performance of the ensemble is evaluated by its ability to rank the rare homologous proteins ahead of the non-homologous proteins. While the prediction performance of a single EPX is competitive with contemporary state-of-the-art ensembles, a large improvement is achieved by aggregating two diverse EPXs obtained by optimizing two complementary evaluation metrics. Here, the algorithm ensures increased strength, and the complementary metrics ensure increased diversity, among the ensembles of phalanxes being aggregated. Importantly, the aggregate of the two EPXs is more robust than either individual EPX when one EPX is good at detecting close homologs and the other is good at detecting distant homologs. Using parallel computing, the proposed ensemble is also shown to be computationally efficient.
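Because homologs are rare, a rank-based metric is a natural way to score how well an ensemble places them ahead of non-homologs. The sketch below uses average precision as one such metric and aggregates two ensembles by averaging their scores. Both choices, the helper names `average_precision` and `aggregate`, and the toy scores are assumptions for illustration; the scores are made up to mimic one EPX that ranks one homolog well and another EPX that ranks the other homolog well, and are not results from the talk.

```python
def average_precision(scores, labels):
    """Average precision of the ranking induced by `scores`.

    Higher scores are ranked first; labels are 1 (homologous) or 0.
    Ties keep input order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this homolog's rank
    return total / hits if hits else 0.0


def aggregate(scores_a, scores_b):
    """Hypothetical aggregation: element-wise mean of two ensembles' scores."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Toy, made-up data: items 0 and 1 are homologs.
labels = [1, 1, 0, 0]
epx_a = [0.9, 0.2, 0.5, 0.4]  # ranks item 0 first but item 1 last
epx_b = [0.2, 0.9, 0.5, 0.4]  # ranks item 1 first but item 0 last

print(average_precision(epx_a, labels))                    # 0.75
print(average_precision(epx_b, labels))                    # 0.75
print(average_precision(aggregate(epx_a, epx_b), labels))  # 1.0
```

In this toy case each ensemble alone buries one homolog at the bottom, while the averaged scores place both homologs ahead of the non-homologs, which mirrors the intuition behind aggregating two complementary EPXs.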
Director,
Institute of Statistical Research
and Training (ISRT)
University of Dhaka
Dhaka 1000, Bangladesh