Exploring Machine Learning Techniques to Improve Peptide Identification


Wednesday, November 14, 2018 - 03:00 pm

Meeting room 2265, Innovation Center

THESIS DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author: Fawad Kirmani
Advisor: Dr. John Rose
Date: Nov 14th, 2018
Time: 3:00 pm
Place: Meeting room 2265, Innovation Center

Abstract

The goal of this work is to improve proteotypic peptide prediction with lower processing time and better efficiency. Proteotypic peptides are the peptides in a protein sequence that can be confidently observed by mass-spectrometry-based proteomics. One of the most widely used methods for identifying peptides is tandem mass spectrometry (MS/MS). The peptides to be identified are compared against an accurate mass and elution time (AMT) tag database, which reduces processing time and increases the accuracy of the identified peptides. In recent years, prediction of proteotypic peptides for AMT studies has improved rapidly through the use of amino acid properties such as charge, code, solubility, and hydropathy.

We describe an improved version of a support vector machine (SVM) classifier that achieves classification sensitivity, specificity, and AUC on the Yersinia pestis, Saccharomyces cerevisiae, and Bacillus subtilis str. 168 datasets similar to those reported by Webb-Robertson et al. [13] and Ahmed Alqurri [10]. The improved classifier uses a C++ SVM library instead of MATLAB's built-in library, and we describe how it reaches these similar results in much less processing time.

Furthermore, we tested four machine learning classifiers on the Yersinia pestis, Saccharomyces cerevisiae, and Bacillus subtilis str. 168 data. We performed feature selection from scratch, using four different algorithms to obtain better results from the different classifiers. Some of these classifiers gave similar or better results than the SVM classifiers while using fewer features. We describe the results of these four classifiers with different feature sets.
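The classifiers are compared by sensitivity, specificity, and AUC. As a rough illustration only, the Python sketch below shows one common way these three metrics are computed. It assumes binary labels (1 = proteotypic, 0 = not), real-valued scores such as SVM decision values where higher means "more likely proteotypic", and a hypothetical 0.5 threshold for hard predictions. The function names are illustrative and not taken from the thesis, and the toy numbers are not thesis results.

```python
"""Sketch: sensitivity, specificity, and ROC AUC for a binary classifier.

Assumptions (not from the thesis): labels are 1 (proteotypic) or 0 (not),
and higher scores mean "more likely proteotypic", as with SVM decision values.
"""
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)) from hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). This is O(P * N),
    which is fine for a sketch but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these are not results from the thesis.
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.4, 0.7, 0.3, 0.5, 0.1]
    yhat = [1 if v >= 0.5 else 0 for v in s]  # hypothetical 0.5 threshold
    sens, spec = sensitivity_specificity(y, yhat)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={roc_auc(y, s):.3f}")
    # prints: sensitivity=0.667 specificity=0.667 auc=0.889
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the three are usually reported together.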
