How to Break an API: How Community Values Influence Practices

Friday, January 25, 2019 - 11:00 am
Innovation Center, Room 2277
Speaker: Christian Kästner
Affiliation: Carnegie Mellon University

Abstract
Breaking the API of a package can create severe disruptions downstream, but package maintainers have flexibility in whether and how to perform a change. Through interviews and a survey, we found that developers within a community or platform often share cohesive practices (e.g., semver, backporting, synchronized releases), but that those practices differ from community to community, and that most developers are not aware of alternative strategies and practices, their tradeoffs, or why other communities adopt them. Most interestingly, practices and community consensus often seem to be driven by implicit values in each community, such as stability, rapid access, or ease of contribution. Understanding and discussing these values openly can help surface and resolve conflicts, such as debates between demands for more stability and the pursuit of frequent, disruptive innovation.

Bio
Christian Kästner is an associate professor in the School of Computer Science at Carnegie Mellon University. He received his PhD in 2010 from the University of Magdeburg, Germany, for his work on virtual separation of concerns, for which he received the prestigious GI Dissertation Award. His research interests include understanding collaboration in open source and the correctness and understanding of systems with variability, including work on implementation mechanisms, tools, variability-aware analysis, type systems, feature interactions, empirical evaluations, and refactoring.
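As a concrete illustration of the semver practice mentioned in the abstract, here is a minimal sketch (a hypothetical helper, not from the talk) of how the kind of change maps to a version bump:

```python
def next_version(version, *, breaking=False, feature=False):
    """Return the next semver string given the kind of change.

    Under semver (MAJOR.MINOR.PATCH), a breaking API change bumps
    MAJOR, a backward-compatible feature bumps MINOR, and anything
    else (e.g. a bug fix) bumps PATCH.
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if breaking:
        return f"{major + 1}.0.0"
    if feature:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(next_version("2.4.1", breaking=True))   # → 3.0.0
print(next_version("2.4.1", feature=True))    # → 2.5.0
print(next_version("2.4.1"))                  # → 2.4.2
```

Part of the talk's point is precisely that not every community follows this mapping: some never bump MAJOR, others release breaking changes on a fixed cadence instead.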

A Partially Automated Process for the Generation of Believable Human Behaviors

Friday, December 14, 2018 - 01:30 pm
Meeting room 2267, Innovation Center
DISSERTATION DEFENSE
Department of Computer Science and Engineering, University of South Carolina
Author: Bridgette Parsons
Advisor: Dr. Jose Vidal
Date: Dec 14th, 2018
Time: 1:30 pm
Place: Meeting room 2267, Innovation Center

Abstract
Modeling believable human behavior for use in simulations is a difficult task. It requires a great deal of time and frequently requires coordination between members of different disciplines. In our research, we propose a method of partially automating the process, reducing the time it takes to create a model and allowing domain experts who are not programmers to adjust the models as necessary. Using agent-based modeling, we present MAGIC (Models Automatically Generated from Information Collected), an algorithm designed to automatically find points in the model's decision process that require interaction with other agents or with the simulation environment, and to create a decision graph capturing the agent's behavior pattern from raw data composed of time-sequential observations. We also present an alternative to the traditional Markov decision process that allows actions to continue until a set condition is met, and a tool that lets domain experts easily adjust the resulting models as needed. After testing the accuracy of our algorithm on synthetic data, we show the results of applying the process in a real-world simulation based on a study of the medical administration process in hospitals conducted by the University of South Carolina's Healthcare Process Redesign Center. In the healthcare study, the nurses needed to follow a very consistent process. To show that our algorithm applies in a variety of situations, we also created a video game and recorded players' movements. Unlike the nursing simulation, however, the game environment is more prone to changes that limit the appropriate set of actions available to the humans being modeled.

To account for these changes, we present a simple method that adds a hierarchy of rules to our previous algorithm, restricting the agent's actions to those appropriate for the current situation. In both the healthcare study and the video game, we find multiple distinct patterns of behavior. Since a single model would not accurately represent the behavior of all of the humans in the studies, we present a simple method of classifying individuals' behavior using the decision graphs created by our algorithm. We then use our algorithm to create a model for each cluster of behaviors, producing multiple models from one set of observational data. Believability is highly subjective. In our research, we present methods to partially automate the production of believable human agents and test our results on real-world data using focus groups and a pseudo-Turing test. Our findings show that, under the right conditions, it is possible to partially automate the modeling of human decision processes, but ultimately believability depends greatly on the similarity between the viewer and the humans being modeled.
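To give a flavor of the decision graphs the abstract describes, here is a minimal sketch (illustrative only, not the MAGIC algorithm itself; the action names are invented) that builds empirical transition probabilities from time-sequential observations:

```python
from collections import defaultdict

def build_decision_graph(traces):
    """Build a simple decision graph from time-sequential observations.

    Each trace is a list of observed actions. The graph maps each
    action to the empirical probability of each successor action --
    a simplified stand-in for the richer graphs MAGIC derives.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    graph = {}
    for action, successors in counts.items():
        total = sum(successors.values())
        graph[action] = {b: n / total for b, n in successors.items()}
    return graph

# Hypothetical observation traces from a nursing-style process
traces = [
    ["scan_badge", "check_chart", "administer_med"],
    ["scan_badge", "check_chart", "call_doctor"],
]
graph = build_decision_graph(traces)
print(graph["check_chart"])  # two observed successors, 50/50
```

Branch points like `check_chart` above, where multiple successors occur, are the kind of decision points that would require interaction with other agents or the environment in the full model.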

A Reduced Description of Transient Stochastic Thermo-Fluid Systems

Monday, December 10, 2018 - 01:00 pm
Speaker: Hessam Babaee, Ph.D.
Location: Innovation Center, Room 2277
Time: Dec. 10, 13:00-14:00

Abstract
Highly convective thermo-fluid systems exhibit a phenomenon that is difficult to predict: transient instabilities. While these instabilities have finite lifetimes, they can play a crucial role either by altering the system dynamics through the activation of other instabilities or by creating sudden nonlinear energy transfers that lead to extreme responses. Their essentially transient character, however, makes their description a particularly challenging task. We develop a minimization framework that focuses on the optimal approximation of the system dynamics in the neighborhood of the system state. This minimization formulation results in differential equations that evolve a time-dependent basis so that it optimally approximates the most unstable directions. Several thermo-fluid demonstration cases will be presented that show the performance of the method.

Bio
Dr. Hessam Babaee is an expert in hydrodynamic instability, uncertainty quantification, reduced-order modeling, and high-performance computing. He is currently a tenure-stream Assistant Professor in the Swanson School of Engineering at the University of Pittsburgh and a Research Scientist in the Mechanical Engineering Department at the Massachusetts Institute of Technology (MIT). Prior to joining the University of Pittsburgh, he was a Postdoctoral Associate at MIT. He received his PhD in Mechanical Engineering and a Master's degree in Applied Mathematics from Louisiana State University, both awarded in 2013.
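For readers curious what such evolution equations for a time-dependent basis look like, the optimally time-dependent (OTD) formulation is one common instance; the notation below is generic and assumed, not taken from the talk:

```latex
% Evolution of a time-dependent orthonormal basis U(t) that tracks
% the most unstable directions of the linearized operator L(t):
\frac{\mathrm{d}U}{\mathrm{d}t} = L(t)\,U - U\bigl(U^{\top} L(t)\,U\bigr),
\qquad U^{\top} U = I .
```

The second term projects out the components of $L U$ that lie inside the current subspace, which is what keeps the basis orthonormal as it rotates toward the most unstable directions.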

A Flow Feature Detection Framework for Massive Computational Data Analytics

Friday, December 7, 2018 - 01:00 pm
Storey Innovation Center (Room 2277)
Dr. Yi Wang from the Department of Mechanical Engineering, University of South Carolina, will give a talk on Friday, Dec. 7th in the Storey Innovation Center (Room 2277) from 13:00 to 14:00.

Abstract
In this seminar, a framework based on incremental proper orthogonal decomposition (iPOD) and data mining for integrated analysis of large-scale computational data will be presented, targeting data visualization, discovery, and learning. Four key components will be introduced: (1) iPOD, based on mean-value and subspace updating, to incrementally reduce data dimensions, decouple the time-averaged and time-varying flow structures, and extract coherent structures and modes from massive computational fluid dynamics (CFD) data; (2) data mining to classify flow regions with similar dynamic characteristics and identify candidate and global regions of interest (GROIs) for focused analysis; (3) feature detection to capture flow features of interest and determine ultimate regions of interest (UROIs); and (4) selective storage and targeted visualization of data in the UROIs. Case studies on vortex and shock wave detection, of significant interest to aerospace and defense applications, will be presented to demonstrate the framework. Computational performance of the framework in terms of data volume, reduction ratio, resource usage, and storage requirements will also be discussed. Our quantitative results clearly show that iPOD can process large datasets that overwhelm traditional batch POD, leading to 4-16X data reduction in the temporal domain through spectral projection. Through data mining, 50% to 70% of the spatial domain with a high probability of flow-feature occurrence is identified as candidate GROIs for efficient, confined feature detection. Key features in the UROIs, comprising only 2% to 30% of the original data, are successfully captured by our feature detection algorithms and can be selectively stored and visualized for targeted discovery and learning. In contrast to batch POD, iPOD reduces physical memory usage by more than 10X and processing time by up to 75%, and is far more appropriate for large-scale data analytics.

Biography
Yi Wang obtained his B.S. and M.S. in Machinery and Energy Engineering from Shanghai Jiao Tong University, P.R. China, in 1998 and 2000, respectively, and his Ph.D. in Mechanical Engineering from Carnegie Mellon University in 2005. He is currently an associate professor of mechanical engineering and the principal investigator (PI) of the Integrated Multiphysics & Systems Engineering Laboratory (iMSEL) at the University of South Carolina. He has served as a PI or Co-PI on multiple DoD-, MDA-, NASA-, and NIH-funded projects to develop advanced methodologies and techniques in computational and data-enabled science and engineering (CDS&E), including reduced-order modeling, data reduction, large-scale and/or real-time data analytics, hierarchical system-level simulation, and systems engineering. Applications of these technologies span spacecraft and missile thermal analysis, aeroservoelasticity and aerothermoservoelasticity, massive computational data management, real-time flight load data processing, and integrated multi-scale fluidic systems (design, fabrication, and experimentation) for environmental monitoring, biodefense, and regenerative medicine. He has coauthored 4 book chapters and 80 journal and conference publications, and is the co-inventor of 5 patents.
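The incremental flavor of iPOD can be conveyed with a small sketch: a generic rank-one incremental SVD update that folds one new snapshot into an existing POD basis. This is a textbook-style sketch under assumed interfaces, not Dr. Wang's implementation:

```python
import numpy as np

def ipod_update(U, S, snapshot, r):
    """Fold one new snapshot column into a truncated POD basis.

    U: current POD modes (n x k), S: singular values (k,),
    snapshot: new data column (n,), r: max number of modes to keep.
    Projects the snapshot onto the basis, appends the orthogonal
    residual as a candidate mode, and re-truncates via a small SVD.
    """
    coeffs = U.T @ snapshot
    residual = snapshot - U @ coeffs
    rnorm = np.linalg.norm(residual)
    k = len(S)
    # Small (k+1)x(k+1) core combining old singular values and new column
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(S)
    K[:k, k] = coeffs
    K[k, k] = rnorm
    q = residual / rnorm if rnorm > 1e-12 else np.zeros_like(residual)
    Uaug = np.column_stack([U, q])
    Uc, Sc, _ = np.linalg.svd(K)
    Unew = Uaug @ Uc
    return Unew[:, :r], Sc[:r]

# Demo: stream a second, orthogonal snapshot into a one-mode basis
U = np.array([[1.0], [0.0], [0.0]])
S = np.array([1.0])
U, S = ipod_update(U, S, np.array([0.0, 1.0, 0.0]), r=2)
print(S)  # both snapshots carry unit energy
```

Because each update only touches a (k+1)-sized core matrix rather than the full snapshot history, memory stays bounded by the basis size, which is the property that lets iPOD handle datasets that overwhelm batch POD.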

Guest Speakers: Scott McNealy and Bob Cooper

Monday, November 26, 2018 - 06:00 pm
Storey Innovation Center (Room 1400)
Nov 26th, 6pm-9pm EST
M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC 29201 (Room 1400)

6:30pm: Scott McNealy, co-founder of Sun Microsystems and its CEO and Chairman of the Board for 22 years (full bio below).
7:30pm: Bob Cooper, CEO of local company Swampfox and formerly CTO of Conita (full bio below).

Scott McNealy (@ScottMcNealy)

Co-Founder, Former Chairman of the Board, and CEO, Sun Microsystems, Inc.
Co-Founder and Board Member, Curriki
Co-Founder and Executive Chairman of the Board, Wayin
Board Member, San Jose Sharks Sports and Entertainment

Scott McNealy is an outspoken advocate for personal liberty, small government, and free-market competition. In 1982, he co-founded Sun Microsystems and served as CEO and Chairman of the Board for 22 years. He piloted the company from startup to legendary Silicon Valley giant in computing infrastructure, network computing, and open source software. Today, McNealy is heavily involved in advisory roles for companies ranging from startups to large corporations, including Curriki and Wayin. Curriki (curriculum + wiki) is an independent 501(c)(3) organization working to eliminate the education divide by providing free K-12 curricula and collaboration tools through an open-source platform. Wayin, a digital campaign CMS platform, enables marketers and agencies to deliver authentic interactive campaign experiences across all digital properties, including web, social, mobile, and partner channels. Wayin serves more than 300 brands across 80 countries and 10 industries. Scott McNealy is an enthusiastic ice hockey fan and an avid golfer with a single-digit handicap. He resides in California with his wife, Susan, and they have four sons.

BA, Harvard, 1976; MBA, Stanford, 1980

Bob Cooper, CEO

Bob Cooper is the CEO of Swampfox Technology. Swampfox specializes in call center automation, and its software automates many of the transactions of the largest cable, energy, and health care companies in the US. Prior to starting Swampfox, Bob was Chief Architect at Avaya, in charge of their call center self-service platform offerings, including Voice/Experience Portal, ICR, and OneX Speech. During his time at Avaya, this self-service platform grew to become the #1 selling IVR platform in the US. Before that, Bob was the CTO of Conita, a company focused on creating software-based "Personal Virtual Assistants" (think Apple Siri, but 15 years ago). Bob holds many patents in computer architecture and voice user interface design. He received his undergraduate and graduate engineering degrees from the University of Florida and teaches electrical engineering courses as needed at the University of South Carolina. He is married and has four children.

Algorithms for Robot Coverage Under Movement and Sensing Constraints

Friday, November 16, 2018 - 03:00 pm
Meeting room 2265, Innovation Center
DISSERTATION DEFENSE
Author: Jeremy Lewis
Advisor: Dr. Jason O'Kane
Date: Nov 16th, 2018
Time: 3:00 pm
Place: Meeting room 2265, Innovation Center

Abstract
This thesis explores the problem of generating coverage paths, that is, paths that pass within some sensor footprint of every point in an environment, for mobile robots. It considers both models for which navigation is a solved problem but motions are constrained, and models in which navigation must be planned alongside coverage because the robot's sensing and movements are unreliable. The motion constraint we adopt for the former is a common one: that of a Dubins vehicle. We extend previous work that solves this coverage problem as a traveling salesman problem (TSP) by introducing a practical heuristic algorithm that reduces runtime while maintaining near-optimal path length. Furthermore, we show that generating an optimal coverage path is NP-hard by reducing from the Exact Cover problem, which justifies our algorithm's conversion of Dubins coverage instances to TSP instances. Extensive experiments demonstrate that the algorithm does indeed produce path lengths comparable to optimal in significantly less time. In the second model, we consider coverage planning for a particular type of very simple mobile robot: one that can only translate in a commanded direction (specified in a global reference frame), with bounded error on the motion direction, until reaching the environment boundary. The objective, for a given environment map, is to generate a sequence of motions guaranteed to cover as large a portion of that environment as possible, in spite of the severe limits on the robot's sensing and actuation abilities.

We show how to model the knowledge available to this kind of robot about its own position within the environment, how to compute the region whose coverage can be guaranteed for a given plan, and how to characterize regions whose coverage cannot be guaranteed by any plan. We also describe an algorithm that generates coverage plans for this robot based on a search across a specially constructed graph. Simulation results demonstrate the effectiveness of the approach.
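As a rough illustration of turning coverage into a tour problem, here is a minimal nearest-neighbor TSP heuristic over candidate waypoints. This is a generic sketch only; the thesis's heuristic for Dubins coverage is more sophisticated and must account for the vehicle's turning constraints:

```python
import math

def nearest_neighbor_tour(points):
    """Greedy nearest-neighbor tour over 2D waypoints.

    A classic TSP heuristic: start at point 0 and repeatedly visit
    the closest unvisited point. Fast but not optimal in general.
    """
    unvisited = list(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical waypoints whose sensor footprints would cover a region
waypoints = [(0, 0), (5, 0), (5, 5), (0, 5), (1, 1)]
print(nearest_neighbor_tour(waypoints))
```

For a Dubins vehicle, Euclidean distance would be replaced by the Dubins path length between configurations, which is asymmetric and depends on headings, part of why the reduction to TSP and the heuristics around it are nontrivial.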

Exploring Machine Learning Techniques to Improve Peptide Identification

Wednesday, November 14, 2018 - 03:00 pm
Meeting room 2265, Innovation Center
THESIS DEFENSE
Department of Computer Science and Engineering, University of South Carolina
Author: Fawad Kirmani
Advisor: Dr. John Rose
Date: Nov 14th, 2018
Time: 3:00 pm
Place: Meeting room 2265, Innovation Center

Abstract
The goal of this work is to improve proteotypic peptide prediction with lower processing time and better efficiency. Proteotypic peptides are the peptides in a protein sequence that can be confidently observed by mass-spectrometry-based proteomics. One of the most widely used methods for identifying peptides is tandem mass spectrometry (MS/MS). The peptides to be identified are compared against an accurate mass and elution time (AMT) tag database, which reduces processing time and increases the accuracy of the identified peptides. Prediction of proteotypic peptides for AMT studies has improved rapidly in recent years by using amino acid properties such as charge, code, solubility, and hydropathy. We describe an improved version of a support vector machine (SVM) classifier that achieves classification sensitivity, specificity, and AUC on Yersinia pestis, Saccharomyces cerevisiae, and Bacillus subtilis str. 168 datasets similar to those reported by Webb-Robertson et al. [13] and Ahmed Alqurri [10]. The improved classifier uses a C++ SVM library instead of the MATLAB built-in library, and we describe how we achieved these comparable results with much less processing time. Furthermore, we tested four machine learning classifiers on the Yersinia pestis, Saccharomyces cerevisiae, and Bacillus subtilis str. 168 data, performing feature selection from scratch with four different algorithms to obtain better results from the different machine learning algorithms. Some of these classifiers gave similar or better results than the SVM classifiers with fewer features. We describe the results of these four classifiers with the different feature sets.
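Amino acid properties like the charge and hydropathy mentioned above can be computed directly from a peptide sequence. Here is a minimal sketch of such feature extraction, using the standard Kyte-Doolittle hydropathy scale; the thesis's actual feature set and encoding may differ:

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
# Nominal side-chain charges at physiological pH (simplified)
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def peptide_features(seq):
    """Simple per-peptide features of the kind fed to an SVM classifier."""
    return {
        "length": len(seq),
        "mean_hydropathy": sum(KD[a] for a in seq) / len(seq),
        "net_charge": sum(CHARGE.get(a, 0) for a in seq),
    }

print(peptide_features("ILKEK"))  # a short illustrative peptide
```

Vectors of such features, one row per peptide, are exactly the kind of input an SVM (whether MATLAB's or a C++ library's) consumes for proteotypic/non-proteotypic classification.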

Phylogeny, Ancestral Genome, and Disease Diagnoses Models Constructions using Biological Data

Monday, November 12, 2018 - 12:00 pm
Meeting room 2267, Innovation Center
DISSERTATION DEFENSE
Department of Computer Science and Engineering, University of South Carolina
Author: Bing Feng
Advisor: Dr. Jijun Tang
Date: Nov 12th, 2018
Time: 12:00 pm
Place: Meeting room 2267, Innovation Center

Abstract
Bioinformatics develops methods and software tools to analyze biological data and provide insight into the mechanisms of biological processes. Machine learning techniques have been widely used by researchers for disease prediction, disease diagnosis, and biomarker identification. Using machine learning algorithms to diagnose diseases has several advantages: rather than relying solely on doctors' experience and stereotyped formulas, researchers can use learning algorithms to analyze sophisticated, high-dimensional, multimodal biomedical data and construct prediction/classification models that make decisions even when some information is incomplete, unknown, or contradictory. In this study, we first build an automated computational pipeline to reconstruct phylogenies and ancestral genomes for two high-resolution real yeast whole-genome datasets. Comparison with recent studies and publications shows that we reconstruct very accurate and robust phylogenies and ancestors. We also identify and analyze conserved syntenic blocks among the reconstructed ancestral genomes and present-day yeast species. Next, we analyze a metabolite-level dataset obtained from positive-mode mass spectrometry of human blood samples. We apply machine learning and feature selection algorithms to construct diagnosis models for chronic kidney disease (CKD). We also identify the most critical metabolite features and study the correlations between the metabolite features and the development of CKD stages. The selected metabolite features provide insight into early-stage CKD diagnosis, pathophysiological mechanisms, CKD treatment, and medicine development.

Finally, we use deep learning techniques to build accurate Down syndrome (DS) prediction/screening models based on analysis of a newly introduced Illumina human genome genotyping array. We propose a bi-stream convolutional neural network (CNN) architecture with nine layers and two merged CNN models, which takes two input chromosome SNP maps in combination. We evaluate and compare the performance of our CNN DS prediction models with conventional machine learning algorithms and single-stream CNN models. We visualize the feature maps and trained filter weights from intermediate layers of the trained CNN model, and discuss the advantages of our method and the underlying reasons for the performance differences.

WiC: Internship preparation, Resume reviews, and LinkedIn headshots

Tuesday, November 6, 2018 - 07:00 pm
Room 2277, IBM Innovation Center/Horizon 2
You are invited to join the Women in Computing November event on Tuesday, Nov. 6. Pizza will be provided, and everyone (all genders and majors) is welcome!

Topic: Professional Development
When: Tuesday, November 6th, starting at 7 pm
Where: Room 2277, IBM Innovation Center/Horizon 2 (the building next to the Strom Thurmond Fitness Center with the IBM logo on the side)
Main agenda: Internship preparation, resume reviews, and LinkedIn headshots

Machine Learning Based Disease Gene Identification and MHC Immune Protein-peptide Binding Prediction

Monday, October 29, 2018 - 09:00 am
Meeting room 2267, Innovation Center
DISSERTATION DEFENSE
Author: Zhonghao Liu
Advisor: Dr. Jianjun Hu
Date: Oct. 29th, 2018
Time: 9:00 am
Place: Meeting room 2267, Innovation Center

Abstract
Machine learning and deep learning methods have been increasingly applied to challenging and important bioinformatics problems such as protein structure prediction, disease gene identification, and drug discovery. However, the performance of existing machine-learning-based predictive models is still not satisfactory, and the question of how to exploit the specific properties of bioinformatics data and couple them with the unique capabilities of the learning algorithms remains elusive. In this dissertation, we propose advanced machine learning and deep learning algorithms to address two important problems: mislocation-related cancer gene identification and major histocompatibility complex (MHC) peptide binding affinity prediction. Our first contribution is a kernel-based logistic regression algorithm for identifying potential mislocation-related genes among known cancer genes. The algorithm takes protein-protein interaction networks, gene expression data, and subcellular location gene ontology data as input, which is particularly lightweight compared with existing methods. The experimental results demonstrate that our proposed pipeline is well suited to identifying mislocation-related cancer genes. Our second contribution addresses the modeling and prediction of human leukocyte antigen (HLA) peptide binding in the human immune system. We present an allele-specific convolutional neural network model with one-hot encoding. Extensive evaluation over the standard IEDB datasets shows that our model outperforms all existing prediction models. To achieve further improvement, we propose a novel pan-specific model for peptide-HLA class I binding affinity prediction, which allows us to exploit the training samples of all HLA alleles.

Our sequence-based pan model is currently the only algorithm that does not use pseudo-sequence encoding, a dominant structure-based encoding method in this area. Benchmark studies show that our method achieves state-of-the-art performance, and the proposed model can be integrated into existing ensemble methods to improve their overall prediction capabilities on highly diverse MHC alleles. Finally, we present an LSTM-CNN deep learning model with an attention mechanism for peptide-HLA class II binding affinity and binding core prediction. Our model achieves very good performance and outperforms existing methods on half of the tested alleles. With the help of the attention mechanism, our model can output the peptide binding core directly from the attention weights, without any additional pre- or post-processing.
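One-hot encoding, the input representation named in the abstract, maps each residue of a peptide to a 20-dimensional indicator vector. A minimal sketch (generic, not the dissertation's exact pipeline):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(seq):
    """One-hot encode a peptide: one 20-dim indicator vector per residue.

    This is the typical input layout for a sequence-based CNN over
    peptides: position along one axis, residue identity along the other.
    """
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    return [[1 if i == index[a] else 0 for i in range(20)] for a in seq]

encoded = one_hot("SIINFEKL")  # a classic 8-mer test peptide
print(len(encoded), len(encoded[0]))  # → 8 20
```

A convolution sliding along the position axis of this matrix can then learn binding motifs, which is what allele-specific and pan-specific CNN models exploit.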