AI-ification

Friday, September 13, 2024 - 10:00 am
Held in person on the second Friday of each month at the AI Institute, 1112 Greene St., Columbia, SC 29208

AI-ification: Present your research that can benefit from modern AI approaches to a panel of friendly and knowledgeable AI practitioners during the first hour of this meeting. During the second hour of the meeting, the panel will brainstorm and recommend ways of integrating modern AI techniques into your existing research. Form new collaborations and partnerships during the brainstorming session, take the formed ideas to AI-athon, and embark on your path to Deep AI-ification.
Registration form: https://forms.office.com/r/n5dMWBFCXT

Details at https://research.cec.sc.edu/aii/ai-ification

AI-Roundtable: Generative AI, what they are, how they work, and how to use them?

Friday, September 6, 2024 - 10:00 am
Held online and in person at the AI Institute, 1112 Greene St., Columbia, SC 29208

Roundtable Discussion: Join us for a 2-hour meeting in which an AI-related topic (suggested by the USC community) is presented by a panel of experts during the first hour and discussed by the broader community of participants and experts during the second hour. Topics are suggested by participants and selected based on popularity.
Registration form: https://forms.office.com/r/n4UznxruNg

More details at https://research.cec.sc.edu/aii/roundtable-discussion

Women in Computing First Meeting

Thursday, August 29, 2024 - 07:30 pm
Honors Residence Hall B110

We hope you have had a wonderful start to the semester. Women in Computing will be hosting its first meeting of the Fall semester 6 – 7:30 pm, Thursday, August 29, in Honors Residence Hall B110! Women in Computing is open to all majors and to students interested in computing technology and in diversity and inclusion within the tech industry. Everyone, of all genders and majors, is welcome!

Using Machine Learning and Deep Learning Algorithms for Low Birthweight Prediction

Monday, August 26, 2024 - 09:00 am

Author : Yang Ren
Advisor : Dr. Dezhi Wu, IIT Dept. & Dr. Yan Tong, CSE Dept
Date : Aug 26th
Time: 9:00 am – 11:00 am
Place : Teams

Link: https://teams.microsoft.com/dl/launcher/launcher.html?url=%2F_%23%2Fl%2…

 

Dial in by phone

+1 803-400-6044,,897438708# United States, Columbia
Phone conference ID: 897 438 708#

Abstract

Low Birthweight (LBW) is a major public health issue resulting in increased neonatal mortality and long-term health complications. Traditional LBW analysis methods, which focus on incidence rates and risk factors through statistical models, often struggle with complex unseen data, which limits their effectiveness in early prevention of LBW. More advanced LBW prediction models are therefore needed. This dissertation addresses that need by proposing and examining novel machine learning (ML) and deep learning (DL) algorithms to predict LBW more accurately during the early stage of birthing individuals’ pregnancy. It consists of three studies, which cover the following three major research topics.

The first topic examines the effectiveness and impact of various data rebalancing techniques for LBW prediction when the data are extremely imbalanced. Through this investigation, we established a foundational pipeline for LBW prediction, paving the way for further development and refinement in subsequent studies. This first study also included an extensive feature importance analysis to identify key factors in LBW classification, which is crucial to guiding targeted interventions to improve birth outcomes.
The second topic aims to develop a novel longitudinal transformer-based LBW prediction framework, which integrates prenatal mothers’ historical health records and current pre-delivery data, making it possible to provide more comprehensive and relevant input features for LBW prediction. This framework’s ability to effectively process and analyze these diverse data inputs marks a significant advance over previous approaches that primarily focus on immediate pre-delivery factors. As a result, this enhanced model is shown to improve the accuracy of LBW predictions, thus offering a more robust tool for more effective early intervention strategies.
The third topic is to propose and examine a pioneering fusion framework that combines structured medical records with rich text-based data. This large language model (LLM)-based approach aims to explore and optimize the strengths of both quantitative and qualitative data sources to enhance the predictive accuracy and explainability of the LBW prediction models. By integrating diverse data types, this proposed method is expected to offer in-depth insights into the myriad factors contributing to LBW, potentially unveiling previously unrecognized and more granular risk factors to refine the prediction models further.
In summary, this dissertation presents a comprehensive exploration of using advanced ML and DL algorithms in the prediction of LBW through a series of three studies. From establishing an LBW prediction pipeline with rebalancing strategies (Study 1), through developing a transformer-based approach (Study 2), to introducing a tabular-text fusion framework (Study 3), this research will contribute to a substantial advancement in prenatal care. By enabling earlier and more accurate identification of LBW risks, this work has the potential to transform prenatal intervention strategies, leading to improved health outcomes for both mothers and their infants.
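
For readers unfamiliar with data rebalancing, the following is a minimal illustrative sketch of one common technique, random oversampling of the minority class. It is not taken from the dissertation; the function name `random_oversample` and the toy data are assumptions made purely for illustration, and the abstract does not say which rebalancing techniques were actually compared.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): 9 normal-weight births (label 0), 1 LBW birth (label 1).
rows = [[float(i), float(i) * 2.0] for i in range(10)]
labels = [0] * 9 + [1]
bal_rows, bal_labels = random_oversample(rows, labels)
```

In practice, resampling of this kind is usually applied only to the training split, so that evaluation data keep the original class distribution.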

Efficient Machine Learning on Scientific Data Using Bayesian Optimization

Monday, July 15, 2024 - 09:00 am
online

DISSERTATION DEFENSE

Author : Rui Xin

Advisor : Dr. Jijun Tang

Date : July 15, 2024

Time: 9:00 am – 11:00 am

Place : Zoom

Link: https://zoom.us/j/94479902244?pwd=8XbYQPbZaxXXeBt4e1r5gqrBy6upb4.1

Meeting ID: 944 7990 2244

Passcode: 126908


Abstract

    Deep Learning is pivotal in advancing data analysis across various scientific fields, from genomics to materials discovery. Despite its widespread use, efficiently learning from limited data and operating under resource constraints remains a significant challenge, often limiting its full potential in environments where data is scarce or resources are restricted. This dissertation explores Active Learning and Automated Machine Learning (AutoML) powered by Bayesian Optimization to enhance the efficiency of machine learning across multiple disciplines. It focuses on algorithm optimization and data management through three interconnected studies.

In the first study, we investigate how a data management technique, active learning, helps discover new materials with target properties from a limited dataset, given the vast chemical design space. We propose an active generative inverse design method that combines active learning with a deep autoencoder neural network and a generative adversarial deep neural network model to discover new materials with a target property in the whole chemical design space. Our experiments demonstrate that although active learning may select chemically infeasible candidates, these samples are beneficial for training robust screening models. These models effectively filter and identify materials with desired properties from those generated hypothetically by the generative model. The results confirm the success of our active generative inverse design approach.

In the second study, we explore cancer heterogeneity and specificity through the analysis of mutational signatures, using collinearity analysis and machine learning techniques. These techniques include either a decision tree-based ensemble model or a flexible neural network-based method with automated hyperparameter optimization, which customizes a neural network for each sub-task. Through thorough training and independent validation, our results reveal that although the majority of mutational signatures are distinct, similarities between certain mutational signature pairs are observed in both mutation patterns and mutational signature abundance. These observations can potentially assist in determining the etiology of still elusive mutational signatures. Further analysis using machine learning approaches indicates that specific mutational signatures are relevant to particular cancer types, with skin cancer showing the strongest specificity among all cancer types.

Finally, we analyze cancer heterogeneity by examining immune cell compositions in tumor microenvironments, using neural architecture search to develop tailored models for classification subtasks. By analyzing transcriptome profiles from 11,274 patients across 33 cancer types to identify 22 immune cell types, we employ deep learning to model outcomes for cancer type and tumor-normal distinctions, utilizing the Shannon index for immune cell diversity and Cox regression for prognostic evaluations. Our findings reveal significant immune cell differences between tumors and normal tissues, with some discrepancies in directional differences across cancers. Immune cell composition patterns modestly differentiate cancer types, with sixteen significant prognostic associations identified, such as in kidney renal clear cell carcinoma. Additionally, immune cell diversity shows marked differences in seven cancer types and correlates positively with survival in some cases, underscoring the lack of a universal standard across all cancers.
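
As a brief illustration of the diversity measure mentioned above, the Shannon index of a sample with cell-type proportions p_i is H = -sum(p_i * ln p_i). The sketch below is a minimal, assumed implementation for readers; the function name `shannon_index` and the example abundances are hypothetical and are not taken from the dissertation.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-zero proportions.

    `abundances` holds non-negative amounts (counts or fractions) per
    immune cell type; they are normalized to proportions here.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # zero-abundance types contribute nothing to H
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical samples: 22 equally abundant cell types vs. one dominant type.
even_sample = [1.0] * 22
skewed_sample = [100.0] + [1.0] * 21
h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

A perfectly even composition over 22 cell types gives the maximum value ln(22), while a sample dominated by a single cell type gives a value closer to zero.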

Multi-scale Deep Representation Learning in Synthetic Biology

Tuesday, May 7, 2024 - 09:00 am
Online

DISSERTATION DEFENSE


Author : Xiaoyi Liu

Advisor : Dr. Jijun Tang and Dr. Yan Tong

Date : May 07, 2024

Time: 9:00 am – 11:00 am

Place : Teams

Link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_NTFlYTIwM2EtMjc3…

Meeting ID: 236 573 306 493

Passcode: UTL2Gs


Abstract

 

Synthetic biology combines the expertise of engineers and biologists, bridging the gap between engineering and natural life. The field develops new biological components, networks, and systems to reprogram organisms, and is generally divided into two broad branches. The first branch involves using synthetic molecules to mimic natural biological functions. The second branch focuses on assembling natural biological components in novel ways, aiming to produce systems with unique, practical functions. Thus, the de novo engineering of biological modules and synthetic pathways is used in related practical bioengineering applications, such as drug-targeting strategies and microbial product manufacturing. Synthetic biology therefore represents a new paradigm in scientific exploration and innovation, with wide-ranging implications for our understanding and optimization of biological systems.

Over the past decades, there has been a significant increase in the amount of available whole-genome sequencing data and experimental data due to the emergence of new automation technologies, such as high-content imaging, high-throughput screening, and sequencing. Given the growth of these data sets, researchers are unable to summarize these data simply from experience and memory. Thus, stable and efficient computational methods are required to integrate them to predict or reveal phenomena or insights that have not yet been discovered. However, incomplete knowledge of metabolic processes impairs the accuracy of models of biological systems, hindering advancements in systems biology and metabolic engineering. Additionally, some fundamental challenges remain. Firstly, problems in systems biology are often cross-scale and multi-modal, yet existing computational methods for problem definition and model design are often single-scale and single-modal. Secondly, biological systems are multi-scale, unbalanced, and noisy, making structuring and benchmarking this complicated data very difficult. Thirdly, the complete biosynthetic pathways of most natural or valuable products are unknown. Thus, computer-aided biosynthesis planning holds significant value.

To address the above challenges, we introduce multi-scale deep learning-based representation learning methodologies to understand and optimize the downstream tasks in systems biology, such as metabolic pathway inference, missing reaction prediction in GEMs, and retrosynthesis prediction. Specifically, our first study introduces a novel Multi-View Multi-Label learning framework for Metabolic Pathway Inference (MVML-MPI), which outperforms state-of-the-art (SOTA) methods by accurately representing the complex relationships between compounds and pathways. In the second study, to address the limitation of incomplete metabolic knowledge in GEMs, we propose a novel framework, CLOSEgaps (a hypergraph Convolution network and attention mechanism integrated Explorer for GAPS prediction of metabolism). It is a comprehensive deep learning-driven tool that represents the hyper-topological information of GEMs and effectively fills gaps through hyperlink prediction, thereby enhancing the accuracy of phenotypic predictions. In the third study, we propose RetroCaptioner, a novel end-to-end framework for one-step retrosynthesis that combines the power of a graph encoder, which integrates learnable structural information, with the capability to sequentially translate drugs, thereby efficiently capturing chemically plausible information. This research presents an advancement in systems biology by introducing a suite of multi-scale deep learning methodologies that tackle key challenges: MVML-MPI enhances our understanding of complex metabolic pathways, CLOSEgaps fills gaps in metabolic models, and RetroCaptioner facilitates the process of retrosynthesis. Taken together, they form a comprehensive and integrated approach that significantly advances the capabilities of synthetic biology.

Looking at continual learning through a dynamical system point of view

Friday, April 19, 2024 - 02:15 pm
Online

Krishnan Raghavan

Abstract:
One of the critical features of an intelligent system is the ability to continually execute tasks in a real-world environment. As a new task is revealed, we seek to efficiently adapt to it (improve generalization) and, in the process of generalization, we seek to remember the previous tasks (minimize catastrophic forgetting). Consequently, there are two key challenges that must be modeled: catastrophic forgetting and generalization. Despite promising methodological advancements, there is a lack of a theoretical approach that enables analysis of these challenges.

In this talk, we discuss modelling and analysis of continual learning using tools from differential equation theory. We discuss the broad applicability of our approach and demonstrate the many applications where such an approach is required. We will derive methods in some of these applications using this point of view and show the effectiveness of such approaches in modelling these applications.

Bio:
I am an assistant computational mathematician with the Mathematics and Computer Science Division at Argonne National Laboratory. I received my Ph.D. in computer engineering from Missouri University of Science and Technology in 2019 and have been at Argonne since then. My primary research agenda is to develop a mathematical characterization of machine learning (ML) models, their learning/training behavior, and the associated precision achieved by them. Towards this end, I study the two broad facets of ML: theory, through the lens of tools from systems theory, statistics, and optimization; and application, by building AI/ML models to solve key problems in nuclear physics, materials science, HPC, and more recently climate. I enjoy rock climbing, the outdoors, and cycling, and I love ramen and many other nerdy things, including but not limited to fantasy fiction novels -- go Malazan.


Causal analysis & decision intelligence for manufacturing at Bosch

Friday, April 12, 2024 - 02:15 pm
Online

Bosch is a multinational engineering and technology company that develops products in various business sectors, including mobility, industrial technology, energy and building technology, and consumer goods. Currently, Bosch employs over 427K workers and generates ~100B/yr in sales revenue. At the Bosch Center for Artificial Intelligence, in Pittsburgh, we focus on research in the area of neuro-symbolic AI, combining machine learning with knowledge engineering technologies. In this talk, we will illustrate recent efforts in the areas of causal analysis and decision intelligence to improve industrial manufacturing processes. More specifically, we discuss the application of neuro-symbolic methods for (1) root-cause analysis and (2) cognitive architectures for decision making.

About the authors
Alessandro Oltramari is president of the Carnegie Bosch Institute and a senior research scientist at the Bosch Center for Artificial Intelligence in Pittsburgh, USA. Oltramari joined Bosch Research in 2016, after working as a research associate at Carnegie Mellon University, funded by public agencies such as DARPA, NSF, and ARL. At Bosch Research, he focuses on neuro-symbolic AI. His primary interest is to investigate how knowledge-based methods and systems can be integrated with learning algorithms to help humans and machines make sense of the physical and digital worlds. Contact him at alessandro.oltramari@us.bosch.com

Cory Henson is a lead research scientist at the Bosch Center for Artificial Intelligence in Pittsburgh, USA. His research focuses on knowledge representation and neuro-symbolic AI methods, integrating machine learning with prior domain knowledge. He has led projects to develop and apply this technology for improving autonomous systems, ranging from automated driving to smart manufacturing. More recently, he has become interested in the use of neuro-symbolic methods for representing, learning, and reasoning with causal knowledge. Contact him at cory.henson@us.bosch.com

Details at: https://www.linkedin.com/events/7183461938431983616/about/