2024 Lucy Annual Celebration
Hank Family Forum, 8th Floor
Corbett Family Hall
The poster session features Lucy-affiliated undergraduates, graduate students, and postdoctoral scholars presenting on their exciting interdisciplinary research.
Posters will be on display for the duration of the event. Presentations will occur in two sessions, concurrent with the reception.
Session 1:
5:15 – 6 p.m.
(odd-numbered posters)
Session 2:
6 – 6:45 p.m.
(even-numbered posters)
Presenters represent a variety of Lucy-sponsored programs and labs, including the iTREDS Scholars program, the Lucy Graduate Scholars program, the DIAL Lab, and Lucy Postdoctoral Researchers.
Posters
A summary of each poster title, presenter, and their affiliation can be found directly below. For more detailed information, including poster abstracts, see the Poster Abstracts section that follows the table.
# | Poster Title | Presenter(s) | Affiliation |
---|---|---|---|
1 | SaludConectaMX mHealth Tool: Improving Functionality to Track Health Inequities in Children with Cancer | Sisy Chen, Anna McCartan, Beatriz Ribeiro Soares, Jane Stallman | iTREDS |
2 | Ways to Better Serve Foster Youth in South Bend | Christian Farls, Lindsay Roney, Kate Schinaman | iTREDS |
3 | Environmental Sensing for a Healthier Community in South Bend | Elizabeth Link, Chris Martinez, Kayra Nugroho, Phyona Schrader | iTREDS |
4 | Works (Un)cited: Investigating the Persistence of Global Exclusion in Peace Studies through Citation Analysis | Julie Hawke | Lucy Graduate Scholar |
5 | Populists in Office: What Explains Their Rhetorical Attacks on Political Parties? | Adriana Pilar Ferreira Albanus | Lucy Graduate Scholar |
6 | Determining the Number of Factors in Exploratory Factor Analysis with Model Error | Yilin Li | Lucy Graduate Scholar |
7 | How Arousal Shapes Our Negative Memories: A Study on Young and Middle-Aged Adults | Seham S. Kafafi | Lucy Graduate Scholar |
8 | Large Language Models for chemical reaction data extraction | Mihir Surve | Lucy Graduate Scholar |
9 | Data Analysis in Mass Spectrometry Proteomics | Simon D. Weaver | Lucy Graduate Scholar |
10 | Analyzing the Impact of Macromolecular Crowding on Protein Aggregation | Isabella Gimon | Lucy Graduate Scholar |
11 | From Molecular Fragments to Novel High-performing Fluids: The Discovery of Green Refrigerants | Barnabas Agbodekhe | Lucy Graduate Scholar |
12 | Strategic Interventions for Urban Carbon Reduction: EcoSphere, A Bottom-Up Simulation Software for Sustainable Cities | Siavash Ghorbany | Lucy Graduate Scholar |
13 | All-Female Teams Produce More Disruptive Work: Evidence from Scientific Papers | Nandini Banerjee | Lucy Graduate Scholar |
14 | The Propagandist’s Global Playbook: Telling China’s Stories Well | Adnan Hoq | Lucy Graduate Scholar |
15 | Dynamic network analysis of protein structural change | Aydin Wells | Lucy Graduate Scholar |
16 | Do Multimodal Large Language Models Understand Welding? | Grigorii Khvatskii | Lucy Graduate Scholar, DIAL Lab |
17 | Analyzing Colombia’s Armed Conflict Using Retrieval-Augmented Generation Approach | Anna Sokol | Lucy Graduate Scholar, DIAL Lab |
18 | HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks | Yihong Ma | DIAL Lab |
19 | Rethinking Evaluation in Compound Potency Prediction | Brenda Cruz Nogueira | DIAL Lab |
20 | Intersectional Divergence: Measuring Fairness in Regression | Joe Germino | DIAL Lab |
21 | AnyLoss: Transforming Classification Metrics into Loss Functions | Doheon Han | DIAL Lab |
22 | ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation | Bruce Huang | DIAL Lab |
23 | Fast Explainability via Feasible Mask Generator | Deng Pan | Lucy Postdoc, DIAL Lab |
24 | Social and economic predictors of under-five stunting in Mexico: a comprehensive approach through the XGB model | Angélica García-Martínez | Lucy Postdoc, DIAL Lab |
25 | Early warning signals of emerging infectious diseases | Qinghua Zhao | Lucy Postdoc |
Poster Abstracts
Each poster’s abstract, along with other project details, appears below.
1 — SaludConectaMX mHealth Tool: Improving Functionality to Track Health Inequities in Children with Cancer
Sisy Chen, Anna McCartan, Beatriz Ribeiro Soares, Jane Stallman
Advisor/Mentor(s): Angélica García-Martínez
Affiliation: iTREDS
Abstract:
One of the significant challenges in Low- and Middle-Income Countries (LMICs) is the fragmentation of health services, which poses a major barrier to achieving universal healthcare coverage. Our objectives are to describe the cultural, social, and economic barriers limiting mobile app use by the caregivers of children with cancer at the HIMFG, assess the risk of developing complications due to lack of mobile app attachment, and design new functionalities within the mobile app to improve user engagement.
2 — Ways to Better Serve Foster Youth in South Bend
Christian Farls, Lindsay Roney, Kate Schinaman
Advisor/Mentor(s): Karla Badillo-Urquiola, Sue McDonald
Affiliation: iTREDS
Abstract:
This project is motivated by a need for reliable, accessible information about foster youth in South Bend, which is currently lacking in this community. The goal is to close this gap through data collection with local community members involved in the foster care system.
3 — Environmental Sensing for a Healthier Community in South Bend
Elizabeth Link, Chris Martinez, Kayra Nugroho, Phyona Schrader
Advisor/Mentor(s): Jay Brockman, Sugana Chawla
Affiliation: iTREDS
Abstract:
Environmental factors have been shown to significantly impact health outcomes. This project will utilize data from a network of sensors installed throughout the City of South Bend to analyze and correlate health and education outcomes with environmental data.
4 — Works (Un)cited: Investigating the Persistence of Global Exclusion in Peace Studies through Citation Analysis
Julie Hawke
Advisor/Mentor(s): Caroline Hughes
Affiliation: Lucy Graduate Scholar
Co-author(s): Debora Rogo, Wes Hedden
Abstract:
This research project aims to explore citation patterns within Peace Studies by identifying the most frequently cited authors and their geographical origins. It will examine how citation practices evolve over time, focusing on whether they become more geographically inclusive and whether scholars from low-income countries are more likely to engage in inclusive citation practices than those from high- and middle-income countries.
5 — Populists in Office: What Explains Their Rhetorical Attacks on Political Parties?
Adriana Pilar Ferreira Albanus
Advisor/Mentor(s): Aníbal Pérez-Liñán
Affiliation: Lucy Graduate Scholar
Abstract:
Why do populist leaders choose to attack political actors? Scholars have answered similar questions by pointing in two directions. One view holds that, once in office, populists generally attack the establishment on a daily basis to maintain their anti-establishment stance. The alternative view considers the effect of the political system on these leaders’ behavior: in coalition governments, populists will target political actors when part of the opposition but will moderate when part of the ruling coalition. While agreeing with the moderation effect, I argue that, when analyzing attacks on political parties, we are missing the impact of the leader’s sense of threat in party systems as part of the explanation. Therefore, in this paper, I address this issue by analyzing the rhetoric of Jair Bolsonaro and Donald Trump on Twitter before and after becoming president. The preliminary results of this study show that a rising sense of threat does a better job of explaining Bolsonaro’s and Trump’s targeting of parties, while the threats posed to the leaders vary because of the differences in party system between the two countries.
6 — Determining the Number of Factors in Exploratory Factor Analysis with Model Error
Yilin Li
Advisor/Mentor(s): Guangjian Zhang
Affiliation: Lucy Graduate Scholar
Abstract:
This study aims to compare methods for determining the number of factors in the social sciences when a factor analysis model does not hold perfectly at the population level. We simulated data under various realistic conditions and compared variants of parallel analysis with fit indices such as RMSEA. We made suggestions for applied researchers according to the simulation results.
Project Description
This project is part of a master’s thesis. Parallel Analysis (PA) is a popular method to determine the number of factors in exploratory factor analysis. We conduct simulation studies to assess three PA methods in more realistic situations where model error is present. For comparison, we also include a method based on the root mean square error of approximation (RMSEA), which is specifically designed to accommodate model error. We illustrate the four methods using two empirical data sets. Our findings include: (1) The best-performing PA methods are satisfactory for models with high levels of factor overdetermination (high variable-to-factor ratios), but their performance becomes less satisfactory when the levels of factor overdetermination are low (low variable-to-factor ratios); (2) The RMSEA-based method is more satisfactory than PA methods under most conditions unless the sample size is very small; (3) The performance of the RMSEA-based method improves with larger samples, but the performance of PA methods does not.
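For readers unfamiliar with parallel analysis, the following is a minimal sketch of the classic Horn-style procedure: retain the leading factors whose observed correlation-matrix eigenvalues exceed a chosen percentile of eigenvalues from random normal data of the same shape. It is not necessarily one of the three PA variants assessed in the project, and the function names, the 95th-percentile threshold, and the toy data (two clean factors, no model error) are all assumptions made for illustration.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading eigenvalues that exceed the random-data threshold."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        # Reference data: independent standard normal variables.
        simulated[i] = sorted_eigenvalues(rng.standard_normal((n, p)))
    threshold = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, ref in zip(observed, threshold):
        if obs <= ref:
            break  # stop at the first eigenvalue that fails the comparison
        n_factors += 1
    return n_factors, observed, threshold


def simulate_two_factor_data(n=500, loading=0.8, seed=1):
    """Hypothetical toy data: 6 variables, 3 per factor, no model error."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, 2))
    loadings = np.zeros((6, 2))
    loadings[:3, 0] = loading
    loadings[3:, 1] = loading
    unique_sd = np.sqrt(1 - loading**2)
    return factors @ loadings.T + unique_sd * rng.standard_normal((n, 6))


data = simulate_two_factor_data()
n_factors, observed, threshold = parallel_analysis(data)
print("suggested number of factors:", n_factors)
```

Because the toy data contain no model error, this sketch shows only the mechanics of the comparison, not the conditions under which the project found PA to degrade.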
7 — How Arousal Shapes Our Negative Memories: A Study on Young and Middle-Aged Adults
Seham S. Kafafi
Advisor/Mentor(s): Jessica D. Payne
Affiliation: Lucy Graduate Scholar
Co-author(s): Xinran Niu, Mia F. Utayde, Kristin G. Sanders, Tony J. Cunningham, Elizabeth A. Kensinger
Abstract:
This study examined how physiological arousal impacts memory for negative images in young and middle-aged adults. Heart rate deceleration (HRD) was linked to better memory for negative objects, while increased skin conductance (SCR) reduced this effect, but only in middle-aged adults. The findings suggest heart rate may be a stronger predictor of emotional memory changes with age than sweat response.
Project Description
The relationship between physiological arousal and enhanced emotional memory is well established, but how this changes with age is not fully understood. This study explored the impact of physiological arousal on memory for negative and neutral information in young and middle-aged adults. A total of 96 healthy participants from these age groups viewed scenes containing either negative (e.g., a threatening snake) or neutral objects (e.g., a chipmunk) against neutral backgrounds, while their physiological responses were recorded. During a later memory test, participants were asked to identify whether the objects and backgrounds were the same as what they had previously seen. The results showed that heart rate deceleration (HRD) was linked to improved memory for negative objects compared to neutral ones, while a higher skin conductance response (SCR) was associated with a reduced bias toward remembering negative objects. Notably, these effects were only observed in middle-aged adults, not younger ones. This suggests that HRD, a parasympathetic response, may be a more reliable indicator of arousal-related memory enhancement than SCR, a sympathetic response, particularly in middle-aged adults. These findings emphasize the need to consider different physiological responses when examining memory changes with age.
8 — Large Language Models for chemical reaction data extraction
Mihir Surve
Advisor/Mentor(s): Olaf Wiest
Affiliation: Lucy Graduate Scholar
Co-author(s): Bozhao Nan, Gisela A. González-Montiel, Xiaobao Huang, Yuhan Liu, Nitesh V. Chawla, Tangfei Luo, Xiangliang Zhang
Abstract:
Chemical reaction data primarily exists in unstructured form, and its structured storage is extremely important for any downstream application. This work explores the use of LLMs for chemical reaction data extraction from such sources. Different prompting and fine-tuning experiments show that more work is needed before LLM-based workflows can be deployed reliably.
Project Description
In this study, we explore the potential of large language models (LLMs) for extracting chemical reaction data from unstructured or semi-structured sources. While LLMs have shown promise in natural language processing, their application in the field of chemistry is still in its early stages. Our goal is to investigate how well LLMs can identify and extract key information, such as reactants, products, procedures and yields, from unstructured text. By using curated datasets for chemical reaction extraction, we hope to shed light on the capabilities and limitations of LLMs in this context, which could, in turn, support research in downstream modeling and predictive tasks.
9 — Data Analysis in Mass Spectrometry Proteomics
Simon D. Weaver
Advisor/Mentor(s): Matthew Champion
Affiliation: Lucy Graduate Scholar
Abstract:
We use mass spectrometry proteomics to measure and quantify all the proteins in a biological system. This produces multidimensional datasets which require custom software and data science techniques to analyze. We use this methodology to investigate biological questions with diverse applications including ovarian cancer detection, causes of tuberculosis virulence, and breast cancer drug development.
Project Description
Proteomics is a field of analytical chemistry that uses mass spectrometry to identify and quantify all the proteins in a sample. Proteins perform essentially all the cellular functions required for life, so the ability to measure them simultaneously in a system has far reaching applications in all areas of human health. Proteomics can be used to detect and characterize disease biomarkers for cancer, immune disorders, viral and bacterial infections, metabolic disorders, and many other conditions. We also use proteomics to study fundamental cellular biology, elucidate signaling pathways, and understand how outside stimulus changes the expression of proteins. Mass spectrometry-based proteomics generates massive, multidimensional datasets that require informatics methods to analyze by taking the raw measurements of mass, charge, and other chemical attributes and inferring the identity and quantity of the thousands of proteins present in an original sample. In this poster I present the dimensions of data generated with Mass Spectrometry proteomics and how they contribute to the identification and quantification of proteins in a sample. I also present three biological applications of proteomics: (1) Characterization of an important ovarian cancer biomarker; (2) Investigation of the molecular causes of tuberculosis virulence; and (3) Screening of breast cancer drug candidates for their impact on the proteome.
10 — Analyzing the Impact of Macromolecular Crowding on Protein Aggregation
Isabella Gimon
Advisor/Mentor(s): Santiago Schnell
Affiliation: Lucy Graduate Scholar
Co-author(s): Conner Sandefur
Abstract:
Protein aggregation has been correlated with many neurodegenerative diseases such as Huntington’s disease and Alzheimer’s. The outcome of protein aggregation is extremely deleterious for any organism since it prevents a protein from acquiring its functional state. Using rule-based models to study protein aggregation, Isabella is investigating the processes that lead to aggregation.
11 — From Molecular Fragments to Novel High-performing Fluids: The Discovery of Green Refrigerants
Barnabas Agbodekhe
Advisor/Mentor(s): Edward J. Maginn
Affiliation: Lucy Graduate Scholar
Co-author(s): Dinis Abranches, Montana Carlozo, Kyla Jones, Alexander Dowling
Abstract:
This work presents a novel and rigorous approach to the discovery of new materials applied to refrigerant discovery. We developed a computational molecule generation tool written in Python called FineSMILES and applied molecular modeling and machine learning strategies to predict thermophysical and environmental properties for potential refrigerant screening.
Project Description
The Kigali Agreement of 2016 requires that environmentally harmful hydrofluorocarbons (HFCs), which are predominantly used as refrigerants but cause global warming, be phased out. The phase-out of current refrigerant compounds requires the design of green alternatives. Previous works in green refrigerant discovery have relied mainly on database screening in which it was reported that options for pure-component green refrigerants were very limited. This work presents a novel and rigorous approach to the discovery of new materials applied to refrigerant discovery. We developed a computational molecule generation tool written in Python called FineSMILES. FineSMILES collects packets of molecular fragments made up of pre-selected elements based on an understanding of the problem and literature. These packets of molecular fragments are screened using structural and chemical constraints to identify packets that could be assembled into chemically feasible molecules. FineSMILES assembles these packets into complete molecules with outputs as SMILES strings. Application of the FineSMILES code generated hundreds of thousands of molecules, of which more than fifty percent are new molecules that are not present in the PubChem database of compounds used in previous refrigerant discovery projects. A key implication is that a new world of chemical compounds not previously explored for use as green refrigerants has been discovered. We then applied a simple and novel approach that integrates group contribution models and Sigma profiles with Gaussian process regression models for accurate and reliable property prediction. The predicted properties were then used to screen the refrigerant molecules for technical, environmental, and safety performance. 
Based on a combined consideration of predicted thermophysical, environmental, and safety properties, tens of molecules were identified as high-potential green refrigerants, with several of them not previously reported in the open literature. Future work includes further in-depth screening of the identified molecules using molecular simulations and quantum mechanical calculations.
12 — Strategic Interventions for Urban Carbon Reduction: EcoSphere, A Bottom-Up Simulation Software for Sustainable Cities
Siavash Ghorbany
Advisor/Mentor(s): Ming Hu
Affiliation: Lucy Graduate Scholar
Abstract:
The built environment, including all man-made structures in urban and rural areas, is responsible for about 40% of greenhouse gas emissions globally. These emissions not only lead to climate change and extreme weather but also directly affect urban residents’ health. This study introduces a framework and software for creating a high-resolution building-by-building dataset for cities in the United States and presents, in a user-friendly dashboard, the different decisions that city policymakers can make and the consequences of those choices.
Project Description
The construction industry accounts for approximately 40% of global greenhouse gas emissions, making it a critical sector for addressing carbon emissions. Urban areas are particularly significant contributors. However, tackling this issue has been challenging due to a lack of comprehensive data and methodologies for assessing the building sector’s impact on such a large scale. This study aims to develop a methodology for collecting data and simulating embodied carbon emissions across the entire lifecycle of buildings at an urban scale. It demonstrates the effects of various scenarios on embodied carbon emissions, including changes in building lifespans, renovation and replacement strategies, area per building, and new construction volumes. Using a bottom-up archetype approach, the study models cities and evaluates the impact of six mitigation strategies on urban-scale carbon emissions. Additionally, it develops standalone software to simulate, assess, and predict the embodied carbon emission in these scenarios and their economic impacts at the national level in the United States. This dashboard not only provides insight into different scenarios of environmental impact but also estimates the construction and urban development costs associated with each of these decisions. As a pilot, this approach was applied to Chicago, demonstrating potential reductions in embodied carbon emissions by 65 to 80 percent through strategic interventions. The findings underscore the profound influence of urban planning decisions on city decarbonization and offer valuable software for policymakers and researchers aiming to evaluate and implement effective carbon reduction strategies across American cities.
13 — All-Female Teams Produce More Disruptive Work: Evidence from Scientific Papers
Nandini Banerjee
Advisor/Mentor(s): Diego Gómez-Zará
Affiliation: Lucy Graduate Scholar
Abstract:
Scientific works are called disruptive when they move away from the existing knowledge status quo and create a new direction of research. Smaller, younger, egalitarian scientific teams collaborating in the same location have been shown to produce more disruptive work. Thus, we aim to study the impact that a team’s gender composition could have on the disruptiveness of its work.
Project Description
Scientific teams are increasingly disrupting science by providing discoveries and breakthroughs. Disruptive teams reshape established scientific paradigms and forge new ones, eclipsing established theories, methods, and research directions. They create new research avenues, scientific paradigms, technologies, and products by making their predecessors’ ideas outdated. Timely recognition of disruptive teams would be extremely advantageous as it would aid researchers in exploring undiscovered research paths, funding agencies in selecting promising projects, and investors in choosing initiatives to finance. Previous research has analyzed the properties of scientific teams producing disruptive papers and patents to discover what factors and characteristics are associated with disruptive teams. Despite these findings, how scientific teams’ gender composition influences disruption remains uninvestigated. Therefore, we ask whether the number of female scientists in a team could benefit teams to be more disruptive.
14 — The Propagandist’s Global Playbook: Telling China’s Stories Well
Adnan Hoq
Advisor/Mentor(s): Tim Weninger
Affiliation: Lucy Graduate Scholar
Co-author(s): Karrie Koesel, Peitong Jing
Abstract:
We analyzed PRC propaganda efforts over a 12-month period using computational social science, focusing on state-run media content from Facebook and X to uncover propaganda content and the strategies behind it.
Project Description
In our study, we perform a comprehensive analysis of Chinese propaganda on social media, focusing on both visual and textual content. To capture this multimodal approach, we collected 36,188 posts from Chinese state-run media, embassies, consulates, and diplomats on X/Twitter and Facebook over a 12-month period. We employed the Contrastive Language-Image Pretraining (CLIP) model, a sophisticated AI system capable of co-analyzing text and images, to identify and assess propaganda strategies aimed at shaping global opinions. The decision to examine both visual and textual propaganda together was driven by three key reasons. Firstly, visuals complement textual narratives and have a strong persuasive power due to their emotional impact and ease of sharing across language barriers. Secondly, multimedia platforms like X/Twitter and Facebook naturally integrate images and videos with text to boost engagement. Lastly, visual content is becoming increasingly dominant in global media trends, surpassing text in influence. This multimodal approach allowed us to document the coordinated efforts of China’s propaganda strategy, revealing a blend of positive messaging, negative portrayals of democratic competitors, and defensive tactics to distract from criticism.
15 — Dynamic network analysis of protein structural change
Aydin Wells
Advisor/Mentor(s): Tijana Milenkovic
Affiliation: Lucy Graduate Scholar
Co-author(s): Siyu Yang, Khalique Newaz
Abstract:
Proteins fold into 3D structures that dictate their interactions and functions, making structural analysis crucial for understanding the proteins’ roles. Our lab previously developed network-based methods to model protein structures as protein structure networks (PSNs), showing superior performance over traditional methods in protein structure classification (PSC). To improve on this, we recently proposed dynamic PSNs and are now exploring how these models can predict structural changes in proteins when bound to ligands (i.e. protein motion).
Project Description
A protein’s sequence folds into a 3D structure, which directs what other proteins it may interact with to carry out cellular function. Hence, analyses of protein structures are critical for understanding protein functions. Because functions of many proteins remain unknown, computational approaches for linking proteins’ structures to functions are necessary. Our lab previously used network-based methods to model protein structures as protein structure networks (PSNs). Graph-based analyses of these PSNs proved to be superior to state-of-the-art sequence and non-network-based 3D structural approaches in the task of protein structure classification (PSC). However, traditional PSN approaches (including ours) modeled whole, native protein 3D structures as static PSNs that overlook the protein folding dynamics. To overcome this, we recently proposed a dynamic PSN idea, and more recently, as a better proxy for studying protein folding dynamics, we have identified a sufficiently large body of experimental data that captures how the structure of a protein dynamically changes before vs. after the protein is bound to a ligand. We aim to examine how well dynamic PSN analyses of this data can explain seven different types of protein structural changes observed in the data.
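To make the idea of a protein structure network concrete, here is a minimal sketch, assuming one common construction in which residues are nodes and two residues are linked when their representative atoms lie within a distance cutoff. The description above does not say how the lab builds its PSNs, so the 8 Å cutoff, the function names, and the four-residue toy coordinates are all hypothetical. The second function compares an unbound and a bound snapshot, which is a much simpler comparison than the dynamic PSN analysis the project proposes.

```python
import numpy as np


def contact_network(coords, cutoff=8.0):
    """Boolean adjacency matrix: residues i and j are linked if within cutoff."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist <= cutoff
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def changed_edges(adj_before, adj_after):
    """Edges gained and lost between two snapshots, as (i, j) pairs with i < j."""
    gained = np.argwhere(np.triu(adj_after & ~adj_before))
    lost = np.argwhere(np.triu(adj_before & ~adj_after))
    return gained, lost


# Hypothetical toy coordinates (in angstroms) for a four-residue chain.
unbound = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
# In the "bound" snapshot the last residue folds back toward the first two.
bound = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 5, 0)]

gained, lost = changed_edges(contact_network(unbound), contact_network(bound))
print("gained contacts:", gained.tolist())
print("lost contacts:", lost.tolist())
```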
16 — Do Multimodal Large Language Models Understand Welding?
Grigorii Khvatskii
Advisor/Mentor(s): Nitesh Chawla
Affiliation: Lucy Graduate Scholar, DIAL Lab
Co-author(s): Yong Suk Lee, Corey Angst, Nicholas Berente, Maria Gibbs, Robert Landers
Abstract:
In this project we examined the performance of Multimodal LLMs (MLLMs) in skilled production work, specifically in welding. We collected a novel dataset of welding joint images that were annotated by a domain expert. We also developed a novel prompting strategy that allows the model to use existing knowledge to improve classification of new images. Our findings reveal that MLLMs struggle in new contexts, suggesting the need for further research into industrial MLLMs.
Project Description
This work examines the performance of Multimodal LLMs (MLLMs) in skilled production work, with a focus on welding. Using a novel data set of real-world and online weld images, annotated by a domain expert, we evaluate the performance of two MLLMs, GPT-4o and LLaVA-1.6, in assessing weld acceptability across three contexts: RV & Marine, Aeronautical, and Farming. Additionally, we introduce WeldPrompt, a prompting strategy that combines Chain-of-Thought generation with Retrieval-Augmented Generation to mitigate hallucinations and improve reasoning. Our findings reveal that both models struggle to generalize to unseen real-world weld images, showing better performance on online images, likely due to memorization rather than reasoning. WeldPrompt improves model recall in certain contexts but exhibits inconsistent performance across others. This study opens avenues for further research into multimodal learning in industry applications. MLLMs show both strengths and weaknesses in industrial settings, showing the need for further research due to inconsistent strictness compared to domain experts. WeldPrompt results and lower performance on Real World data indicate the need for improving MLLM reasoning in unfamiliar domains. Additionally, this suggests the susceptibility of simple RAG to class imbalance. Larger or more complex models don’t always perform better, highlighting the need to explore task- and context-specific fine-tuned models as well as more advanced RAG-based inference for industrial use.
17 — Analyzing Colombia’s Armed Conflict Using Retrieval-Augmented Generation Approach
Anna Sokol
Advisor/Mentor(s): Nitesh Chawla, Matthew Sisk
Affiliation: Lucy Graduate Scholar, DIAL Lab
Abstract:
This project utilizes an Advanced Retrieval-Augmented Generation (RAG) Pipeline powered by GPT and LLAMA models to analyze the extensive data from Colombia’s armed conflict. By processing information from books and interviews, we aim to uncover patterns and insights that contribute to a deeper understanding of the conflict’s impact and support efforts towards reconciliation.
Project Description
This project investigates the use of an Advanced Retrieval-Augmented Generation (RAG) pipeline, leveraging GPT-4 and LLAMA, to perform comprehensive analysis of textual data concerning Colombia’s prolonged internal armed conflict. By aggregating and processing information from the Truth Commission’s final report, various books, and numerous interviews, the system synthesizes insights from nearly six decades of historical data. The prototype integrates advanced methodologies including document indexing, semantic embeddings, and sophisticated query rewriting to enhance data retrieval and interpretative accuracy. The objective is to develop a robust AI-driven tool that facilitates deeper understanding of complex social issues by efficiently handling large-scale unstructured data. This approach not only contributes to the academic study of conflict dynamics but also supports reconciliation efforts by uncovering patterns and trends that may inform policy decisions.
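The retrieval half of such a pipeline (index documents, rewrite the query, rank by embedding similarity) can be sketched roughly as follows. This is an illustration only, not the project's prototype: word-count vectors stand in for semantic embeddings, a hand-written expansion table stands in for model-based query rewriting, and the documents, names, and expansion terms are invented placeholders. The generation step with GPT-4 or LLAMA is omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rewrite_query(query, expansions):
    """Hypothetical query rewriting: append related terms for known words."""
    words = re.findall(r"[a-z]+", query.lower())
    extra = [term for w in words for term in expansions.get(w, [])]
    return " ".join(words + extra)


def retrieve(query, index, expansions, k=2):
    """Return the k indexed documents most similar to the rewritten query."""
    q = embed(rewrite_query(query, expansions))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Invented placeholder passages, not taken from the project's corpus.
DOCS = [
    "Interview transcript describing displacement of rural families",
    "Book chapter on peace negotiations and the final accord",
    "Report section on reconciliation programs for victims",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # document indexing step
EXPANSIONS = {"peace": ["accord", "negotiations"]}

top = retrieve("peace talks", INDEX, EXPANSIONS, k=1)
print(top[0])
```

In a full pipeline, the retrieved passages would be placed in the prompt of the language model so that its answer is grounded in the source material.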
18 — HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks
Yihong Ma
Advisor/Mentor(s): Nitesh Chawla
Affiliation: DIAL Lab
Abstract:
We propose HetGPT, a general post-training prompting framework tailored for heterogeneous graphs, which is the first attempt to adapt the “pre-train, prompt” paradigm from homogeneous graphs to heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT’s capability to enhance the performance of SOTA HGNNs on semi-supervised node classification.
Project Description
Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing “pre-train, fine-tune” paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often exhibits a misalignment between the training objectives of pretext tasks and those of downstream tasks. This gap can result in the “negative transfer” problem, wherein the knowledge gained from pre-training adversely affects performance in the downstream tasks. The surge in prompt-based learning within Natural Language Processing (NLP) suggests the potential of adapting a “pre-train, prompt” paradigm to graphs as an alternative. However, existing graph prompting techniques are tailored to homogeneous graphs, neglecting the inherent heterogeneity of Web graphs. To bridge this gap, we propose HetGPT, a general post-training prompting framework to improve the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs). The key is the design of a novel prompting function that integrates a virtual class prompt and a heterogeneous feature prompt, with the aim to reformulate downstream tasks to mirror pretext tasks. Moreover, HetGPT introduces a multi-view neighborhood aggregation mechanism, capturing the complex neighborhood structure in heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT’s capability to enhance the performance of state-of-the-art HGNNs on semi-supervised node classification.
19 — Rethinking Evaluation in Compound Potency Prediction
Brenda Cruz Nogueira
Advisor/Mentor(s): Nuno Moniz, Nitesh Chawla
Affiliation: DIAL Lab
Abstract:
This study investigates the impact of a non-uniform domain preference metric on improving algorithm performance in predicting compound potency, particularly in high-relevance ranges. By comparing this approach with traditional metrics across ten potency classes, our findings show that this method, which accounts for operating ranges, not only enhances predictions in critical cases but also uncovers unique high-potency compounds missed by conventional methods. The study underscores the need to reassess current evaluation and optimization practices in compound potency prediction, with broader implications for tasks beyond chemistry where non-uniform domain preferences are important.
Project Description
Regression tasks are indispensable in many fields, including chemistry, where accurate potency predictions are critical for drug discovery. High potency values are preferred in this context, as they indicate effective substances that achieve their intended biological effect. However, conventional evaluation metrics and loss functions used to train prediction models for these tasks prioritize average performance, assuming all values in a domain are equally relevant. This study argues for the need to reassess current evaluation and optimization practices in compound potency prediction, entailing urgent implications for a vast range of tasks beyond the chemistry domain where non-uniform domain preferences are observed. Specifically, we use data on ten potency classes to compare the outcomes of models selected and optimized using traditional loss functions and a recently proposed metric that accounts for non-uniform domain preferences in regression tasks. Our empirical results show that this new metric can significantly enhance algorithm performance, not only in the most relevant cases but also at the boundaries between less and more relevant instances. In certain classes, some algorithms could only predict truly highly relevant compounds when using this metric, and in one specific class only one model using this metric accurately predicted high-potency compounds. Additionally, algorithms using this metric generally identified the relevant compounds detected by other models, while also uncovering unique compounds not identified by traditional metrics. Overall, this approach led to the identification of a greater number of both unique and total relevant compounds. Critically, our study underscores the importance of using metrics that consider non-uniform domain preferences and of leveraging methods that account for specific operating ranges, alongside traditional metrics.
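The contrast between average-performance metrics and metrics that respect non-uniform domain preferences can be illustrated with a small sketch. This is not the metric used in the study: the sigmoid `relevance` function, its `midpoint` and `steepness` parameters, and the toy potency values are all assumptions made for illustration only.

```python
import math


def relevance(y, midpoint, steepness=2.0):
    """Hypothetical relevance function: maps a target value to (0, 1).

    Values well above `midpoint` are treated as highly relevant.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (y - midpoint)))


def mean_absolute_error(y_true, y_pred):
    """Plain MAE: every instance counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relevance_weighted_mae(y_true, y_pred, midpoint, steepness=2.0):
    """MAE in which each error is weighted by the relevance of the true value."""
    weights = [relevance(t, midpoint, steepness) for t in y_true]
    weighted_errors = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return weighted_errors / sum(weights)


# Toy data (illustrative only): two compounds sit in the high-potency range.
y_true = [5.0, 5.5, 6.0, 9.0, 9.5]
model_a = [5.0, 5.5, 6.0, 7.5, 8.0]  # exact at low potency, misses high potency
model_b = [6.0, 6.5, 7.0, 9.0, 9.5]  # off at low potency, exact at high potency

print(mean_absolute_error(y_true, model_a), mean_absolute_error(y_true, model_b))
print(relevance_weighted_mae(y_true, model_a, midpoint=8.0),
      relevance_weighted_mae(y_true, model_b, midpoint=8.0))
```

In this toy setup both models have the same plain MAE, yet the relevance-weighted error separates them, because only one of them is accurate in the range that matters most.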
20 — Intersectional Divergence: Measuring Fairness in Regression
Joe Germino
Advisor/Mentor(s): Nitesh Chawla, Nuno Moniz
Affiliation: DIAL Lab
Abstract:
Existing fairness measures are insufficient because they focus on a single protected attribute and fail to account for the intersectionality of individuals. Additionally, while extensive work has been done on fairness in classification tasks, negligible consideration has been given to fairness in regression. We propose Intersectional Divergence (ID) as the first measure of fairness in regression problems that allows for understanding fair model behavior across multiple protected attributes while differentiating the impact of predictions in target ranges most relevant to users.
21 — AnyLoss: Transforming Classification Metrics into Loss Functions
Doheon Han
Advisor/Mentor(s): Nitesh Chawla
Affiliation: DIAL Lab
Co-author(s): Nuno Moniz
Abstract:
We present a new method for generating a neural network loss function that directly targets a selected confusion matrix-based evaluation metric.
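One generic way to turn a confusion matrix-based metric into a smooth loss is sketched below. It is not presented as the authors' formulation: the `amplify` function, its `scale` parameter, and the choice of F1 as the target metric are assumptions made for illustration. The idea is to push predicted probabilities toward 0 or 1 with a steep sigmoid, accumulate "soft" confusion-matrix counts, and compute the metric from those counts.

```python
import math


def amplify(p, scale=10.0):
    """Push a probability toward 0 or 1 while keeping the mapping smooth.

    `scale` is a hypothetical sharpness parameter.
    """
    return 1.0 / (1.0 + math.exp(-scale * (p - 0.5)))


def soft_confusion_matrix(y_true, probs, scale=10.0):
    """Accumulate soft TP, FP, FN, TN from binary labels and probabilities."""
    tp = fp = fn = tn = 0.0
    for y, p in zip(y_true, probs):
        a = amplify(p, scale)
        tp += y * a
        fn += y * (1.0 - a)
        fp += (1 - y) * a
        tn += (1 - y) * (1.0 - a)
    return tp, fp, fn, tn


def soft_f1_loss(y_true, probs, scale=10.0):
    """Return 1 - soft F1, so that a lower loss means a higher F1."""
    tp, fp, fn, _ = soft_confusion_matrix(y_true, probs, scale)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-12)
    return 1.0 - f1


labels = [1, 1, 0, 0]
print(soft_f1_loss(labels, [0.9, 0.8, 0.1, 0.2]))  # mostly correct predictions
print(soft_f1_loss(labels, [0.2, 0.1, 0.9, 0.8]))  # mostly wrong predictions
```

Because every step is a smooth function of the predicted probabilities, the same expressions could in principle be written in an automatic-differentiation framework and minimized by gradient descent.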
22 — ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation
Bruce Huang
Advisor/Mentor(s): Nitesh Chawla
Affiliation: DIAL Lab
Co-author(s): Peiyu Li, Yijun Tian
Abstract:
Significant work has been conducted in the domain of food computing, yet these studies typically focus on single tasks such as t2t (instruction generation from food titles and ingredients), i2t (recipe generation from food images), or t2i (food image generation from recipes). None of these approaches integrate all modalities simultaneously. To address this gap, we introduce a novel food computing foundation model that achieves true multimodality, encompassing tasks such as t2t, t2i, i2t, it2t, and t2ti. By leveraging large language models (LLMs) and pre-trained image encoder and decoder models, our model can perform a diverse array of food computing-related tasks, including food understanding, food recognition, recipe generation, and food image generation. Compared to previous models, our foundation model demonstrates a significantly broader range of capabilities and exhibits superior performance, particularly in food image generation and recipe generation tasks.
Project Description
Given the fundamental role of food in human life, the field of food computing has recently attracted considerable academic interest. This growing area of research has led to numerous studies, each typically focusing on a specific task. For instance, some works focus on generating instructions from food titles and ingredients, as well as generating ingredients from recipe titles and cooking instructions, which fall under text-to-text (t2t) tasks. Other studies concentrate on generating recipes based on food images, which belong to image-to-text (i2t) tasks. Additionally, some research contributes to generating food images from recipes, categorized as text-to-image (t2i) tasks. Despite these advancements, no approach has yet combined all these modalities into an integrated system, highlighting a significant gap. Moreover, recent developments in Transformer-based large language models (LLMs) and diffusion models have shown exceptional performance in various vision and language tasks. However, current methods in food computing have not kept pace with these state-of-the-art (SotA) techniques in natural language processing (NLP) and computer vision (CV). To address this gap, we present ChefFusion, a novel food computing foundation model that achieves true multimodality, encompassing tasks such as t2t, t2i, i2t, it2t, and t2ti. ChefFusion integrates these SotA models by employing a pretrained Transformer-based LLM for processing and generating recipes, a visual encoder for extracting image features, and an image generation model for generating food images. This integration enables ChefFusion to perform a diverse array of food computing-related tasks, including food understanding, food recognition, recipe generation, and food image generation.
23 — Fast Explainability via Feasible Mask Generator
Deng Pan
Advisor/Mentor(s): Nuno Moniz, Nitesh Chawla
Affiliation: DIAL Lab, Lucy Postdoc
Abstract:
Balancing general applicability and fast inference speed is a long-standing dilemma that prevents the broader application of explanation methods. In this study, we aim to bridge the gap between the universality of model-agnostic explanations and the efficiency of model-specific explanations.
24 — Social and economic predictors of under-five stunting in Mexico: a comprehensive approach through the XGB model
Angélica García-Martínez
Advisor/Mentor(s): Nitesh V. Chawla
Affiliation: DIAL Lab, Lucy Postdoc
Co-author(s): Brian Fogarty, Edson Serván-Mori
Abstract:
Childhood stunting in low- and middle-income countries, including Mexico, is driven by a complex interplay of genetic, environmental, and socioeconomic factors that affect children’s well-being and future potential. A Machine Learning approach applied to data from 2006 to 2018 identified socioeconomic status, state of residence, child’s age, Indigenous status, and local deprivation as key predictors of stunting in Mexico. These findings highlight the need for targeted and sustainable interventions, especially in the face of reduced health programs and rising poverty.
Project Description
The project aims to analyze the social and economic determinants of childhood stunting in Mexico from 2006 to 2018, using Machine Learning (ML) techniques to identify the most significant predictors of stunting. By leveraging data from the Mexican National Health and Nutrition Surveys (ENSANUTs), six ML classification algorithms were tested to model stunting risk, with Extreme Gradient Boosting (XGB) proving to be the most effective. The project identified key predictors, including household socioeconomic status, state of residence, child’s age, Indigenous status, and local deprivation. The results provide critical insights for designing targeted, data-driven interventions to combat childhood stunting in Mexico, especially in light of reduced health programs and increasing poverty.
25 — Early warning signals of emerging infectious diseases
Qinghua Zhao
Advisor/Mentor(s): Jason Rohr
Affiliation: Lucy Postdoc
Abstract:
Predicting infectious disease outbreaks weeks before they occur, so that preventative and mitigating actions can be implemented, could save millions of lives, avert massive disease and discomfort worldwide, and improve quality of life globally. Here, we utilize early warning signals (model-free statistics that can indicate upcoming disease outbreaks) and analyze 31 types of human diseases worldwide. We found that we can detect disease pandemics as early as two weeks in advance, although the patterns depend on pathogen traits.
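The abstract does not say which early warning statistics were used. As a minimal sketch, the example below computes two indicators that are commonly used as model-free early warning signals, rolling variance and lag-1 autocorrelation, over a sliding window of case counts. The function names, the window size, and the case counts are hypothetical and are not taken from the study.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation of a window; returns 0.0 for a constant window."""
    m = mean(window)
    denom = sum((x - m) ** 2 for x in window)
    if denom == 0:
        return 0.0
    num = sum((window[i] - m) * (window[i + 1] - m) for i in range(len(window) - 1))
    return num / denom


def rolling_indicators(series, window_size):
    """Return (variance, lag-1 autocorrelation) for each sliding window."""
    indicators = []
    for end in range(window_size, len(series) + 1):
        window = series[end - window_size:end]
        indicators.append((pvariance(window), lag1_autocorrelation(window)))
    return indicators


# Made-up weekly case counts: a stable phase followed by accelerating growth.
cases = [10, 11, 10, 11, 10, 11, 10, 11, 12, 14, 17, 21, 26, 32]

for variance, autocorr in rolling_indicators(cases, window_size=6):
    print(round(variance, 2), round(autocorr, 2))
```

A sustained rise in both indicators ahead of an outbreak is the kind of pattern such signals are meant to capture. How far in advance it appears would depend on the disease and the data.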
Project Description