Lucy Family Institute for Data and Society Fall Symposium

Tuesday, Oct 11 12-3 at McKenna Hall | Awards 5-7 at Foley’s O’Neill Hall 

Student posters opened the symposium during the lunch hour

Symposium participants voted online for posters in two categories:

TREDS Capstone Projects


Congratulations to:

Unlocking Social Determinants of Health & Wellbeing for Equitable Health & Healthcare Access in LMIC Wagner M, Calderon A, Robles J
A Sustainability and Food Security Analysis of International Agricultural Efforts McKenna J, Sabumukama E, Cha T, Nordan E

Research Collaborations 


Congratulations to:

A Novel Method for Efficient Uncertainty Reduction in Estimates of Air Pollution Mortality Alifa M, Castruccio S, Bolster D, Bravo M, Crippa P

Active learning evaluation, criteria analyses, and application recommendation on metal-organic frameworks Osaro E, Colón Y


Submitted Entries:

Title Author(s) Abstract
A Harm Reduction Approach: Application to Read Fentanyl Test Strips Chen L, Killian T, MacLachlan S People Who Use Drugs (PWUD) and harm reduction groups in America are in urgent need of a portable, inexpensive, accurate, and confidential way to proactively prevent fentanyl overdose. Fentanyl is a highly lethal, potent, and inexpensive chemical that has often been mixed in street drugs to increase the “high” while lowering the cost. FTS Reader App is a mobile app designed to help people who use drugs (PWUD) better detect the presence of fentanyl in their substance.
A Novel Method for Efficient Uncertainty Reduction in Estimates of Air Pollution Mortality Alifa M, Castruccio S, Bolster D, Bravo M, Crippa P Implementing effective policy to protect human health from the adverse effects of air pollution, such as premature mortality, requires reducing the uncertainty in health outcomes models. We present a novel method to reduce mortality uncertainty by increasing the amount of input data of air pollution and health outcomes, and then quantifying tradeoffs associated with the different data gained. We apply this method to a real case scenario for short-term mortality from fine particulate matter (PM2.5) in Phoenix, Arizona, employing a commonly used epidemiology model that combines information on air pollution data with aggregated daily mortality data. We fit our epidemiological model several times with varying amounts of health and/or pollution data and employ information yield curves to identify which variables more effectively reduce mortality uncertainty when increasing information. Both pollution and epidemiology data tend to be scarcer for communities in rural areas and/or of low socio-economic status, giving this novel method interesting environmental justice implications that will be further explored in future work. Applying this framework to any real case-scenario where knowledge in pollution, demographics, or health outcomes can be augmented through data acquisition or model improvements can generate more robust risk assessments as well as a better awareness of socio-demographic inequalities in the distribution of information.
A Sustainability and Food Security Analysis of International Agricultural Efforts McKenna J, Sabumukama E, Cha T, Nordan E Using web-scraping and text analysis, this project will assess whether recent international efforts to address agricultural issues across the globe address issues of climate change and food insecurity. We are interested in understanding the goals and frameworks of these projects and in particular how they engage with these issues. Our data comes from projects approved since 2015 by the Big Seven agencies working in agriculture: the World Bank, African Development Bank, International Fund for Agricultural Development, the Global Agricultural Food Security Program, World Food Program, Food and Agriculture Organization and the Consultative Group for International Agricultural Research. The goal of this project is to critically assess progress in terms of sustainability and inclusion with regards to agriculture.
Active learning evaluation, criteria analyses, and application recommendation on metal-organic frameworks Osaro E, Colón Y Proper selection of metal-organic frameworks (MOFs) depends on the intended application and the performance of the MOFs of interest, so techniques such as high-throughput molecular simulations and machine learning have been used to screen the large number of MOFs. While molecular simulations such as the Grand Canonical Monte Carlo (GCMC) method have proven effective in calculating adsorption capacity at various pressures and temperatures, they demand expensive computational resources. Machine learning (ML) models, on the other hand, require large datasets. There is thus a need for algorithms that can adequately explore the feature space and cut down the total number of simulations required for adsorption prediction from the ML models. In this work, we study the active learning (also known as sequential design) framework on four molecules of research and industry relevance across 12 MOFs of diverse surface areas. We use Gaussian Process Regression (GPR) to model nitrogen at 77 K from 10⁻⁵ to 1 bar, methane at 298 K from 10⁻⁵ to 100 bar, carbon dioxide at 298 K from 10⁻⁵ to 100 bar, and hydrogen at 77 K from 10⁻⁵ to 100 bar on Cu-BTC, PCN-61, MgMOF-74, DUT-32, DUT-49, MOF-177, NU-800, UIO-66, ZIF-8, IRMOF-1, IRMOF-10 and IRMOF-16. The GPR model requires initial training on a data set commonly referred to as the prior. In this study evaluating active learning (AL), we use three different prior selection schemes, and each prior is updated with a sampling point selected from the GP model uncertainties. This protocol continues until a maximum GPR relative error of 2% is attained, and we recommend the best prior selection scheme for the forty-eight adsorbate-adsorbent pairs, primarily using the R-squared metric and secondarily the total number of points required for convergence of the model to the set policy. To further evaluate the AL framework, we apply the BET consistency criteria to the simulated and GP nitrogen isotherms and compare the resulting surface areas.
AI Support for UX Design and Personal Interfaces Lu Y, Tong Z, Li T Computational Machine Learning models for user interfaces (UIs) are becoming increasingly successful. State-of-the-art models in adjacent areas have also achieved large-scale success (e.g., DALL-E 2, Stable Diffusion, Midjourney). At the same time, dark patterns are undermining users’ benefits by exploiting UI designers’ dictating power over interfaces. In this poster, we investigate the potential of using machine learning for supporting UX designers’ work in a human-centric manner and empowering end users against design dark patterns’ potential harm.
An Empirical Study of Model Errors and User Error Discovery and Repair Strategies in Natural Language Database Queries Zhang Z Recent progress in machine learning (ML) and natural language processing (NLP) enabled the translation of natural language queries into structured database query languages like SQL (NL2SQL). However, despite the significant improvement in model performance (∼75% accuracy in popular datasets) from the ML/NLP communities in past years, the HCI challenges in user error discovery and repair limit the wide adoption of natural language data queries. In this paper, we (1) performed a comprehensive analysis of errors made by three popular state-of-the-art NL2SQL models while writing the paper, introducing a taxonomy of NL2SQL model errors and the corresponding descriptive statistics; (2) conducted a controlled user study with 26 participants with varied expertise levels to investigate the effectiveness and efficiency of three representative interactive NL2SQL error handling techniques on different types of queries and errors. Findings from this paper shed light on the design of future error discovery and repair strategies for natural language data queries.
Archdiocese of Chicago Enrollment Project Fava-Pastilha M, Delaney M, Volling N The Archdiocese of Chicago has seen a staggering decrease in enrollment in Catholic schools in the past generation. The once thriving 250,000 student population is now at a mere 40,000, and only seems to be continuing to decrease. The goal of our project is to utilize the data and context provided to us to give the archdiocese insight into their decreasing enrollment, along with potential solutions or suggestions. We are working with Sean Murdock, a project manager at the archdiocese. This project is still in its beginning stages, and right now we are looking at the context of the schools and parishes that have closed or that are struggling.
Barriers to Healthcare Access for Children with Cancer: A Mixture-Approach-Based Risk Predictive Model García-Martínez A, Serán-Morín E, Orozco E, Chawla N Health systems are fragmented in Low- and Middle-Income Countries, generating substantial inequalities in access to quality health services. The absence of Universal Health Coverage increases the vulnerabilities of children with cancer. Using the Mexican case of the National Institute of Pediatric “Federico Gómez”, a qualitative study was conducted in 2021 to investigate the social, economic, and cultural determinants in health access of children with cancer. This information will be integrated into a Predictive Model to assess the risk of developing fever and neutropenia, chemotherapy complications associated with septic shock, and death.
Co-Creating Spaces with Local Non-Profits to Expand Civic Data Analysis and Visualization Capacities Connors G, Salamone R, Sweis M As we have come to live in a datafied world, nonprofits now face unique issues when it comes to data literacy, including a lack of financial resources, staffing limitations, cultural barriers, and capacity limitations. Therefore, for our iTREDS capstone project, we aim to develop a curriculum to improve data literacy for local South Bend nonprofit organizations. This curriculum will be based on the Carpentries pedagogical model, which provides an ethical, two-way style of teaching and learning and which stresses adaptability and flexibility. We will tailor the curriculum for each nonprofit to be responsive to the current skill levels of nonprofit employees and the specific context in which their organization operates. The curriculum will then ultimately be delivered through workshops held within the South Bend community during the spring semester. We will be distributing surveys aligned with the Carpentries teaching framework in order to assess the efficacy of the workshops. While the overall outcome of this capstone project will be an evaluation of the curriculum’s success in South Bend, we plan to create materials that are freely available and easily adaptable to other nonprofits and communities.
Co-Designing with Foster Youth Online Safety Interventions Oguine O C, Wan R, Badillo-Urquiola K Teens in the U.S. foster care system are some of the most vulnerable youth to offline risks, like sex trafficking. Research shows that teens who are vulnerable offline are often most susceptible to online risks. Recent interviews with caseworkers and foster parents revealed that many of the high-risk offline experiences foster youth encounter are mediated by technology. Unfortunately, little research has been conducted to understand foster youth’s technology use, or how effective interventions should be designed to empower foster youth against becoming victims of online risks. Therefore, we propose a participatory design study with foster youth that employs an Adolescent Resilience Framework to understand the types of safety interventions they believe would be most effective in helping protect them from online risks. We expect to receive feedback on this proposed study through our interactive poster. Ultimately, partnering with foster youth on the design of safety interventions will empower them to take more control in managing their online experiences by designing and evaluating real-world solutions that directly benefit them.
Combined Machine Learning and Chemometrics of NIR spectra can quantify acetaminophen, enabling detection of substandard pharmaceuticals Awotunde O, Roseboom N, Cai J, Hayes K, Rajane R, Chen R, Yusuf A, Lieberman R Advanced sensing technologies and chemometrics are central to improving identification of substandard and falsified pharmaceuticals in field settings. Vibrational spectroscopic techniques such as near-infrared (NIR) spectroscopy assess the vibrational energies of molecules in pharmaceuticals promptly, precisely, and non-destructively. Mathematical and statistical exploration of the spectra from these technologies provides characteristic information to distinguish fake pharmaceuticals from genuine ones. However, it is difficult to build comprehensive product libraries in field settings due to the large numbers of manufacturers who supply these markets, frequent unreported changes in materials sourcing and product formulation by the manufacturers, and general lack of cooperation in providing authentic samples. In this work, we demonstrate that a simple library of lab-formulated binary mixtures (an active pharmaceutical ingredient (API) and two diluents) gave good analytical predictions on branded acetaminophen drugs by discriminating substandard and falsified formulations of the API. Six chemometric and machine learning models that individually showed poor robustness for formulations outside the training set were combined for an optimized performance that integrates their respective strengths. Our end goal is to integrate NIR with the chemical functional group analysis performed by our already widely accepted paper analytical device; together, these technologies will be a more powerful tool for field screening of pharmaceutical and illicit drugs.
Differentially Private Outcome Weighted Learning Giddens S The intelligent use of data for statistical analysis and machine learning has proven potential to benefit society. However, many valuable datasets are sensitive, containing private information that can cause irreparable harm if inappropriately revealed. When this type of data is involved, the goal of statistical analysis and machine learning methods should be to learn useful information about collective groups without compromising private information about individual people. Ad hoc anonymization methods, such as removing names and other directly identifying information from datasets before using them, have been shown to be ineffective at preserving privacy in the presence of modern computational tools and relevant publicly available datasets. In this context, differential privacy (DP) has emerged as the state-of-the-art framework for mathematically precise measurement and bounding of privacy loss when performing analyses or training machine learning models. Privacy-preserving versions of some popular machine learning algorithms have been proven and analyzed, but there are still many that have yet to be explored. Individualized treatment rules (ITR) are a class of algorithms that falls into the unexplored category. These algorithms perform causal inference, attempting to learn and predict cause-effect relationships. Outcome weighted learning (OWL) is an ITR algorithm commonly used to assign optimal treatments based on clinical trial data. This poster presents my research on the application of DP to OWL, including some recent simulation results from an algorithm approximating OWL for which I proved DP guarantees.
Efficiency Research in Sociology and Social Sciences: A Bibliometric Study Khvatskii G, Zaytsev D, Kuskova V This work presents a semiautomatic approach to reviewing literature related to a particular field of study. As an example application, this study attempts to map out the current state of effectiveness research in social sciences and sociology. Another essential part of this work is tracing the field’s historical development. We based our study on the methodology of bibliometric analysis first proposed by Maltseva and Batagelj. We decided to use the Web of Science bibliographic database as the data source for this work. In total, we have analyzed more than 200,000 bibliographic records of articles in the fields of Political Science, Public Administration, Sociology, Psychology, Economics, and Management. We present an overview of the development of efficiency research in sociology over time, as well as its current state.
Empower Gig Workers Against AI Inequality With AI Agents Lu Y, Chen M, Cox V, Bsales M, Clark J, Jiang M, Yang Y, Kay T, Wood D, Brockman J, Li T The growing inequality in gig work between workers and platforms has become a critical social issue as gig work plays an increasingly prominent role in the future of work. This AI inequality, caused by (1) the technology divide in who has access to AI technologies in gig work and (2) the data divide in who owns the data in gig work, leads to unfair working conditions, a growing pay gap, neglect of workers’ diverse preferences, and workers’ lack of trust in the platforms. In this poster, we argue that a bottom-up approach that empowers individual workers to access AI-enabled work planning support and share data among a group of workers through a network of end-user-programmable intelligent assistants is a practical way to bridge AI inequality in gig work under the current paradigm of privately owned platforms. We introduce our efforts in data collection and analysis, and human-centered design to help individual workers against AI inequality with personalized AI agents.
Estimating Structural Equation Models with Neural Networks Tong L, Zhang J, Jiang M, Li J Structural Equation Modeling (SEM) is a set of statistical techniques for analyzing the relationships between latent and observed variables. In recent years, a growing interest has been shown in nonlinear structural equation models (Brandt et al., 2014; Jin et al., 2021). However, existing approaches were designed only to estimate models with quadratic terms, while nonlinear effects might be more complex (e.g., exponential or logarithmic effects), and generally unknown. To bridge the gaps, we propose to leverage neural networks to estimate structural equation models containing arbitrary nonlinear effects. Specifically, we focus on detecting the potential relationship between the independent and dependent variables when there exists a nonlinear relationship between the latent dependent variables and their indicators. We show how the method works through a MIMIC model. The results confirmed our hypothesis that a basic neural network was compatible with a structural equation model and could be used to estimate the latter even when nonlinear relationships exist.
Exploring the Efficacy of Small School Teacher Leadership Using Social Network Analysis Olshefke A, Trinter C, Young J, Benas R, Jegier J, Selover N School leadership models that incorporate mathematics teacher leaders (specialists/coaches) have the potential to promote collaboration which strengthens professional learning and contributes to initiating and sustaining instructional improvement. However, beneficial outcomes rely on ensuring the contextual relevancy of these leadership models, and the majority of research on teacher leadership focuses on larger school systems with full-time teacher leaders trained through Master’s degree or licensure programs. This project presents an alternative teacher leadership development and model that reimagines the traditional teacher leadership structure to accommodate the disparate needs of small school systems. Using social network data collected in 2022 alongside established findings from the math teacher leadership literature, collaboration will be used as the primary metric of comparison to begin to address a gap in the literature concerning the efficacy of professional development driven teacher leader training as well as the effectiveness of non-full-time teacher leaders.
Exploring the Food Information Network to Assist in Making Healthy and Affordable Purchase Decisions Germino J, Szymanski A While many Americans strive to eat healthy, maintaining an affordable and nutritious diet can be challenging, especially for those living under the conditions of poverty. To fulfill a healthy diet, consumers must make difficult decisions within a complicated food network that incorporates their health goals and budget constraints, their local grocery store’s supply of products and pricing options, and the nutritional guidelines provided by government services. In addition, the information within the food network is often inconsistent and challenging to find, which adds to the difficulty for consumers to make informed, optimal decisions. Challenges are exacerbated for low-income and Supplemental Nutrition Assistance Program (SNAP) households that have additional time and cost constraints impacting their food purchasing decisions and often leaving them more susceptible to malnutrition and obesity. There is a need for applications that will use the food information network to assist users in making informed decisions that support their dietary needs while minimizing cost. In our research, we discover gaps when using this information to make user-centered recommendation systems. We examine these gaps through a case study using an optimization model that explores how healthy diets can be obtained within a limited SNAP budget. We also discuss needs for future design of user-centered optimization models that will allow the food information network to be used in a way that will give consumers agency in finding an affordable and healthy diet.
Families and Schools: An Inter-Institutional Approach to Understanding how Parental Racial Socialization Operates in Schools Kraemer M This study explores how parental racial socialization (PRSOC) operates in schools as racialized organizations. More specifically, by merging child-level data from the Maryland Adolescent Development In Context Study (MADICS) with school-level data from the National Center for Education Statistics (NCES) and the Civil Rights Data Collection (CRDC), I investigate the extent to which engagement in PRSOC affects academic outcomes and educational experiences for Black and White students within public middle schools. First, this study finds a significant effect of PRSOC on academic outcomes and educational experiences, suggesting that engagement in PRSOC has a positive impact on GPA and affect toward school even after controlling for other family- and school-level characteristics. Second, this study evaluates the impact of the racialized school context and finds a significant negative impact on academic outcomes and educational experiences. Further, the specific messages that students receive from their parents about race vary widely in both what they said/did (content) and why (purpose) between White and Black students. These differences may have repercussions on students’ schooling, as Black PRSOC largely serves to protect students from racism, while White PRSOC ultimately maintains and exacerbates racial inequality in schools. The findings above shed light on important cultural mechanisms that help produce, maintain, or challenge racial inequality within schools. Further, school organizational practices matter for student educational experiences and academic outcomes. As such, scholarship needs to be more critical of how race operates in schools, as well as the way that schools support, undermine, or challenge students’ racialized experiences.
Food Nutrition Communication in the Online Digital Environment Ciamei L, Flores Z, Hunt R, Nie S The Food Nutrition Communication in the Online Digital Environment Project aims to answer the question: What can we represent and communicate around a particular food item in the digital environment? The digital medium has significant potential for better representations of nutrition information than the physical world. Now that companies are no longer limited to 2D product packaging to convey this information, they have greater responsibility and new opportunities to present more detailed, interactive, complete, and comprehensible nutrition information to their customers. Our project will create an effective, efficient, and satisfactory food representation in the digital environment that promotes equitable access to accurate information about what consumers purchase and how their decisions impact a balanced and healthy diet. Shoppers should have access to nutritional data that is presented in a helpful, understandable way that supports their pursuit of a healthy, affordable diet. In order to accomplish these goals, the group is following the representation design methodology of Tamara Munzner in her book Visualization Analysis and Design, as well as traditional data science methods for summarizing and standardizing the nutrition data and calculating comparison metrics between items. We will design a representation for the product regarding what we know about it, with a focus on its ingredients and nutritional content. Then, we will develop a representation for comparisons within the food environment, such as nutrition density, the concentration of good or harmful ingredients and prices, and budget-friendly swaps based on sales or allergies. The existence of examples from stores like Walmart and Kroger and baseline representations like the nutrition label will aid in the project’s evolution.
The group will develop a digital visual representation of food data that will act as a tool for helping people understand the nutritional content and other attributes of the product. This representation will either represent one food or a comparison of two foods. The group will also deliver the results of a usability study to measure our interface’s efficiency, effectiveness, and satisfaction to judge how well people can comprehend and use nutrition information using our developed tool.
Geometric deep learning for 3-dimensional patient-specific hemodynamic prediction Du P, Wang JX Image-based computational fluid dynamics (CFD) has been an indispensable tool in clinical cardiovascular diagnosis. However, comprehensive CFD simulation takes a significant amount of time to generate desired hemodynamic quantities, especially under realistic assumptions (e.g., fluid-structure interaction (FSI)). Alternatively, deep neural network (DNN)-based surrogate models have been actively researched, as their rapid speed greatly facilitates time-critical applications. Currently, the majority of previous work focuses on learning the mapping between input geometry and output fluid fields, where the geometry information is represented by mesh coordinates. This approach is subject to large approximation errors due to non-intuitive data representation and is not rotation-invariant. A more novel approach is to treat the geometry as a graph and apply a graph neural network (GNN), where message passing is directly learned in the training process. We show that our GNN model has great prediction accuracy on a synthetic aorta geometry dataset and is robust to different mesh topologies.
Graph Rationalization with Environment-based Augmentations Liu G Rationale is defined as a subset of input features that best explains or supports the prediction by machine learning models. Rationale identification has improved the generalizability and interpretability of neural networks on vision and language data. In graph applications such as molecule and polymer property prediction, identifying representative subgraph structures named as graph rationales plays an essential role in the performance of graph neural networks. Existing graph pooling and/or distribution intervention methods suffer from lack of examples to learn to identify optimal graph rationales. In this work, we introduce a new augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. We propose an efficient framework that performs rationale-environment separation and representation learning on the real and augmented examples in latent spaces to avoid the high complexity of explicit graph decoding and encoding. Comparing against recent techniques, experiments on seven molecular and four polymer real datasets demonstrate the effectiveness and efficiency of the proposed augmentation-based graph rationalization framework.
Hack(in)g the Past Wang L With cybersecurity increasingly becoming a social, economic, political, and military concern, there is an emerging need to study hacking cultures and their evolution from a subculture into an entrepreneurial, commercial, political, and military actor. This poster presents a short history of Chinese grassroots patriotic hacking cultures from the 1980s to the 2000s. It also preliminarily investigates how digital data collection and analysis could be used for history, especially for the history of the internet, which is by nature more ephemeral and more prone to the danger of cherry-picking evidence. It will suggest that data enables us to explore hacking in the past better, as with its help, we are essentially hacking the past.
Interpreting Machine Learning Democracy Measures Peitong R Democratic backsliding, the diminishing of democratic characteristics in political regimes around the world, has become an important issue for social scientists and policy makers. Comprehensive democracy indices like the Varieties of Democracy (V-Dem) have been compiled based on expert conceptual frameworks and careful data collection efforts to measure “democracy” and to explain how dimensions of democracy interact. However, when mapping multidimensional indicators onto a single level of democracy, these aggregated indices cannot convince users of the reason behind additive or multiplicative aggregation procedures. Machine Learning (ML) measures of democracy avoid arbitrary aggregation methods while achieving desirable predictive power. Nevertheless, they cannot link distinct socio-political factors to certain democracy levels and, therefore, contribute little to social scientists’ theory-testing or politicians’ decision-making. Under the framework of explainable AI, this project selects features among the V-Dem indicators to explain Grundler and Krieger (2021)’s Support Vector Machines democracy index in an effort to bridge the gap between the non-interpretable ML process and V-Dem’s contested aggregation procedures. The project compares Subset Selection, Shrinkage, and Dimension Reduction by Sure Independence Screening for model selection and evaluates the selected models on multicollinearity and interpretability. The project shows how well the model selection methods perform in using the V-Dem variables to explain the black box of the ML index. The result can help social scientists select V-Dem variables for testing theories of democratic regime transitions.
Leveraging Remote Sensing for the Detection of Schistosomiasis Intermediate Host Habitat Forstchen M Schistosomiasis is a neglected tropical disease with over 200 million people currently infected. Caused by Schistosoma, a genus of trematodes, the life cycle of schistosomiasis incorporates an intermediate freshwater snail host before a definitive human host is infected. These snail hosts have been shown to have a strong mutualistic relationship with Ceratophyllum, a submerged aquatic vegetation, suggesting this habitat could be used as a proxy for snail location. Thus, being able to detect the vegetation via remote sensing platforms provides a potential avenue to understand disease transmission hotspots. In 2022, drone-acquired multispectral imagery was collected at 17 villages and 40 water access points in the Senegal River Basin in Senegal. Initial analyses following an object-based image analysis framework, which utilizes multiple segmentation, suggest that Ceratophyllum may be detectable and distinguishable from other genera of aquatic vegetation using remote sensing platforms.
Migration Experiences, and Women’s Perceived Impact of Migration on Community Life in Lima, Perú: Associations with Adverse Childhood Experiences, Intimate Partner Violence, and Mental Health Gargano M C, DiBiase C E, Miller-Graff L E To support wellbeing and resilience in contexts of migration, a deeper understanding of the experiences of migrants and receiving communities is required. The present study uses qualitative and quantitative data drawn from a sample of pregnant women (N=251) in Lima, Perú, 87 of whom reported being internal migrants and 163 of whom reported being from Lima. The goals of the study are to better understand 1) women’s internal migration experiences, 2) their views of the impact of migration on the local community, and 3) associations between migration attitudes, violence, and mental health. Diverse methodological approaches, including thematic analysis, quantitative methods, and natural language processing techniques, are applied in parallel. The thematic analysis has yielded 6 themes for migration motivations, 8 themes related to adjustment, and 10 codes for migration challenges. The completed migration and community codebook (κ = 0.907) consists of subcodes for different perceptions of the impact of migration on women’s communities, including 5 for positive, 4 for neutral, 7 for negative, and 2 for mixed perceptions. The study also explores how sentiment towards migration is associated with mental health and experiences of intimate partner violence within and across migrants and nonmigrants, as reported on quantitative survey assessments.
PaTAT: Human-AI Collaborative Qualitative Coding with Explainable Interactive Rule Synthesis Gebreegziabher S, Zhang Z, Tang X, Meng Y, Glassman E, Li T The use of AI assistance in data annotation has made significant progress. However, qualitative coding in thematic analysis, as a specific type of annotation task, has unique characteristics that make effective human-AI collaboration difficult. Informed by a formative study, we designed PaTAT, a new AI-enabled tool that uses an interactive program synthesis approach to learn flexible and expressive patterns of user-annotated codes in real-time as users annotate the data. To accommodate the ambiguous, uncertain, and iterative nature of thematic analysis, the use of user-interpretable patterns allows users to understand and validate what the system has learned, make direct fixes, and easily revise, split, or merge previously annotated codes. This new approach also helps human users to learn data characteristics and form new theories in addition to facilitating the “learning” of the AI model. PaTAT’s usefulness and effectiveness were evaluated in a lab user study.
PDYi Spesia T, Carnes K, Gerard P, Burgis W The PDYi is made up of data that does not meet the standards of Linked Open Data and the FAIR principles, which makes the data difficult to work with and research difficult to replicate. The data is also not organized in a standardized way, meaning that it is not always machine-readable. We will help scope and sketch the redevelopment plans for this resource. Redevelopment will allow for more efficient research and make it easier for researchers to claim credit for their work. Overall, this project will lead to the improvement of the system for all users, from young students to advanced academics.
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data Zhang Z Audio-visual learning seeks to enhance the computer’s multi-modal perception by leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks such as video retrieval, AR/VR, and accessibility, the performance and the adoption of existing audio-visual models have been impeded by the limited availability of high-quality datasets. Annotating audio-visual datasets is laborious, expensive, and time-consuming. To address this challenge, we design and develop an efficient audio-visual annotation tool called PEANUT. PEANUT’s novel human-AI collaborative pipeline separates the multi-modal task into two single-modal tasks, and utilizes state-of-the-art object detection and sound-tagging models to reduce the annotators’ effort to process each frame and the number of manually-annotated frames needed. A within-subject user study with 20 participants found that PEANUT can significantly accelerate the audio-visual data annotation process while retaining high accuracy.
S&P Playground: Optimizing Users’ Security and Privacy Settings in a Risk-free Environment Based on Personal S&P Knowledge Graphs Chen C Despite the sustained effort from cybersecurity practitioners, user adoption of security and privacy (S&P) practices is still lower than expected due to the misalignment between interventions and user needs. Prior work focused on designing various S&P messages, explaining the severity of the S&P threat, and improving users’ S&P awareness. However, how to model users’ S&P decision-making based on personal knowledge remains to be explored. In this paper, we propose S&P Playground, where users can tweak the S&P settings and see their consequences without putting themselves at real risk. Knowledge graphs (KGs) are prevalent in incorporating human knowledge into AI models. When users interact with the playground, the system will ground users’ interaction events in existing S&P ontologies through KGs, which assists in finding patterns and reusable solutions for recurring S&P issues. Therefore, building KGs through S&P Playground will ultimately help AI models generate more explainable and context-aware S&P settings.
Shaping Peaceful Futures: A Case Study in Youth Peace Leadership in South and Southeast Asia Hedden W Peace leadership is an emerging subject of practical and theoretical inquiry situated at the intersection of peace studies and leadership studies. This action research presents a case study of a peace leadership community of practice in South and Southeast Asia. The research offers rich descriptions of peace leadership practices from populations underrepresented in peacebuilding and leadership literature, specifically youth and women from developing Asian countries. The findings of this research suggest peace leadership can function as an important space to accelerate and promote alternative leadership approaches from underrepresented communities and demographics.
Towards Effective Multi-Modal Human-AI Collaboration on Audio-Visual Data Ning Z, Zhang Z, Ban J, Tian Y, Li T Audio and visual information provide complementary information for humans and AI agents when interacting with multi-modal data. Specifically, AUDIO, especially spatial audio, is important for localizing sound sources and increasing people’s immersive feeling; VISUAL helps identify objects, understand their spatial relationships, track their movements, etc.

In our series of projects, we explore building multi-sensory human-AI collaborative systems to meet actual user needs in different areas. Here we present two projects: 1) MIMOSA: A human-AI co-creation system for generating spatial audio effects on videos; 2) BISCUIT: A multi-modal video scene exploration system for users with visual impairments.
Understanding Developer-AI Collaboration: A Behavioral and Cognitive Modeling Approach Tang N, Chen M, Ning Z, McMillan C, Li T GitHub Copilot is an AI pair programmer that generates code snippets from code contexts. Our project is aimed at understanding programmers’ behavior when they program in collaboration with GitHub Copilot, including IDE behavior (direct) and eye tracking behavior (indirect).
Unlocking Social Determinants of Health & Wellbeing for Equitable Health & Healthcare Access in LMIC Wagner M, Calderon A, Robles J

In low- and middle-income countries, proper medical information is scarce, diagnosis is imprecise, and treatment is inaccessible. Previously, a team of Notre Dame faculty and students partnered with the Hospital Infantil de México Federico Gómez (HIMFG), a clinical oncology hospital in Mexico City, to address this issue. So far, the team has developed an app for the hospital’s data processes to aid with organization and research. For our project, we wish to continue this work and focus on addressing one of the key factors involved in Febrile Neutropenia in low- and middle-income countries: the mental health of the patients and their primary caregivers.

Patients and their families face numerous challenges throughout the child’s journey with cancer, including emotional, cognitive, behavioral, and physical health issues (St. Jude). Additionally, there are multiple psychosocial and sociodemographic factors associated with anxiety in family caregivers of children with chronic diseases. Such factors include caregiver burden, quality of life, family functioning, health problems in daily life, parental stress, and depression (Toledano, 2018). The struggle to manage these factors worsens health outcomes for the family as well as the patient. In our project, we want to explore, identify, and address the role of mental health within the journey of a low-income child’s cancer treatment and connect the child’s family to resources that will lessen the impact.

User or Labor: An Interaction Framework for Human-Machine Relationships in NLP Wan R, Etori N, Badillo-Urquiola K, Kang D The bridge between Human-Computer Interaction and Natural Language Processing has developed quickly in the past couple of years. Yet, there is still a lack of formative guidelines to understand the Human-Machine interaction in the NLP loop. When researchers cross between the two fields, they often talk about humans as being either a user or labor. Humans as users typically means the human is in control and the machine is used as a tool to achieve the human’s goals. When humans are described as labor, typically the machine is in control and the human is used as a resource to achieve the machine’s goals. Through a systematic literature review and thematic analysis, we present an interaction framework for understanding human-machine relationships in NLP. Our framework conceptualizes four types of human-machine interactions: 1) Human-Teacher and Machine-Learner, 2) Machine-Leading, 3) Human-Leading, and 4) Human-Machine Collaborators. Through this poster presentation, we will demonstrate how these interactions change across tasks as the relationship between the human and the machine develops. We also discuss the implications of this framework and how it can be leveraged for the future of NLP and human-machine relationships.
Using Data from Atomistic Dances to Solve Grand Technological Challenges for Societal Benefit Agbodekhe B, Zhang Y, Maginn E All matter from the nanoscale and above is made up of atoms. Atoms are not the smallest units of matter; however, for a plethora of practical purposes, we can assume that atoms are the smallest units of matter that we need to deal with. A basic idea in scientific practice is that the behavior of a whole can be probed from the behavior of its component parts. The kinetic theory of matter confirms that atoms in all matter are always moving, more like dancing. If we could sufficiently model this dance of atoms by generating ensembles of the positions and velocities of all atoms in matter over time, we could potentially elucidate all the properties and behavior of matter from as small as the tenth of a nanometer to as large as the galactic scale and beyond. Molecular Dynamics (MD) simulation is a tool with which we model and simulate these atomistic dances. However, the output of an MD simulation is a large dataset of atomic positions, velocities, and other supplementary data on the atoms for each unit of time. These data, which span several gigabytes for an average MD simulation, are not of much use in their raw form. They must be subjected to post-processing using sophisticated tools of Statistical Mechanics, which draws on some of the core ideas of statistics and probability for data analysis and transformation of microscopic data to macroscopic information. One of the possibly thousands of current technological challenges being addressed using these tools is the design of materials and processes for the separation of environmentally harmful refrigerant mixtures into their component parts to repurpose these harmful mixtures and prevent them from causing more harm to our world.
Using Data Science to Protect Tap Water Quality Nerenberg R, Lemmon M D, Sisk M, Clements E, Duan Y

Water utilities capture raw water, treat it to EPA standards, and distribute it to users via a piped network. While utilities must comply with EPA standards up to the user’s connection, conditions in residential distribution networks can greatly degrade water quality. Tools are needed to identify homes at risk for water quality problems and to develop community-wide strategies for mitigating these problems in a fair and equitable manner.

We propose a Fair Federated Learning framework in which several local communities collaboratively learn a shared global model while keeping all training data local. The framework can be applied to predict whether a given residence needs mitigation of its water risk based on the residence’s profile. It will guarantee equitable mitigation policies across all neighborhoods. Preliminary results on the UCI Adult dataset demonstrated that classifiers trained under our framework can perform fairly with respect to different communities at a small cost in accuracy.


Call for Student Posters (Enter by Oct 3 – DEADLINE IS PAST)

  • We welcome poster competition entries from undergraduate and graduate students in all disciplines.
  • The student poster session will open the symposium during the lunch hour from 12-1.
  • Live/crowd-sourced voting during the event will determine the winners. 
  • Awards for the top three posters ($300, $200, $100) will be announced at the evening reception at Foley’s.
  • Posters may be submitted by individual students or groups of co-authors.

Posters that feature project concepts, new or continuing research in data science, AI, data engineering, computing, applications, and methods that amplify societal expertise in areas of human development, peace accord, ethics, global development, health disparities, and poverty are particularly welcome. 

Important Dates:

  • Sep 15 Student Registration Opens

  • Oct 3  Student Registration & Poster Entry Deadline

  • Oct 5  Student Poster Competition Entries Announced & Abstracts posted

  • Oct 6 5 PM Poster Print Job Submission Deadline; please email submissions to

  • Oct 6 5 PM (optional)  Self-Printed posters must be delivered to 610 Flanner Hall

  • Oct 11 Lucy Family Institute Fall Symposium. Please arrive a little before noon to find your poster and pick up your badge.

Students interested in presenting a poster at the Symposium can enter the competition using the student symposium registration form.

To enter the student poster competition with a single-author poster, include on your symposium registration your poster’s title, a brief abstract describing your poster theme in 250 words or fewer, and your name as you’d like it to appear on the poster competition voting system the day of the event.

Group Entries in the poster competition: For group entries, one student, acting as the corresponding author, should submit on their symposium registration the poster’s title, a brief abstract, and a list of all the group’s authors’ names as you’d like them to appear on the poster competition voting system the day of the event. If you are part of a group submitting a poster to the competition, please note that everyone in the group who is planning to attend the symposium (regardless of whether they are the corresponding author) should register for the symposium and indicate their dietary restrictions.

Have Additional Questions about Group Registration? We want to hear from you, please contact  or call 574-631-7095 for assistance entering the poster competition. 

All poster competition entrants are welcome to use the event’s optional poster template if you would like the Institute to print the poster.  We are facilitating poster printing at no cost to students for those that use the template to ensure ease of participation and email your submission to by October 6th. Students are also free to print their own as long as they deliver their printed poster by the deadline of Oct 6th at 5 PM.

Accommodations: We want the event to be engaging for all. We’ve anticipated certain accommodations to enable participation (diet, allergy, interpreter, lactation room) but we know special circumstances may arise. Please contact Lucy Institute if you have any questions about student registration, the poster competition, or to ensure we can accommodate your needs on the day of the event. Phone 574-631-7095  or email
