DIAL Research

Interpretable Models

Although its definition remains somewhat ambiguous in the machine learning literature, interpretability is most often embodied by heuristics such as model size, simplicity, human simulatability, and, above all, the extent to which domain experts can explicitly understand the patterns expressed by a model.

Interpretable Modeling with Symbolic Regression

Interpretability is especially important in high-stakes settings, such as healthcare, in which users need to trust model predictions, maintain transparency to patients and stakeholders regarding decision-making, and design better processes geared toward improving patient outcomes. Our recent publications on interpretability include a tutorial on symbolic regression in the context of human health. In it, we use the QLattice symbolic regression software to find explicit mathematical functions that estimate body composition, as alternatives to existing heuristics such as body mass index (BMI), using accessible body measurement variables collected by the Centers for Disease Control and Prevention (CDC).
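As a rough illustration of the idea behind symbolic regression, the sketch below searches a small, hand-picked set of candidate formulas over weight and height and keeps the one that best fits the data. It is a toy under stated assumptions, not the QLattice method or its API: the candidate list, the function names, and the synthetic target (a scaled version of the BMI form w / h**2 plus noise) are all hypothetical and chosen only so the example is self-contained.

```python
import random

# Hypothetical candidate formula structures over weight w (kg) and height h (m).
CANDIDATES = {
    "w": lambda w, h: w,
    "w / h": lambda w, h: w / h,
    "w / h**2": lambda w, h: w / h**2,
    "w / h**3": lambda w, h: w / h**3,
    "w * h": lambda w, h: w * h,
}


def fit_scale(feature, target):
    """Closed-form least-squares coefficient a minimizing sum((a*f - y)**2)."""
    numerator = sum(f * y for f, y in zip(feature, target))
    denominator = sum(f * f for f in feature)
    return numerator / denominator


def search(weights, heights, target):
    """Return (mse, formula_name, coefficient) for the best-fitting candidate."""
    results = []
    for name, fn in CANDIDATES.items():
        feature = [fn(w, h) for w, h in zip(weights, heights)]
        a = fit_scale(feature, target)
        mse = sum((a * f - y) ** 2 for f, y in zip(feature, target)) / len(target)
        results.append((mse, name, a))
    return min(results)


def make_synthetic(n=200, seed=0):
    """Generate SYNTHETIC data for illustration only; these are not real measurements."""
    rng = random.Random(seed)
    weights = [rng.uniform(45.0, 110.0) for _ in range(n)]
    heights = [rng.uniform(1.5, 2.0) for _ in range(n)]
    # Assumed ground truth: 1.2 * w / h**2 plus small Gaussian noise.
    target = [1.2 * w / h**2 + rng.gauss(0.0, 0.05) for w, h in zip(weights, heights)]
    return weights, heights, target


weights, heights, target = make_synthetic()
mse, name, coefficient = search(weights, heights, target)
print(f"best formula: {coefficient:.3f} * ({name}), mse = {mse:.4f}")
```

The result is an explicit expression that a domain expert can read and critique directly, which is the property that makes this family of methods attractive for interpretability. Real symbolic regression tools search a far larger space of expressions than this fixed list.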

Health & Wellness

Enormous healthcare costs and an unsustainable system call for more efficient medical practices. Our work addresses this problem from both ends of the healthcare informatics spectrum. On one end, we have focused on developing analytical models and statistical analyses, ranging from the lowest level of personalized care (clinical data) to the highest level (population data), to gain additional insights and perspectives into the clinical environment. On the other end, our work focuses on developing technologies aimed at understanding technology’s role in addressing community-based health and wellness problems.

Pediatric Cancer

Low- and middle-income countries (LMICs) are predicted to account for two-thirds of the world’s cancer incidence by 2024. Mexico, in particular, has shown an especially high incidence of pediatric cancer, with a projected 6,778 new cases by 2025, the second-highest incidence in Latin America according to the World Health Organization (WHO). These statistics have motivated our collaboration with the Hospital Infantil de México Federico Gómez, a pediatric oncology institution in Mexico City that primarily serves disadvantaged populations with limited access to health services. Because electronic medical records (EMRs) are lacking, we have designed and built a dual web and mobile application that collects clinical information, together with social determinants of healthcare access, from hospital staff and patient caregivers throughout patients’ oncology trajectories, both in the hospital and at home. These data will be used to build models for predicting adverse outcomes (e.g., febrile neutropenia, septic shock, bacteremia, bleeding, death) in this vulnerable population, building on our preliminary modeling work on a subset of these adverse events. The goal is to learn the meaningful relationships between these variables, which may illuminate pathways for better care, treatment, and service to these patients and families.

Childhood Obesity

There are many isolated interventions dealing with childhood obesity today. Some focus on educating children, while others focus on getting kids active. This work, however, aims to use a collective impact intervention to unite these different areas. As a pioneer in this type of collective impact programming, the United Way aims to leverage many different community programs, some previously validated, such as CATCH, and others new and upcoming, such as prescription to play. Our work is to create a social wellness platform that allows children to set and track wellness goals, and that gives them feedback on their progress and information tailored to their specific interests. The ability to monitor progress is central, with the goal of showing users their improvement over time, not just their successes or failures. The platform also encourages users to join controlled social groups within their classes and among friends, to challenge each other to improve and to reinforce positive behaviors.

Diabetes Risk and Management

Chronic diseases such as diabetes take a great deal of personal commitment and awareness to manage effectively. We understand that every individual is unique and that management difficulties can have many causes. However, the current practice of retroactively treating these issues is both more expensive and less effective than early intervention. At the same time, in today’s challenging healthcare environment, creating widespread interventions for all diabetic patients is not practical. We believe that integrating technology and data mining into patient care can support the move away from this reactive paradigm toward a preventative care model. Using a combination of personalized features, we aim to identify individuals at high risk for management issues, and then to determine a personalized course of action based on the resources available to each individual.

Population-Level Analysis

Using population-level data, we have undertaken a higher-level data science analysis drawing on the Centers for Medicare & Medicaid Services (CMS) national public physician dataset. We aimed to open a discussion of how data from multiple sources, such as the CMS Medicare release, existing CMS datasets, and additional external public data, can be used to generate insights into new and interesting questions about clinical practice. This work focused on the concept of knowledge transfer and how experiences during education can shape a physician over the course of their career, posing the question: does a physician’s past experience in medical school shape their practice decisions?


As healthcare becomes increasingly digitalized, we have been working to blend technology with society by developing a healthcare application that can help seniors live better. Our tablet-based application, aimed at enhancing the physical health, vitality, and brain fitness of seniors residing in independent living communities, is a patient-centric framework for medication, nutrition, and pain management designed specifically for senior patients. To help patients manage chronic diseases, the application provides alerts for daily medications and information on medical appointments. The application can also be used as a medium to provide community health workers with discharge summaries. In collaboration with a local Aging in Place program, we have been conducting a study of the application and its effects on senior well-being. Through the study, we investigate conditions indicative of risks or trends in patient health, including questions relating to exercise, diet, mood, and sleep patterns.

Online Health and Wellness Information Consumption

Users are increasingly turning to the Internet as a source of health information. In this research, we study the health-seeking behavior of users on a national online knowledge-sharing platform for health and wellness. We begin by identifying the topical interests of users from different content consumption sources. Using these topical preferences, we explore information consumption and health-seeking behavior across three contextual dimensions: user-based demographic attributes, time-related features, and community-based socioeconomic factors. We then study how these context signals can be used to infer specific user health topic preferences. Our findings suggest that linking demographic features to user profiles is more effective in predicting health preferences than other features. Our work demonstrates the value of using contextual factors to characterize and understand the content consumption of users seeking health and wellness information online.


The NetHealth Study explores the extent to which healthy behaviors can be promoted through social networks. The study uses smartphones to gather information on participants’ social networks and Fitbit activity trackers to gather information on their physical activity and sleep patterns. Over seven hundred Notre Dame students who entered as first-years in the 2015/16 academic year are currently enrolled.


MomLink is a research project consisting of a web and mobile application that will help first-time moms access pregnancy-related educational resources and acquire timely and personalized information related to their pregnancy. The application will also allow them to communicate directly with their prenatal care coordination team, receive information, and track their progress.

Pandemic Forecasting

The spread of COVID-19 throughout the world has had cataclysmic consequences for the global community, creating an urgent need to accurately understand and predict the trajectory of the pandemic. In epidemiology, mass human mobility data (i.e., how many people moved from one place to another in a given period) have demonstrated predictive power, since infectious diseases spread through human-to-human transmission. Because human mobility data have a natural graph structure, various graph neural networks (GNNs) have been proposed to predict pandemic trajectories. For more details, read about a recent paper introducing a hierarchical spatio-temporal GNN for better forecasting of the pandemic.
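To make the graph view of mobility data concrete, the sketch below treats regions as nodes and mobility counts as weighted edges, then performs a single neighborhood-aggregation step of the kind that GNN layers build on. It is a minimal illustration under stated assumptions, not the hierarchical spatio-temporal GNN from the paper: there are no learned weights, no nonlinearity, and no temporal component, and the function names, the self_weight parameter, and the example numbers are all hypothetical.

```python
def row_normalize(inflow):
    """Convert raw inbound trip counts into per-region shares that sum to 1.

    inflow[i][j] is the (hypothetical) number of trips arriving in region i
    from region j during some period.
    """
    normalized = []
    for row in inflow:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


def aggregate(weights, features, self_weight=0.5):
    """One aggregation step over the mobility graph.

    Each region's new value mixes its own value with the mobility-weighted
    average of the regions that send travelers to it.
    """
    out = []
    for i, row in enumerate(weights):
        neighbor_avg = sum(w * features[j] for j, w in enumerate(row))
        out.append(self_weight * features[i] + (1.0 - self_weight) * neighbor_avg)
    return out


# Made-up example: three regions, illustrative inbound trip counts and case rates.
inflow = [
    [0, 30, 10],  # region 0 receives 30 trips from region 1, 10 from region 2
    [20, 0, 0],   # region 1 receives 20 trips from region 0
    [5, 15, 0],   # region 2 receives 5 trips from region 0, 15 from region 1
]
case_rate = [100.0, 10.0, 0.0]

mixed = aggregate(row_normalize(inflow), case_rate)
print(mixed)
```

In this toy setup, a region with no cases of its own still receives a nonzero signal when it draws travelers from affected regions, which is the intuition behind using mobility graphs for forecasting. A GNN replaces the fixed mixing rule with learned transformations and stacks several such steps.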