Agenda & Abstracts: Trustworthy AI Lab for Education Summer Online Symposium

The agenda details are provided below.

Please note that all times are tentative and are given in Eastern Time (ET).


June 12, 2024

This symposium will be held online. To register, please visit this link.

2:00 – 2:30 pm “ChatGPT vs. Machine Learning: Assessing the Efficacy and Accuracy of Large Language Models for Automated Essay Scoring”

Youngwon Kim, Harvard University

This study compares the efficacy and accuracy of Large Language Models (LLMs) to tree-based Machine Learning (ML) algorithms in the context of student essay grading. Using essays from the Catalyzing Comprehension through Discussion and Debate project, we evaluate both essay stance classification (categorical) and writing quality (continuous) against a human-scored standard. LLMs, with various prompting and fine-tuning techniques, are compared to ML algorithms trained on extensive statistical features. Our findings show that while LLMs have potential, tree-based ML methods currently offer superior accuracy for assessing writing quality. For essay stance classification, in contrast, most ChatGPT versions outperformed the majority of ML methods, with the exception of extreme gradient boosting. These results demonstrate the importance of prompting and fine-tuning techniques and highlight the strengths and limitations of LLM and ML approaches to automated essay scoring. This study will be useful to educators, researchers, and policymakers seeking more efficient and accurate technology-assisted essay grading.
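To make the tree-based side of this comparison concrete, here is a minimal sketch of a gradient-boosted regressor for writing quality, using scikit-learn. The data and feature names are synthetic placeholders, not the study's actual feature set or code.

# A minimal sketch of a tree-based essay-quality regressor. All data and
# feature names here are synthetic placeholders, not the study's own.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_essays = 500
# Hypothetical statistical features: word count, type-token ratio,
# mean sentence length, spelling-error rate.
X = rng.normal(size=(n_essays, 4))
# Synthetic human-assigned quality scores loosely tied to the features.
y = 3.0 + 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n_essays)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("MAE vs. human scores:", mean_absolute_error(y_test, model.predict(X_test)))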

2:30 – 3:00 pm “Towards a future with robust explainable AI in education”

Juan D. Pinto, Department of Curriculum and Instruction, University of Illinois at Urbana-Champaign

Our previous work has highlighted the urgent need for AI in education to embrace explainable models and methods (Pinto & Liu, 2023). As AI development and public awareness continue to accelerate, we see this as critical for instilling justified trust in such technologies. Specifically, we have called for (1) the establishment of a unified vision for explainable AI (XAI) in education, (2) greater awareness of the complexities of XAI, including the problematic limitations of post-hoc methods, (3) research into possible approaches for increasing model interpretability, and (4) the development of explainability evaluation methods.

To this end, we will be holding a workshop at the 2024 International Conference on Educational Data Mining (EDM). The Human-Centric eXplainable AI in Education (HEXED) Workshop aims to bring together researchers and practitioners from the fields of machine learning and education to discuss the challenges and opportunities of using interpretable machine learning models in education research. In line with our goal of creating a unified vision for XAI in education, an important part of this workshop will be a series of working groups to identify, frame, and connect the key issues in the field, culminating in a visual representation and manuscript to be used as a guide for future research.

We have also begun work on a specific method for creating neural-network-based learner models that are interpretable by design (for an early pilot of this work, see Pinto et al., 2023). We are now starting to evaluate these models' interpretability using a simple questionnaire that measures how well participants can predict and alter a model's predictions. As part of this effort, we are proposing a unified framework for evaluating explanations, which we hope to present and expand soon.
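As an illustration of one way such a questionnaire could be scored (an assumption on our part, not the authors' published instrument), the "predict the model" component reduces to simple agreement between participant forecasts and model outputs:

# Hypothetical scoring of a forward-simulatability questionnaire:
# participants guess what the learner model will predict for each case,
# and the score is the fraction of guesses that match the model.
model_predictions = ["correct", "incorrect", "correct", "correct", "incorrect"]
participant_forecasts = ["correct", "incorrect", "incorrect", "correct", "incorrect"]

matches = sum(m == p for m, p in zip(model_predictions, participant_forecasts))
score = matches / len(model_predictions)
print(f"Forward-simulatability score: {score:.2f}")  # 0.80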

3:00 – 3:30 pm “Early Grade Prediction and Validation to Support Students in a Foundational STEM Course”

Thomas Joyce, Department of Applied and Computational Math and Statistics, University of Notre Dame

In foundational STEM courses like organic chemistry, accurately predicting students’ final performance early in the semester is crucial for identifying and supporting at-risk students. Previously, we used a performance-group framework to analyze student learning outcomes in an introductory organic chemistry course at a mid-sized private university, classifying students into Thriving, Succeeding, and Developing performance groups based on their final course grades. We employed linear stepwise regression and ordinal random forest models to predict students’ final exam grades and final performance groups, using first-attempt quiz scores and exam scores before the midterm break as predictor variables. Our models demonstrated high predictive accuracy by the seventh week of the course. However, this analysis used data from a single semester, necessitating further testing and validation. Here, we apply the statistical and machine learning models developed with Spring 2023 course data to predict students’ grades in Spring 2024. Additionally, we incorporate student survey data from assessment wrappers to evaluate the fairness of our models. Based on these findings, we refine our models to optimize fairness and accuracy. Our models can be implemented and improved in future semesters of organic chemistry, enabling instructors to promptly assist students in each performance group through early grade prediction and reliable AI methods.
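For illustration, here is a minimal sketch of this kind of performance-group prediction. The data are synthetic, the score thresholds are invented for the example, and a plain random-forest classifier stands in for the ordinal random forest used in the study.

# A minimal sketch of performance-group prediction from early-semester
# assessments. Data and thresholds are synthetic; a standard random
# forest stands in for the study's ordinal random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_students = 300
# Predictors: first-attempt quiz scores and pre-midterm exam scores (0-100).
X = rng.uniform(50, 100, size=(n_students, 6))
# Ordinal groups derived from mean early performance (illustrative cutoffs):
# 0 = Developing (< 70), 1 = Succeeding (70-85), 2 = Thriving (>= 85).
y = np.digitize(X.mean(axis=1), bins=[70, 85])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1)
clf.fit(X_train, y_train)
print(classification_report(
    y_test, clf.predict(X_test), labels=[0, 1, 2],
    target_names=["Developing", "Succeeding", "Thriving"]))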