Automated Discourse Analysis of Reasoning Patterns in Science Classrooms

This project, a subproject within the broader ADAS initiative, investigates automated classification of reasoning components (RC) and utterance types (UT) in science classroom dialogue using LLM-based data augmentation and fine-tuned transformer models. Using a revised 4-class RC scheme (ER, SR-D, SR-I, NA) to probe students’ reasoning patterns, the project analyzes co-occurrence and sequential discourse patterns, tracks cognitive complexity over lesson time, and identifies instructional moves associated with higher-order student reasoning.

Classroom discourse encodes rich information about students’ cognitive engagement, but manual coding is labor-intensive and difficult to scale. This project addresses both the modeling and analytical challenges of discourse coding at scale.

Research Questions:

  1. To what extent can LLM-based data augmentation and semi-supervised learning improve utterance-type and reasoning-component classification accuracy?
  2. What co-occurrence and sequential patterns of utterance types and reasoning components characterize science classroom discourse?
  3. How does students’ cognitive complexity evolve over a lesson, and which teacher moves are associated with shifts toward higher-order reasoning?
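
As a rough illustration of the second question, the sketch below counts UT/RC co-occurrences and RC-to-RC transitions over a coded transcript. It is a minimal example under stated assumptions, not the project's actual pipeline: the transcript, the UT labels ("question", "response"), and the function names are hypothetical; only the RC labels come from the 4-class scheme above.

```python
from collections import Counter

# Hypothetical coded transcript: one (utterance type, reasoning component)
# pair per utterance. The UT labels are placeholders for illustration;
# the RC labels follow the 4-class scheme (ER, SR-D, SR-I, NA).
coded = [
    ("question", "NA"),
    ("response", "ER"),
    ("question", "NA"),
    ("response", "SR-D"),
    ("response", "SR-I"),
]


def cooccurrence_counts(pairs):
    """Count how often each (UT, RC) pair appears on the same utterance."""
    return Counter(pairs)


def transition_counts(labels):
    """Count adjacent label pairs (bigrams) in a label sequence."""
    return Counter(zip(labels, labels[1:]))


rc_sequence = [rc for _, rc in coded]
cooc = cooccurrence_counts(coded)
trans = transition_counts(rc_sequence)

for (ut, rc), n in cooc.most_common():
    print(f"{ut} + {rc}: {n}")
for (prev, nxt), n in trans.most_common():
    print(f"{prev} -> {nxt}: {n}")
```

Raw counts like these would normally be normalized (for example, into transition probabilities per source label) before comparing lessons of different lengths.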

Team: Raymond Carl, Mukhesh Raghava Katragadda, Dr. Soon Lee, Dr. Jiho Noh