Automated Discourse Analysis of Reasoning Patterns in Science Classrooms

This project, a subproject within the broader ADAS initiative, investigates automated classification of utterance types and reasoning components (RC) in science classroom dialogue using fine-tuned transformer models and LLM-based data augmentation.

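Classroom discourse encodes rich information about students’ cognitive engagement, but manual coding is labor-intensive and difficult to scale. This project addresses both sides of discourse coding at scale: the modeling challenge of classifying utterances accurately, and the analytical challenge of using those codes to characterize reasoning in the classroom.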

Research Questions:

RQ1. To what extent can LLM-based data augmentation and semi-supervised learning improve utterance-type and reasoning-component classification accuracy?
RQ2. What co-occurrence and sequential patterns of utterance types and reasoning components characterize science classroom discourse?
RQ3. How does students’ cognitive complexity evolve over a lesson, and which teacher moves are associated with shifts toward higher-order reasoning?

Team: Raymond Carl, Mukhesh Raghava Katragadda, Dr. Soon Lee, Dr. Jiho Noh