
February 4, 2026

Q+A With Khan Academy’s Kristen DiCerbo

Dr. Kristen DiCerbo

In a moment when AI is reshaping how students learn and teachers teach, few voices are as influential as Dr. Kristen DiCerbo, Chief Learning Officer at Khan Academy. Recently, ETS sat down with DiCerbo to explore how evidence-based learning design, emerging technologies, and a commitment to educational equity are coming together to shape the future of personalized instruction.

In this conversation, DiCerbo offers a rare, behind-the-scenes look at what meaningful innovation in education really requires. She digs into what’s working, what still needs to be solved, and how educators can navigate this transforming landscape with both optimism and clarity.

As Chief Learning Officer at Khan Academy, you’ve been at the forefront of integrating AI into learning experiences. What excites you most about using behavioral signals to measure skills beyond traditional assessments?

DiCerbo: I actually think what AI offers us is perhaps not behavioral signals, but new activities. We have been working on using behavioral signals in assessment for more than a decade, drawing on evidence from simulations and games. I would say the most exciting thing about generative AI and assessment is that it allows for new kinds of interactions. For example, students can have conversations with AI that mimic real-world conversations. They can also generate visual output in ways that were never possible before.

Why do you think now is the right time to rethink how we measure competencies like collaboration and persistence?

DiCerbo: The ability to have new conversational types of interactions opens up more authentic ways to assess constructs like collaboration and communication. For example, if we wanted to assess persuasion, individuals could have conversations with an AI to persuade them of a stance. Prior to generative AI, conversations in assessment were not possible. Take the 2015 PISA assessment of collaborative problem solving. In order to simulate collaborative problem-solving dialogues, the test creators had to use multiple choice selection where test-takers chose which option to “say” next. This significantly constrained the possible solution space for test takers and obviously made the experience much less like an actual problem-solving conversation. Now, with generative AI, we have the possibility of students engaging in conversations as they would with humans to demonstrate their skills. Of course, this takes significant effort, including things like attempting to guide the AI’s responses to the students’ input.

Regarding persistence particularly, I see that differently than the above constructs. Persistence is essentially about observing whether someone continues to try in the face of failure. We have been able to observe that in digital environments for at least the past decade (as I wrote about here in 2016).

Are there opportunities to incorporate multimodal data, like voice or gesture, into assessments? What challenges or ethical considerations come with that?

DiCerbo: In launching Khanmigo, Khan Academy’s AI-powered tutor for students and assistant for teachers, our text-to-speech and speech-to-text features have been well-received, particularly as ways to reduce reading and typing burdens. As we move into assessment, the challenge in including voice or gesture will be to avoid bias in scoring.

Where do you see the greatest promise in using AI and behavioral data for skill measurement and what limitations should educators keep in mind?

DiCerbo: We are excited to have piloted a feature called “Explain Your Thinking” with about 8,000 students over the past year. Students engage with a traditional math question and then engage in a dialogue with generative AI in which they are asked to explain the reasoning behind their response. The activity is meant to mimic what teachers do when they sit next to a student and ask about their work. Like previous research done at ETS, we found that students reveal more about their understanding in these scenarios than they do from just entering a response. This means that teachers and other stakeholders get more insight into what students know and can do.

How do you balance the depth of insights from these innovative approaches with the need for scalability and practicality in classrooms?

DiCerbo: As with many things in assessment, innovation is best begun in the formative space where consequences for things like increased error of measurement are small. If a student spends a little time practicing something they have already mastered because an assessment indicated they had not mastered it, that is not a fatal error. Classroom assessments with generative AI can be created by instructors fairly easily, as this professor did creating oral exams for his class.

Looking ahead, what role do you see for AI in creating assessments that feel authentic and culturally responsive?

DiCerbo: We need more research on whether the personalization that may be possible with generative-AI-powered assessments results in more valid and reliable assessments. It is certainly the case that the inclusion of construct-irrelevant background knowledge can result in lowered validity for some test-takers. It is possible that using generative AI, assessment items and activities could be adjusted to consider individual students’ experiences, language, and cultural understanding. However, doing this while maintaining standard definitions of the construct being assessed is not a simple task.

What research or innovations are you most excited about in the next few years for measuring real-world skills through behavior?

DiCerbo: I put innovations in a few buckets. Here is what I am excited about.

  • Technology that exists but we have not optimized for assessment yet:
    • Agentic AI - allows different parts of the assessment process to be separated out and accomplished by specialized agents
    • Large context windows - supplying the AI with large amounts of information can help with context-rich feedback and scoring with complex rubrics
  • Technology available in the next 12 months:
    • Affordable text, audio, and video streaming - allowing both the test-taker and the AI to interact in multiple ways, like this demo from Sal and his son
    • Explainable AI - understanding the AI’s reasoning will better support applications like scoring, where “black box” scores are not helpful in providing feedback to learners
    • Privacy-aware on-device models - will address concerns about data sharing and privacy
  • Technology available in the next 1-3 years:
    • Multi-agent simulations - test-takers interact with multiple AIs that play different roles in the assessment, simulating real-world group scenarios