soroban-abacus-flashcards/apps/web/.claude/BKT_DESIGN_SPEC.md


Bayesian Knowledge Tracing (BKT) Design Specification

Overview

This document specifies the implementation of Conjunctive Bayesian Knowledge Tracing for the soroban practice system. BKT provides epistemologically honest skill mastery estimates that account for:

  1. Asymmetric evidence: A correct answer is strong evidence for every skill involved; a wrong answer only shows that at least one skill failed
  2. Multi-skill problems: Probabilistic blame distribution across co-occurring skills
  3. Uncertainty quantification: Confidence intervals on mastery estimates
  4. Staleness indicators: Show "last practiced X days ago" separately (not decay)

Architecture Decision: Lazy Computation

Key Decision: BKT is computed on-demand when viewing reports, NOT in real-time during practice.

Why:

  • No new database tables needed
  • No hooks into practice session flow
  • Can replay SlotResult history to compute BKT state
  • Easy to change algorithm without migration
  • Can add user controls (confidence slider, priors toggle) dynamically
  • Estimated computation time: ~50ms for full report

How it works:

  1. User opens Skills Dashboard
  2. Dashboard fetches recent SlotResults (already stored in session_plans)
  3. Pure functions replay history to compute BKT state for each skill
  4. Display results with confidence indicators

The Problem We're Solving

Current approach (naive):

accuracy = correct / attempts  // Treats both signals as equivalent

Why it's wrong:

  • Correct: Strong evidence ALL skills are known
  • Incorrect: Weak evidence that ONE OR MORE skills failed (we don't know which)

BKT approach:

  • Maintain P(known) per skill with proper Bayesian updates
  • Distribute "blame" for errors probabilistically based on prior beliefs
  • Report uncertainty honestly

1. Data Source

Existing Data (No Schema Changes Needed)

We already have all the data we need in session_plans.results:

// From src/db/schema/session-plans.ts
export interface SlotResult {
  slotIndex: number;
  problemIndex: number;
  problem: GeneratedProblem; // Contains skillIds
  isCorrect: boolean;
  timestamp: number;
  responseTimeMs: number;
  userAnswer: number | null;
  helpLevel: 0 | 1; // Boolean: 0 = no help, 1 = used help
}

The problem.skillIds field tells us which skills were involved in each problem.
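
To make the multi-skill issue concrete, here is a minimal sketch that tallies naive per-skill counts from a list of results. `MinimalResult`, `SkillTally`, `tallyBySkill`, and the skill IDs in the sample data are illustrative names, not taken from the codebase; only `problem.skillIds` and `isCorrect` come from `SlotResult`.

```typescript
// Illustrative subset of SlotResult: only the fields this sketch needs.
interface MinimalResult {
  problem: { skillIds?: string[] };
  isCorrect: boolean;
}

interface SkillTally {
  attempts: number;
  correct: number;
}

// Hypothetical helper: count attempts and successes per skill.
function tallyBySkill(results: MinimalResult[]): Map<string, SkillTally> {
  const tallies = new Map<string, SkillTally>();
  for (const result of results) {
    for (const skillId of result.problem.skillIds ?? []) {
      const tally = tallies.get(skillId) ?? { attempts: 0, correct: 0 };
      tally.attempts += 1;
      if (result.isCorrect) tally.correct += 1;
      tallies.set(skillId, tally);
    }
  }
  return tallies;
}

// One wrong answer on a two-skill problem counts against both skills,
// even though only one of them may have failed. Skill IDs are made up.
const sampleTallies = tallyBySkill([
  { problem: { skillIds: ["basic.exampleA"] }, isCorrect: true },
  { problem: { skillIds: ["basic.exampleA", "fiveComplements.exampleB"] }, isCorrect: false },
]);
console.log(sampleTallies.get("basic.exampleA")); // { attempts: 2, correct: 1 }
```

The naive tally charges the error to both skills equally, which is the behavior the conjunctive update in section 2.2 is meant to replace.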

Data Fetching

Already implemented: getRecentSessionResults(playerId, sessionCount) in session-planner.ts


2. BKT Algorithm (Pure Functions)

2.1 Core BKT Update Equations

// src/lib/curriculum/bkt/bkt-core.ts

export interface BktParams {
  pInit: number; // P(L0) - prior knowledge
  pLearn: number; // P(T) - learning rate
  pSlip: number; // P(S) - slip rate
  pGuess: number; // P(G) - guess rate
}

export interface BktState {
  pKnown: number;
  opportunities: number;
  successCount: number;
  lastPracticedAt: Date | null;
}

/**
 * Standard BKT update for a SINGLE skill given an observation.
 *
 * For correct answer:
 *   P(known | correct) = P(correct | known) × P(known) / P(correct)
 *   where P(correct | known) = 1 - P(slip)
 *   and   P(correct | ¬known) = P(guess)
 *
 * For incorrect answer:
 *   P(known | incorrect) = P(incorrect | known) × P(known) / P(incorrect)
 *   where P(incorrect | known) = P(slip)
 *   and   P(incorrect | ¬known) = 1 - P(guess)
 */
export function bktUpdate(
  priorPKnown: number,
  isCorrect: boolean,
  params: BktParams,
): number {
  const { pSlip, pGuess } = params;

  if (isCorrect) {
    const pCorrect = priorPKnown * (1 - pSlip) + (1 - priorPKnown) * pGuess;
    const pKnownGivenCorrect = (priorPKnown * (1 - pSlip)) / pCorrect;
    return pKnownGivenCorrect;
  } else {
    const pIncorrect = priorPKnown * pSlip + (1 - priorPKnown) * (1 - pGuess);
    const pKnownGivenIncorrect = (priorPKnown * pSlip) / pIncorrect;
    return pKnownGivenIncorrect;
  }
}

/**
 * Apply learning transition after observation.
 * P(known after learning) = P(known) + P(¬known) × P(learn)
 */
export function applyLearning(pKnown: number, pLearn: number): number {
  return pKnown + (1 - pKnown) * pLearn;
}
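
As a worked example of the equations above, the sketch below runs one correct and one incorrect observation through the update, using the fiveComplements defaults from section 2.4. `ExampleParams` and `posteriorKnown` are hypothetical names that restate the same Bayes rule so the snippet runs on its own.

```typescript
// Worked example of the single-skill update with
// pInit = 0.1, pLearn = 0.3, pSlip = 0.1, pGuess = 0.02.
interface ExampleParams {
  pLearn: number;
  pSlip: number;
  pGuess: number;
}

// Same Bayes rule as bktUpdate, restated so this snippet is self-contained.
function posteriorKnown(prior: number, isCorrect: boolean, p: ExampleParams): number {
  const likelihoodKnown = isCorrect ? 1 - p.pSlip : p.pSlip;
  const likelihoodUnknown = isCorrect ? p.pGuess : 1 - p.pGuess;
  const evidence = prior * likelihoodKnown + (1 - prior) * likelihoodUnknown;
  return (prior * likelihoodKnown) / evidence;
}

const exampleParams: ExampleParams = { pLearn: 0.3, pSlip: 0.1, pGuess: 0.02 };
const examplePrior = 0.1;

// Correct answer: 0.09 / 0.108 ≈ 0.833, then learning lifts it to ≈ 0.883.
const afterCorrect = posteriorKnown(examplePrior, true, exampleParams);
const afterCorrectAndLearning = afterCorrect + (1 - afterCorrect) * exampleParams.pLearn;

// Incorrect answer: 0.01 / 0.892 ≈ 0.011.
const afterIncorrect = posteriorKnown(examplePrior, false, exampleParams);

console.log(
  afterCorrect.toFixed(3),
  afterCorrectAndLearning.toFixed(3),
  afterIncorrect.toFixed(3),
);
```

With a guess rate this low, a single correct answer moves the estimate a long way, so these priors make the model quick to credit success.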

2.2 Conjunctive BKT for Multi-Skill Problems

// src/lib/curriculum/bkt/conjunctive-bkt.ts

import { applyLearning, bktUpdate, type BktParams } from "./bkt-core";

export interface SkillBktRecord {
  skillId: string;
  pKnown: number;
  params: BktParams;
}

export interface BlameDistribution {
  skillId: string;
  blameWeight: number; // Higher = more likely this skill caused the error
  updatedPKnown: number;
}

/**
 * For a CORRECT multi-skill answer:
 * All skills receive positive evidence (student knew all of them).
 * Update each skill independently with the correct observation.
 */
export function updateOnCorrect(
  skills: SkillBktRecord[],
): { skillId: string; updatedPKnown: number }[] {
  return skills.map((skill) => ({
    skillId: skill.skillId,
    updatedPKnown: applyLearning(
      bktUpdate(skill.pKnown, true, skill.params),
      skill.params.pLearn,
    ),
  }));
}

/**
 * For an INCORRECT multi-skill answer:
 * Distribute blame probabilistically based on which skill most likely failed.
 *
 * Simplified approximation:
 *   blame(X) ∝ (1 - pKnown(X)) / Σ(1 - pKnown(all))
 */
export function updateOnIncorrect(
  skills: SkillBktRecord[],
): BlameDistribution[] {
  const totalUnknown = skills.reduce((sum, s) => sum + (1 - s.pKnown), 0);

  if (totalUnknown < 0.001) {
    // All skills appear mastered - must be a slip, distribute evenly
    const evenWeight = 1 / skills.length;
    return skills.map((skill) => ({
      skillId: skill.skillId,
      blameWeight: evenWeight,
      updatedPKnown: bktUpdate(skill.pKnown, false, skill.params),
    }));
  }

  return skills.map((skill) => {
    const blameWeight = (1 - skill.pKnown) / totalUnknown;

    // Weighted update: soften negative evidence for skills unlikely to have caused error
    const fullNegativeUpdate = bktUpdate(skill.pKnown, false, skill.params);
    const weightedPKnown =
      skill.pKnown * (1 - blameWeight) + fullNegativeUpdate * blameWeight;

    return {
      skillId: skill.skillId,
      blameWeight,
      updatedPKnown: weightedPKnown,
    };
  });
}
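
To illustrate the blame distribution, here is a small sketch for a two-skill problem where one skill looks strong (0.9) and one looks weak (0.3). `blameWeights` and `softenedUpdate` are hypothetical helpers that mirror the arithmetic in `updateOnIncorrect`; the full negative updates passed in are computed by hand with pSlip = 0.1 and pGuess = 0.02.

```typescript
// Hypothetical helper: normalised blame weights, as in updateOnIncorrect.
function blameWeights(pKnowns: number[]): number[] {
  const totalUnknown = pKnowns.reduce((sum, p) => sum + (1 - p), 0);
  if (totalUnknown < 0.001) {
    // Everything looks mastered: share the blame evenly.
    return pKnowns.map(() => 1 / pKnowns.length);
  }
  return pKnowns.map((p) => (1 - p) / totalUnknown);
}

// Blend from the prior toward the full negative update in proportion to blame.
function softenedUpdate(pKnown: number, fullNegative: number, blame: number): number {
  return pKnown * (1 - blame) + fullNegative * blame;
}

const exampleWeights = blameWeights([0.9, 0.3]); // ≈ [0.125, 0.875]

// Full negative updates for each skill (pSlip = 0.1, pGuess = 0.02):
//   strong skill: 0.09 / 0.188 ≈ 0.479
//   weak skill:   0.03 / 0.716 ≈ 0.042
const strongAfter = softenedUpdate(0.9, 0.09 / 0.188, exampleWeights[0]); // ≈ 0.847
const weakAfter = softenedUpdate(0.3, 0.03 / 0.716, exampleWeights[1]); // ≈ 0.074

console.log(exampleWeights, strongAfter.toFixed(3), weakAfter.toFixed(3));
```

The strong skill barely moves while the weak skill absorbs most of the penalty, which is the intended asymmetry.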

2.3 Evidence Quality Modifiers

// src/lib/curriculum/bkt/evidence-quality.ts

/**
 * Adjust observation weight based on whether help was used.
 * Using help = less confident the student really knows it.
 *
 * Note: Help is binary (0 = no help, 1 = used help).
 * We can't determine which skill needed help for multi-skill problems,
 * so we apply the discount uniformly and let conjunctive BKT identify
 * weak skills from aggregated evidence.
 */
export function helpLevelWeight(helpLevel: 0 | 1): number {
  return helpLevel === 0 ? 1.0 : 0.5; // 50% weight for helped answers
}

/**
 * Adjust observation weight based on response time.
 *
 * - Fast correct → strong evidence of mastery
 * - Slow correct → might have struggled
 * - Fast incorrect → careless slip (less negative)
 * - Slow incorrect → genuine confusion (stronger negative)
 */
export function responseTimeWeight(
  responseTimeMs: number,
  isCorrect: boolean,
  expectedTimeMs: number = 5000,
): number {
  const ratio = responseTimeMs / expectedTimeMs;

  if (isCorrect) {
    if (ratio < 0.5) return 1.2; // Very fast - strong mastery
    if (ratio > 2.0) return 0.8; // Very slow - struggled
    return 1.0;
  } else {
    if (ratio < 0.3) return 0.5; // Very fast error - careless slip
    if (ratio > 2.0) return 1.2; // Very slow error - genuine confusion
    return 1.0;
  }
}
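
Note that the two weights multiply, and `responseTimeWeight` can return 1.2, so the product can exceed 1. If that product is used as an interpolation weight between the old and the updated `pKnown`, a value above 1 extrapolates past the update and can leave the [0, 1] range. A minimal sketch of a capped blend, with `blendByEvidence` as a hypothetical helper name:

```typescript
// Hypothetical helper: move from the prior toward the updated estimate.
// The weight is capped to [0, 1] so the result stays between the two inputs.
function blendByEvidence(prior: number, updated: number, weight: number): number {
  const cappedWeight = Math.min(1, Math.max(0, weight));
  return prior * (1 - cappedWeight) + updated * cappedWeight;
}

// Fast, unaided correct answer: helpLevelWeight = 1.0, responseTimeWeight = 1.2.
const rawWeight = 1.0 * 1.2;

// Without the cap the blend overshoots a valid probability (≈ 1.008).
const uncappedBlend = 0.9 * (1 - rawWeight) + 0.99 * rawWeight;

// With the cap the result is the full update and nothing more.
const cappedBlend = blendByEvidence(0.9, 0.99, rawWeight);

// Helped, fast correct answer: 0.5 * 1.2 = 0.6, so the bonus still matters here.
const helpedBlend = blendByEvidence(0.9, 0.99, 0.5 * 1.2);

console.log(uncappedBlend, cappedBlend, helpedBlend);
```

Capping the weight rather than the result avoids pinning `pKnown` at exactly 0 or 1, values from which the Bayesian update cannot move.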

2.4 Domain-Informed Priors

// src/lib/curriculum/bkt/skill-priors.ts

import type { BktParams } from "./bkt-core";

export function getDefaultParams(skillId: string): BktParams {
  // Basic skills are easier to learn
  if (skillId.startsWith("basic.")) {
    return { pInit: 0.3, pLearn: 0.4, pSlip: 0.05, pGuess: 0.02 };
  }
  // Five complements are moderately difficult
  if (skillId.startsWith("fiveComplements")) {
    return { pInit: 0.1, pLearn: 0.3, pSlip: 0.1, pGuess: 0.02 };
  }
  // Ten complements are harder
  if (skillId.startsWith("tenComplements")) {
    return { pInit: 0.05, pLearn: 0.25, pSlip: 0.15, pGuess: 0.02 };
  }
  // Mixed complements are hardest
  if (skillId.startsWith("mixedComplements")) {
    return { pInit: 0.02, pLearn: 0.2, pSlip: 0.2, pGuess: 0.02 };
  }
  // Default
  return { pInit: 0.1, pLearn: 0.3, pSlip: 0.1, pGuess: 0.05 };
}

2.5 Confidence Calculation

// src/lib/curriculum/bkt/confidence.ts

/**
 * Calculate confidence in pKnown estimate.
 * Based on number of opportunities and consistency of observations.
 * Returns value in [0, 1] where 1 = highly confident.
 */
export function calculateConfidence(
  opportunities: number,
  successRate: number,
): number {
  // More data = more confidence (asymptotic to 1)
  const dataConfidence = 1 - Math.exp(-opportunities / 20);

  // Extreme success rates (very high or very low) = more confidence
  const extremity = Math.abs(successRate - 0.5) * 2; // 0 at 50%, 1 at 0% or 100%
  const consistencyBonus = extremity * 0.2;

  return Math.min(1, dataConfidence + consistencyBonus);
}

/**
 * Get confidence label for display.
 */
export function getConfidenceLabel(confidence: number): string {
  if (confidence > 0.7) return "confident";
  if (confidence > 0.4) return "moderate";
  return "uncertain";
}

/**
 * Calculate uncertainty range around pKnown estimate.
 * Wider range when confidence is low.
 */
export function getUncertaintyRange(
  pKnown: number,
  confidence: number,
): { low: number; high: number } {
  const uncertainty = (1 - confidence) * 0.3; // Max ±0.30 (30 percentage points) when confidence = 0
  return {
    low: Math.max(0, pKnown - uncertainty),
    high: Math.min(1, pKnown + uncertainty),
  };
}
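
To get a feel for how quickly confidence grows, the sketch below restates the formula and searches for the smallest number of opportunities that reaches a given threshold. `confidenceFor` and `opportunitiesNeeded` are hypothetical names; the arithmetic is the same as in `calculateConfidence`.

```typescript
// Same formula as calculateConfidence, restated so the snippet runs on its own.
function confidenceFor(opportunities: number, successRate: number): number {
  const dataConfidence = 1 - Math.exp(-opportunities / 20);
  const consistencyBonus = Math.abs(successRate - 0.5) * 2 * 0.2;
  return Math.min(1, dataConfidence + consistencyBonus);
}

// Hypothetical helper: smallest opportunity count that reaches the threshold
// (search is bounded at 1000 so it always terminates).
function opportunitiesNeeded(threshold: number, successRate: number): number {
  let count = 0;
  while (confidenceFor(count, successRate) < threshold && count < 1000) {
    count += 1;
  }
  return count;
}

// A perfectly consistent record gets the full 0.2 bonus and needs less data.
const neededWhenConsistent = opportunitiesNeeded(0.5, 1.0); // 8
// A 50/50 record gets no bonus.
const neededWhenMixed = opportunitiesNeeded(0.5, 0.5); // 14

// Ten problems at 90% correct: 0.393 + 0.16 ≈ 0.553 ("moderate").
const tenProblemConfidence = confidenceFor(10, 0.9);
// Half-width of the displayed range at that confidence: ≈ 0.134.
const rangeHalfWidth = (1 - tenProblemConfidence) * 0.3;

console.log(neededWhenConsistent, neededWhenMixed, tenProblemConfidence, rangeHalfWidth);
```

Under this formula a default threshold of 0.5 is reached after roughly 8 to 14 problems per skill, depending on how one-sided the results are.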

3. Main BKT Computation Function

// src/lib/curriculum/bkt/compute-bkt.ts

import type { ProblemResultWithContext } from "../session-planner";
import type { BktParams } from "./bkt-core";
import { getDefaultParams } from "./skill-priors";
import { updateOnCorrect, updateOnIncorrect } from "./conjunctive-bkt";
import { helpLevelWeight, responseTimeWeight } from "./evidence-quality";
import { calculateConfidence, getUncertaintyRange } from "./confidence";

export interface BktComputeOptions {
  /** Confidence threshold for mastery classification */
  confidenceThreshold: number;
  /** Use cross-student priors (aggregated from other students) */
  useCrossStudentPriors: boolean;
}

export interface SkillBktResult {
  skillId: string;
  pKnown: number;
  confidence: number;
  uncertaintyRange: { low: number; high: number };
  opportunities: number;
  successCount: number;
  lastPracticedAt: Date | null;
  masteryClassification: "mastered" | "learning" | "struggling";
}

export interface BktComputeResult {
  skills: SkillBktResult[];
  interventionNeeded: SkillBktResult[];
  strengths: SkillBktResult[];
}

/**
 * Compute BKT state for all skills from problem history.
 * This is the main entry point - call it when displaying the Skills Dashboard.
 */
export function computeBktFromHistory(
  results: ProblemResultWithContext[],
  options: BktComputeOptions = {
    confidenceThreshold: 0.5,
    useCrossStudentPriors: false,
  },
): BktComputeResult {
  // Sort by timestamp to replay in order
  const sorted = [...results].sort((a, b) => a.timestamp - b.timestamp);

  // Track state for each skill
  const skillStates = new Map<
    string,
    {
      pKnown: number;
      opportunities: number;
      successCount: number;
      lastPracticedAt: Date | null;
      params: BktParams;
    }
  >();

  // Initialize and update for each problem
  for (const result of sorted) {
    const skillIds = result.problem.skillIds ?? [];
    if (skillIds.length === 0) continue;

    // Ensure all skills have state
    for (const skillId of skillIds) {
      if (!skillStates.has(skillId)) {
        const params = getDefaultParams(skillId);
        skillStates.set(skillId, {
          pKnown: params.pInit,
          opportunities: 0,
          successCount: 0,
          lastPracticedAt: null,
          params,
        });
      }
    }

    // Build skill records for BKT update
    const skillRecords = skillIds.map((skillId) => {
      const state = skillStates.get(skillId)!;
      return {
        skillId,
        pKnown: state.pKnown,
        params: state.params,
      };
    });

    // Calculate evidence weight
    const helpWeight = helpLevelWeight(result.helpLevel);
    const rtWeight = responseTimeWeight(
      result.responseTimeMs,
      result.isCorrect,
    );
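    // Cap at 1: responseTimeWeight can return 1.2, and a blend weight above 1
    // would extrapolate past the Bayesian update and push pKnown outside [0, 1].
    const evidenceWeight = Math.min(1, helpWeight * rtWeight);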

    // Compute updates
    const updates = result.isCorrect
      ? updateOnCorrect(skillRecords)
      : updateOnIncorrect(skillRecords);

    // Apply updates with evidence weighting
    for (const update of updates) {
      const state = skillStates.get(update.skillId)!;

      // Weighted blend between old and new pKnown based on evidence quality
      const newPKnown =
        state.pKnown * (1 - evidenceWeight) +
        update.updatedPKnown * evidenceWeight;

      state.pKnown = newPKnown;
      state.opportunities += 1;
      if (result.isCorrect) state.successCount += 1;
      state.lastPracticedAt = new Date(result.timestamp);
    }
  }

  // Convert to results
  const skills: SkillBktResult[] = [];

  for (const [skillId, state] of skillStates) {
    const successRate =
      state.opportunities > 0 ? state.successCount / state.opportunities : 0.5;
    const confidence = calculateConfidence(state.opportunities, successRate);
    const uncertaintyRange = getUncertaintyRange(state.pKnown, confidence);

    // Classify mastery
    let masteryClassification: "mastered" | "learning" | "struggling";
    if (state.pKnown >= 0.8 && confidence >= options.confidenceThreshold) {
      masteryClassification = "mastered";
    } else if (
      state.pKnown < 0.5 &&
      confidence >= options.confidenceThreshold
    ) {
      masteryClassification = "struggling";
    } else {
      masteryClassification = "learning";
    }

    skills.push({
      skillId,
      pKnown: state.pKnown,
      confidence,
      uncertaintyRange,
      opportunities: state.opportunities,
      successCount: state.successCount,
      lastPracticedAt: state.lastPracticedAt,
      masteryClassification,
    });
  }

  // Sort by pKnown ascending (struggling skills first)
  skills.sort((a, b) => a.pKnown - b.pKnown);

  // Identify intervention needed (low pKnown with high confidence)
  const interventionNeeded = skills.filter(
    (s) => s.masteryClassification === "struggling",
  );

  // Identify strengths (high pKnown with high confidence)
  const strengths = skills.filter(
    (s) => s.masteryClassification === "mastered",
  );

  return { skills, interventionNeeded, strengths };
}

4. UI Display Updates

4.1 Honest Language Guidelines

DON'T say:

  • "85% accuracy" (misleading - implies binary success tracking)
  • "Mastery: 85%" (implies certainty we don't have)
  • "You know this skill" (we can't know for sure)

DO say:

  • "~73% mastered (moderate confidence)"
  • "Estimated: 73% ± 15%"
  • "Appears mastered (based on 12 problems)"
  • "Needs attention (5 recent errors)"

4.2 Skill Card Display

interface SkillDisplayData {
  skillId: string;
  displayName: string;

  // BKT metrics
  pKnown: number; // 0-1, the main estimate
  confidence: number; // 0-1, how certain we are
  uncertaintyRange: { low: number; high: number };

  // Raw evidence
  opportunities: number; // Total problems
  successCount: number;
  errorCount: number; // opportunities - successCount

  // Staleness
  lastPracticedAt: Date | null;
  daysSinceLastPractice: number | null;
}

// Display:
// "~73% mastered (moderate confidence)"
// "Based on 15 problems (12 correct, 3 with errors)"
// "Last practiced 3 days ago"
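
The display strings above could be produced along these lines. This is a sketch only: `SkillSummaryInput`, `confidencePhrase`, and `formatSkillSummary` are hypothetical names, and mapping the confidence score to "high / moderate / low" wording is an assumption based on the thresholds in section 2.5.

```typescript
// Illustrative subset of SkillDisplayData.
interface SkillSummaryInput {
  pKnown: number;
  confidence: number;
  opportunities: number;
  successCount: number;
}

// Assumed wording; thresholds match getConfidenceLabel (0.7 and 0.4).
function confidencePhrase(confidence: number): string {
  if (confidence > 0.7) return "high confidence";
  if (confidence > 0.4) return "moderate confidence";
  return "low confidence";
}

// Hypothetical helper: build the first two lines of a skill card.
function formatSkillSummary(skill: SkillSummaryInput): string[] {
  const percent = Math.round(skill.pKnown * 100);
  const errorCount = skill.opportunities - skill.successCount;
  return [
    `~${percent}% mastered (${confidencePhrase(skill.confidence)})`,
    `Based on ${skill.opportunities} problems ` +
      `(${skill.successCount} correct, ${errorCount} with errors)`,
  ];
}

const summaryLines = formatSkillSummary({
  pKnown: 0.73,
  confidence: 0.55,
  opportunities: 15,
  successCount: 12,
});
console.log(summaryLines.join("\n"));
```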

4.3 Staleness Indicator

Show staleness separately from P(known) - don't apply decay to the estimate.

function getStalenessWarning(
  daysSinceLastPractice: number | null,
): string | null {
  if (daysSinceLastPractice === null) return null;
  if (daysSinceLastPractice < 7) return null;
  if (daysSinceLastPractice < 14) return "Not practiced recently";
  if (daysSinceLastPractice < 30) return "Getting rusty";
  return "Very stale - may need review";
}
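
The `daysSinceLastPractice` input has to be derived from `lastPracticedAt`. One possible way to do that is sketched below; `computeDaysSince` is a hypothetical helper, and counting whole elapsed 24-hour periods (rather than calendar days in the user's time zone) is an assumption.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days elapsed since the last practice,
// or null if the skill has never been practiced.
function computeDaysSince(
  lastPracticedAt: Date | null,
  now: Date = new Date(),
): number | null {
  if (lastPracticedAt === null) return null;
  const elapsedMs = now.getTime() - lastPracticedAt.getTime();
  // Clamp at 0 so a slightly skewed clock never yields a negative count.
  return Math.max(0, Math.floor(elapsedMs / MS_PER_DAY));
}

const tenDaysExample = computeDaysSince(
  new Date("2024-12-01T00:00:00Z"),
  new Date("2024-12-11T12:00:00Z"),
);
console.log(tenDaysExample); // 10
```

Passing `now` as a parameter keeps the function pure and easy to unit test.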

4.4 UI Controls

Confidence Threshold Slider:

  • Default: 0.5
  • Range: 0.3 to 0.8
  • Affects mastery classification: a higher threshold requires more evidence before a skill is labeled "mastered" or "struggling"; skills below the threshold stay "learning"
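
The effect of the slider can be seen by isolating the classification rule from section 3. `classifyMastery` below is a hypothetical standalone version of that rule, written to behave the same way as the branch inside `computeBktFromHistory`.

```typescript
type MasteryClassification = "mastered" | "learning" | "struggling";

// Hypothetical standalone version of the classification rule.
function classifyMastery(
  pKnown: number,
  confidence: number,
  confidenceThreshold: number,
): MasteryClassification {
  // Below the threshold we do not commit to either extreme label.
  if (confidence < confidenceThreshold) return "learning";
  if (pKnown >= 0.8) return "mastered";
  if (pKnown < 0.5) return "struggling";
  return "learning";
}

// Same skill, same evidence, two slider positions.
const atDefaultThreshold = classifyMastery(0.85, 0.55, 0.5); // "mastered"
const atStrictThreshold = classifyMastery(0.85, 0.55, 0.7); // "learning"

console.log(atDefaultThreshold, atStrictThreshold);
```

Raising the threshold never flips a skill between "mastered" and "struggling"; it only moves borderline cases back to "learning".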

Cross-Student Priors Toggle (future):

  • Default: off (use domain-informed priors only)
  • When on: adjust priors based on aggregate student data

5. Implementation Plan

Phase 1: Core BKT Functions (No DB Changes)

  1. Create src/lib/curriculum/bkt/ directory
  2. Implement pure functions: bkt-core.ts, conjunctive-bkt.ts, evidence-quality.ts, skill-priors.ts, confidence.ts
  3. Implement main entry point: compute-bkt.ts
  4. Write unit tests for BKT math

Phase 2: Skills Dashboard Update

  1. Update SkillsClient.tsx to call computeBktFromHistory()
  2. Replace naive accuracy display with P(known) + confidence
  3. Use honest language in all labels
  4. Add staleness indicators

Phase 3: UI Controls

  1. Add confidence threshold slider to Skills Dashboard
  2. Store preference in localStorage
  3. (Future) Add cross-student priors toggle
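
Storing the slider preference could look roughly like this. The storage key, the `KeyValueStore` interface, and both function names are hypothetical; the interface covers only the two `localStorage` methods the sketch uses, so the functions can be tested without a browser.

```typescript
// Minimal slice of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const THRESHOLD_STORAGE_KEY = "bkt.confidenceThreshold"; // hypothetical key
const DEFAULT_THRESHOLD = 0.5;
const MIN_THRESHOLD = 0.3;
const MAX_THRESHOLD = 0.8;

// Keep stored values inside the slider range from section 4.4.
function clampThreshold(value: number): number {
  return Math.min(MAX_THRESHOLD, Math.max(MIN_THRESHOLD, value));
}

// Read the preference, falling back to the default on missing or bad data.
function loadConfidenceThreshold(store: KeyValueStore): number {
  const raw = store.getItem(THRESHOLD_STORAGE_KEY);
  if (raw === null) return DEFAULT_THRESHOLD;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? clampThreshold(parsed) : DEFAULT_THRESHOLD;
}

// Persist the preference, clamped to the allowed range.
function saveConfidenceThreshold(store: KeyValueStore, value: number): void {
  store.setItem(THRESHOLD_STORAGE_KEY, String(clampThreshold(value)));
}

// In the browser: loadConfidenceThreshold(window.localStorage)
```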

6. Open Questions (Deferred)

  1. Cross-student priors: How do we aggregate data across students to inform priors?

    • Answer: Deferred. Start with domain-informed priors only.
  2. Decay vs Staleness: Should we eventually add decay?

    • Answer: Show staleness indicator for now. Can add optional decay toggle later.
  3. Parameter estimation: Should P(T), P(S), P(G) be learned from data?

    • Answer: Start with domain-informed values. Can tune later with A/B testing.

7. BKT-Driven Problem Generation

Implemented in December 2024

7.1 Problem Generation Modes

Students can choose between two modes in the "Ready to Practice" modal:

Adaptive Mode (Default):

  • Uses BKT P(known) estimates for continuous complexity scaling
  • Formula: multiplier = 4 - (pKnown × 3)
  • Requires confidence ≥ 0.5 (about 8 to 14 problems with the skill under the formula in section 2.5, depending on how consistent the answers are)
  • Falls back to Classic mode if insufficient data

Classic Mode:

  • Uses fluency-based discrete multipliers
  • effortless (1×), fluent (2×), rusty (3×), practicing (3×), not_practicing (4×)
  • Fluency requires: ≥5 consecutive correct, ≥10 attempts, ≥85% accuracy
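
A minimal sketch of how the two modes could combine for a single skill is shown below. The names `FluencyState`, `CLASSIC_MULTIPLIERS`, and `complexityMultiplier` are hypothetical and the per-skill fallback is an assumption; the actual logic lives in config/bkt-integration.ts and may differ.

```typescript
type FluencyState =
  | "effortless"
  | "fluent"
  | "rusty"
  | "practicing"
  | "not_practicing";

// Discrete multipliers listed under Classic Mode.
const CLASSIC_MULTIPLIERS: Record<FluencyState, number> = {
  effortless: 1,
  fluent: 2,
  rusty: 3,
  practicing: 3,
  not_practicing: 4,
};

// Hypothetical helper: use the BKT formula when confidence is sufficient,
// otherwise fall back to the fluency-based multiplier.
function complexityMultiplier(
  pKnown: number,
  confidence: number,
  fluency: FluencyState,
  minConfidence: number = 0.5,
): number {
  if (confidence >= minConfidence) {
    return 4 - pKnown * 3; // 1× at pKnown = 1, 4× at pKnown = 0
  }
  return CLASSIC_MULTIPLIERS[fluency];
}

const adaptiveExample = complexityMultiplier(0.75, 0.6, "fluent"); // 1.75
const fallbackExample = complexityMultiplier(0.75, 0.3, "fluent"); // 2

console.log(adaptiveExample, fallbackExample);
```

The continuous formula spans the same 1× to 4× range as the classic table, so switching modes changes granularity rather than scale.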

7.2 Implementation Files

| File | Purpose |
| --- | --- |
| config/bkt-integration.ts | BKT config and multiplier calculation |
| utils/skillComplexity.ts | Cost calculator with BKT support |
| session-planner.ts | Session planning with BKT loading |
| StartPracticeModal.tsx | Mode selection UI |
| SkillsClient.tsx | Skills dashboard with multiplier display |

7.3 User Preference Storage

-- player_curriculum table
problem_generation_mode TEXT DEFAULT 'adaptive' NOT NULL
-- Values: 'adaptive' | 'classic'

7.4 Skills Dashboard Consistency

The Skills Dashboard now shows:

  1. P(known) estimate - Same BKT estimate used for problem generation
  2. Complexity multiplier - Actual multiplier that will be used (e.g., "1.75×")
  3. Mode indicator - Whether BKT or fluency is being used for this skill

This ensures complete transparency about what drives problem generation.


References

  • Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge.
  • Pardos, Z. A., & Heffernan, N. T. (2011). KT-IDEM: Introducing item difficulty to the knowledge tracing model.