# Bayesian Knowledge Tracing (BKT) Design Specification

## Overview

This document specifies the implementation of Conjunctive Bayesian Knowledge Tracing (BKT) for the soroban practice system. BKT provides epistemically honest skill mastery estimates that account for:

1. **Asymmetric evidence**: A correct answer is strong evidence for every skill involved; a wrong answer only shows that at least one skill failed
2. **Multi-skill problems**: Probabilistic blame distribution across co-occurring skills
3. **Uncertainty quantification**: Confidence intervals on mastery estimates
4. **Staleness**: Time since last practice is shown separately ("last practiced X days ago") rather than applied as decay

## Architecture Decision: Lazy Computation

**Key Decision**: BKT is computed on-demand when viewing reports, NOT in real-time during practice.

**Why:**

- No new database tables needed
- No hooks into practice session flow
- Can replay SlotResult history to compute BKT state
- Easy to change algorithm without migration
- Can add user controls (confidence slider, priors toggle) dynamically
- Estimated computation time: ~50ms for full report

**How it works:**

1. User opens Skills Dashboard
2. Dashboard fetches recent SlotResults (already stored in session_plans)
3. Pure functions replay history to compute BKT state for each skill
4. Display results with confidence indicators

---

## The Problem We're Solving

**Current approach (naive):**

```
accuracy = correct / attempts  // Treats correct and incorrect answers as equally informative
```

**Why it's wrong:**

- Correct: Strong evidence ALL skills are known
- Incorrect: Weak evidence that ONE OR MORE skills failed (we don't know which)

**BKT approach:**

- Maintain P(known) per skill with proper Bayesian updates
- Distribute "blame" for errors probabilistically based on prior beliefs
- Report uncertainty honestly
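
To make the asymmetry concrete, the sketch below computes naive per-skill accuracy over a small made-up history in which a solid skill is repeatedly paired with a weak one. The `Attempt` type, the skill IDs, and the history are illustrative assumptions, not the project's data or the real `SlotResult` shape.

```typescript
// Hypothetical, simplified attempt record (not the real SlotResult shape).
interface Attempt {
  skillIds: string[];
  isCorrect: boolean;
}

// Naive accuracy: every skill on a problem gets full credit or full blame.
function naiveAccuracy(history: Attempt[]): Map<string, number> {
  const counts = new Map<string, { correct: number; attempts: number }>();
  for (const attempt of history) {
    for (const skillId of attempt.skillIds) {
      const entry = counts.get(skillId) ?? { correct: 0, attempts: 0 };
      entry.attempts += 1;
      if (attempt.isCorrect) entry.correct += 1;
      counts.set(skillId, entry);
    }
  }
  const accuracy = new Map<string, number>();
  counts.forEach((entry, skillId) => {
    accuracy.set(skillId, entry.correct / entry.attempts);
  });
  return accuracy;
}

// Illustrative history: "basic.add" is answered correctly alone, but is
// paired four times with a weak skill and shares the blame for three errors.
const sampleHistory: Attempt[] = [
  { skillIds: ["basic.add"], isCorrect: true },
  { skillIds: ["basic.add"], isCorrect: true },
  { skillIds: ["basic.add", "tenComplements.hard"], isCorrect: false },
  { skillIds: ["basic.add", "tenComplements.hard"], isCorrect: false },
  { skillIds: ["basic.add", "tenComplements.hard"], isCorrect: false },
  { skillIds: ["basic.add", "tenComplements.hard"], isCorrect: true },
];

const naive = naiveAccuracy(sampleHistory);
console.log(naive.get("basic.add"), naive.get("tenComplements.hard"));
```

In this history the naive metric puts `basic.add` at 50% even though it was never missed on its own, which is the misattribution that blame distribution is meant to reduce.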

---

## 1. Data Source

### Existing Data (No Schema Changes Needed)

We already have all the data we need in `session_plans.results`:

```typescript
// From src/db/schema/session-plans.ts
export interface SlotResult {
  slotIndex: number;
  problemIndex: number;
  problem: GeneratedProblem; // Contains skillIds
  isCorrect: boolean;
  timestamp: number;
  responseTimeMs: number;
  userAnswer: number | null;
  helpLevel: 0 | 1; // Boolean: 0 = no help, 1 = used help
}
```

The `problem.skillIds` field tells us which skills were involved in each problem.

### Data Fetching

Already implemented: `getRecentSessionResults(playerId, sessionCount)` in `session-planner.ts`

---

## 2. BKT Algorithm (Pure Functions)

### 2.1 Core BKT Update Equations

```typescript
// src/lib/curriculum/bkt/bkt-core.ts

export interface BktParams {
  pInit: number; // P(L0) - prior knowledge
  pLearn: number; // P(T) - learning rate
  pSlip: number; // P(S) - slip rate
  pGuess: number; // P(G) - guess rate
}

export interface BktState {
  pKnown: number;
  opportunities: number;
  successCount: number;
  lastPracticedAt: Date | null;
}

/**
 * Standard BKT update for a SINGLE skill given an observation.
 *
 * For correct answer:
 *   P(known | correct) = P(correct | known) × P(known) / P(correct)
 *   where P(correct | known) = 1 - P(slip)
 *   and   P(correct | ¬known) = P(guess)
 *
 * For incorrect answer:
 *   P(known | incorrect) = P(incorrect | known) × P(known) / P(incorrect)
 *   where P(incorrect | known) = P(slip)
 *   and   P(incorrect | ¬known) = 1 - P(guess)
 */
export function bktUpdate(
  priorPKnown: number,
  isCorrect: boolean,
  params: BktParams,
): number {
  const { pSlip, pGuess } = params;

  if (isCorrect) {
    const pCorrect = priorPKnown * (1 - pSlip) + (1 - priorPKnown) * pGuess;
    const pKnownGivenCorrect = (priorPKnown * (1 - pSlip)) / pCorrect;
    return pKnownGivenCorrect;
  } else {
    const pIncorrect = priorPKnown * pSlip + (1 - priorPKnown) * (1 - pGuess);
    const pKnownGivenIncorrect = (priorPKnown * pSlip) / pIncorrect;
    return pKnownGivenIncorrect;
  }
}

/**
 * Apply learning transition after observation.
 * P(known after learning) = P(known) + P(¬known) × P(learn)
 */
export function applyLearning(pKnown: number, pLearn: number): number {
  return pKnown + (1 - pKnown) * pLearn;
}
```
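
As a worked example of the two-step update (Bayes posterior, then learning transition), the sketch below restates the equations compactly and runs them on one prior. The helper names `posterior` and `afterLearning` and the chosen numbers are illustrative; the parameters happen to match the `basic.` defaults from Section 2.4.

```typescript
interface Params {
  pLearn: number;
  pSlip: number;
  pGuess: number;
}

// Bayes posterior for a single skill given one observation.
function posterior(prior: number, isCorrect: boolean, p: Params): number {
  const likelihoodKnown = isCorrect ? 1 - p.pSlip : p.pSlip;
  const likelihoodUnknown = isCorrect ? p.pGuess : 1 - p.pGuess;
  const evidence = prior * likelihoodKnown + (1 - prior) * likelihoodUnknown;
  return (prior * likelihoodKnown) / evidence;
}

// Learning transition applied after the posterior.
function afterLearning(pKnown: number, pLearn: number): number {
  return pKnown + (1 - pKnown) * pLearn;
}

const basicParams: Params = { pLearn: 0.4, pSlip: 0.05, pGuess: 0.02 };
const startingPrior = 0.3;

// Correct: (0.3 * 0.95) / (0.3 * 0.95 + 0.7 * 0.02) = 0.285 / 0.299
const afterCorrect = posterior(startingPrior, true, basicParams);
// Incorrect: (0.3 * 0.05) / (0.3 * 0.05 + 0.7 * 0.98) = 0.015 / 0.701
const afterIncorrect = posterior(startingPrior, false, basicParams);
// The value carried into the next opportunity after a correct answer.
const nextPrior = afterLearning(afterCorrect, basicParams.pLearn);

console.log({ afterCorrect, afterIncorrect, nextPrior });
```

With a guess rate this low, a single correct answer moves the estimate from 0.30 to roughly 0.95, and a single error moves it to roughly 0.02. Swings of that size are part of why the later sections soften updates with blame and evidence weights.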

### 2.2 Conjunctive BKT for Multi-Skill Problems

```typescript
// src/lib/curriculum/bkt/conjunctive-bkt.ts

import { applyLearning, bktUpdate, type BktParams } from "./bkt-core";

export interface SkillBktRecord {
  skillId: string;
  pKnown: number;
  params: BktParams;
}

export interface BlameDistribution {
  skillId: string;
  blameWeight: number; // Higher = more likely this skill caused the error
  updatedPKnown: number;
}

/**
 * For a CORRECT multi-skill answer:
 * All skills receive positive evidence (student knew all of them).
 * Update each skill independently with the correct observation.
 */
export function updateOnCorrect(
  skills: SkillBktRecord[],
): { skillId: string; updatedPKnown: number }[] {
  return skills.map((skill) => ({
    skillId: skill.skillId,
    updatedPKnown: applyLearning(
      bktUpdate(skill.pKnown, true, skill.params),
      skill.params.pLearn,
    ),
  }));
}

/**
 * For an INCORRECT multi-skill answer:
 * Distribute blame probabilistically based on which skill most likely failed.
 *
 * Simplified approximation (weights are normalized, so they sum to 1):
 *   blame(X) = (1 - pKnown(X)) / Σ_Y (1 - pKnown(Y))
 */
export function updateOnIncorrect(
  skills: SkillBktRecord[],
): BlameDistribution[] {
  const totalUnknown = skills.reduce((sum, s) => sum + (1 - s.pKnown), 0);

  if (totalUnknown < 0.001) {
    // All skills appear mastered - must be a slip, distribute evenly
    const evenWeight = 1 / skills.length;
    return skills.map((skill) => ({
      skillId: skill.skillId,
      blameWeight: evenWeight,
      updatedPKnown: bktUpdate(skill.pKnown, false, skill.params),
    }));
  }

  return skills.map((skill) => {
    const blameWeight = (1 - skill.pKnown) / totalUnknown;

    // Weighted update: soften negative evidence for skills unlikely to have caused error
    const fullNegativeUpdate = bktUpdate(skill.pKnown, false, skill.params);
    const weightedPKnown =
      skill.pKnown * (1 - blameWeight) + fullNegativeUpdate * blameWeight;

    return {
      skillId: skill.skillId,
      blameWeight,
      updatedPKnown: weightedPKnown,
    };
  });
}
```
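
The blame heuristic is easier to judge with numbers. The sketch below applies it to one strong and one weak skill that appear together on a missed problem. The helper names and the parameter values (`pSlip = 0.1`, `pGuess = 0.02`) are assumptions chosen for illustration.

```typescript
// Posterior after an incorrect answer for a single skill (standard BKT).
function negativePosterior(pKnown: number, pSlip: number, pGuess: number): number {
  const pIncorrect = pKnown * pSlip + (1 - pKnown) * (1 - pGuess);
  return (pKnown * pSlip) / pIncorrect;
}

// Blame weight: each skill's share of the total "unknown" probability mass.
function blameWeights(pKnowns: number[]): { pKnown: number; weight: number }[] {
  const totalUnknown = pKnowns.reduce((sum, p) => sum + (1 - p), 0);
  return pKnowns.map((pKnown) => ({
    pKnown,
    weight:
      totalUnknown < 0.001 ? 1 / pKnowns.length : (1 - pKnown) / totalUnknown,
  }));
}

// One strong skill (0.9) and one weak skill (0.3): unknown mass 0.1 + 0.7.
const blamed = blameWeights([0.9, 0.3]);

// Soften the full negative update by each skill's blame weight.
const softened = blamed.map(({ pKnown, weight }) => {
  const full = negativePosterior(pKnown, 0.1, 0.02);
  return pKnown * (1 - weight) + full * weight;
});

console.log(blamed, softened);
```

Here the weak skill takes 87.5% of the blame. The strong skill drops only from 0.90 to about 0.85, while the weak one falls from 0.30 to about 0.07.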

### 2.3 Evidence Quality Modifiers

```typescript
// src/lib/curriculum/bkt/evidence-quality.ts

/**
 * Adjust observation weight based on whether help was used.
 * Using help = less confident the student really knows it.
 *
 * Note: Help is binary (0 = no help, 1 = used help).
 * We can't determine which skill needed help for multi-skill problems,
 * so we apply the discount uniformly and let conjunctive BKT identify
 * weak skills from aggregated evidence.
 */
export function helpLevelWeight(helpLevel: 0 | 1): number {
  return helpLevel === 0 ? 1.0 : 0.5; // 50% weight for helped answers
}

/**
 * Adjust observation weight based on response time.
 *
 * - Fast correct → strong evidence of mastery
 * - Slow correct → might have struggled
 * - Fast incorrect → careless slip (less negative)
 * - Slow incorrect → genuine confusion (stronger negative)
 */
export function responseTimeWeight(
  responseTimeMs: number,
  isCorrect: boolean,
  expectedTimeMs: number = 5000,
): number {
  const ratio = responseTimeMs / expectedTimeMs;

  if (isCorrect) {
    if (ratio < 0.5) return 1.2; // Very fast - strong mastery
    if (ratio > 2.0) return 0.8; // Very slow - struggled
    return 1.0;
  } else {
    if (ratio < 0.3) return 0.5; // Very fast error - careless slip
    if (ratio > 2.0) return 1.2; // Very slow error - genuine confusion
    return 1.0;
  }
}
```
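
Because the two weights are multiplied, their product can exceed 1 (a fast, unhelped correct answer gives 1.0 × 1.2). The sketch below shows what that does to a linear blend and one possible guard. `cappedEvidenceWeight` is a hypothetical helper, and capping the weight is only one option; clamping the blended result to [0, 1] is another.

```typescript
// Blend the prior estimate toward the BKT update by an evidence weight.
function blend(prior: number, updated: number, weight: number): number {
  return prior * (1 - weight) + updated * weight;
}

// Hypothetical helper: cap the combined weight so the blend never
// extrapolates beyond the BKT update itself.
function cappedEvidenceWeight(helpWeight: number, rtWeight: number): number {
  return Math.min(1, Math.max(0, helpWeight * rtWeight));
}

const priorEstimate = 0.9;
const bktUpdated = 0.99; // illustrative BKT update after a correct answer

// Fast, unhelped correct answer: 1.0 (help) × 1.2 (response time) = 1.2
const uncapped = blend(priorEstimate, bktUpdated, 1.0 * 1.2);
const capped = blend(priorEstimate, bktUpdated, cappedEvidenceWeight(1.0, 1.2));

// Fast, helped correct answer: 0.5 × 1.2 = 0.6, already within [0, 1]
const helped = blend(priorEstimate, bktUpdated, cappedEvidenceWeight(0.5, 1.2));

console.log({ uncapped, capped, helped });
```

The uncapped blend lands above 1.0, which is not a valid probability, so some form of clamping is needed wherever the weights are applied.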

### 2.4 Domain-Informed Priors

```typescript
// src/lib/curriculum/bkt/skill-priors.ts

import type { BktParams } from "./bkt-core";

export function getDefaultParams(skillId: string): BktParams {
  // Basic skills are easier to learn
  if (skillId.startsWith("basic.")) {
    return { pInit: 0.3, pLearn: 0.4, pSlip: 0.05, pGuess: 0.02 };
  }
  // Five complements are moderately difficult
  if (skillId.startsWith("fiveComplements")) {
    return { pInit: 0.1, pLearn: 0.3, pSlip: 0.1, pGuess: 0.02 };
  }
  // Ten complements are harder
  if (skillId.startsWith("tenComplements")) {
    return { pInit: 0.05, pLearn: 0.25, pSlip: 0.15, pGuess: 0.02 };
  }
  // Mixed complements are hardest
  if (skillId.startsWith("mixedComplements")) {
    return { pInit: 0.02, pLearn: 0.2, pSlip: 0.2, pGuess: 0.02 };
  }
  // Default
  return { pInit: 0.1, pLearn: 0.3, pSlip: 0.1, pGuess: 0.05 };
}
```
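
Hand-picked parameters are easy to get subtly wrong, so a sanity check may be worth adding to the unit tests. The sketch below is one possible check. `validateParams` is a hypothetical helper, and the 0.5 bound on slip and guess is a conventional guard against degenerate models, not a requirement stated in this spec.

```typescript
interface BktParams {
  pInit: number;
  pLearn: number;
  pSlip: number;
  pGuess: number;
}

// Returns a list of problems; an empty list means the parameters look sane.
function validateParams(params: BktParams): string[] {
  const problems: string[] = [];
  const named: [string, number][] = [
    ["pInit", params.pInit],
    ["pLearn", params.pLearn],
    ["pSlip", params.pSlip],
    ["pGuess", params.pGuess],
  ];
  named.forEach(([name, value]) => {
    // Written as a negation so that NaN is rejected as well.
    if (!(value >= 0 && value <= 1)) {
      problems.push(`${name} must be within [0, 1]`);
    }
  });
  // If pGuess >= 1 - pSlip, a correct answer no longer raises P(known).
  if (params.pGuess + params.pSlip >= 1) {
    problems.push("pGuess + pSlip must be below 1");
  }
  // Assumed sanity bound: slipping or guessing more often than not
  // makes "known" and "unknown" hard to tell apart.
  if (params.pSlip >= 0.5 || params.pGuess >= 0.5) {
    problems.push("pSlip and pGuess should each stay below 0.5");
  }
  return problems;
}

// The ten-complements defaults from above pass; an inflated slip rate does not.
console.log(validateParams({ pInit: 0.05, pLearn: 0.25, pSlip: 0.15, pGuess: 0.02 }));
console.log(validateParams({ pInit: 0.1, pLearn: 0.3, pSlip: 0.6, pGuess: 0.5 }));
```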

### 2.5 Confidence Calculation

```typescript
// src/lib/curriculum/bkt/confidence.ts

/**
 * Calculate confidence in pKnown estimate.
 * Based on number of opportunities and consistency of observations.
 * Returns value in [0, 1] where 1 = highly confident.
 */
export function calculateConfidence(
  opportunities: number,
  successRate: number,
): number {
  // More data = more confidence (asymptotic to 1)
  const dataConfidence = 1 - Math.exp(-opportunities / 20);

  // Extreme success rates (very high or very low) = more confidence
  const extremity = Math.abs(successRate - 0.5) * 2; // 0 at 50%, 1 at 0% or 100%
  const consistencyBonus = extremity * 0.2;

  return Math.min(1, dataConfidence + consistencyBonus);
}

/**
 * Get confidence label for display.
 */
export function getConfidenceLabel(confidence: number): string {
  if (confidence > 0.7) return "confident";
  if (confidence > 0.4) return "moderate";
  return "uncertain";
}

/**
 * Calculate uncertainty range around pKnown estimate.
 * Wider range when confidence is low.
 */
export function getUncertaintyRange(
  pKnown: number,
  confidence: number,
): { low: number; high: number } {
  const uncertainty = (1 - confidence) * 0.3; // Max ±0.30 (30 percentage points) when confidence = 0
  return {
    low: Math.max(0, pKnown - uncertainty),
    high: Math.min(1, pKnown + uncertainty),
  };
}
```
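
It helps to know how many problems the confidence formula needs before it crosses a given threshold. The sketch below restates the formula and inverts it. `confidenceFor` mirrors `calculateConfidence`, and `opportunitiesNeeded` is a hypothetical helper added only for this illustration.

```typescript
// Same formula as calculateConfidence, restated so the sketch is self-contained.
function confidenceFor(opportunities: number, successRate: number): number {
  const dataConfidence = 1 - Math.exp(-opportunities / 20);
  const consistencyBonus = Math.abs(successRate - 0.5) * 2 * 0.2;
  return Math.min(1, dataConfidence + consistencyBonus);
}

// Hypothetical inverse: smallest opportunity count that reaches `target`
// confidence at a given success rate.
function opportunitiesNeeded(target: number, successRate: number): number {
  const bonus = Math.abs(successRate - 0.5) * 2 * 0.2;
  const needed = target - bonus; // what dataConfidence alone must supply
  if (needed <= 0) return 0;
  if (needed >= 1) return Infinity;
  // Solve 1 - exp(-n / 20) >= needed for n.
  return Math.ceil(-20 * Math.log(1 - needed));
}

const mixedRecord = opportunitiesNeeded(0.5, 0.5); // no consistency bonus
const perfectRecord = opportunitiesNeeded(0.5, 1.0); // full 0.2 bonus

console.log({ mixedRecord, perfectRecord });
```

Under this formula a 50/50 record needs 14 opportunities to reach a confidence of 0.5, while an all-correct (or all-wrong) record needs 8.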

---

## 3. Main BKT Computation Function

```typescript
// src/lib/curriculum/bkt/compute-bkt.ts

import type { ProblemResultWithContext } from "../session-planner";
import type { BktParams } from "./bkt-core";
import { getDefaultParams } from "./skill-priors";
import { updateOnCorrect, updateOnIncorrect } from "./conjunctive-bkt";
import { helpLevelWeight, responseTimeWeight } from "./evidence-quality";
import { calculateConfidence, getUncertaintyRange } from "./confidence";

export interface BktComputeOptions {
  /** Confidence threshold for mastery classification */
  confidenceThreshold: number;
  /** Use cross-student priors (aggregated from other students) */
  useCrossStudentPriors: boolean;
}

export interface SkillBktResult {
  skillId: string;
  pKnown: number;
  confidence: number;
  uncertaintyRange: { low: number; high: number };
  opportunities: number;
  successCount: number;
  lastPracticedAt: Date | null;
  masteryClassification: "mastered" | "learning" | "struggling";
}

export interface BktComputeResult {
  skills: SkillBktResult[];
  interventionNeeded: SkillBktResult[];
  strengths: SkillBktResult[];
}

/**
 * Compute BKT state for all skills from problem history.
 * This is the main entry point - call it when displaying the Skills Dashboard.
 */
export function computeBktFromHistory(
  results: ProblemResultWithContext[],
  options: BktComputeOptions = {
    confidenceThreshold: 0.5,
    useCrossStudentPriors: false,
  },
): BktComputeResult {
  // Sort by timestamp to replay in order
  const sorted = [...results].sort((a, b) => a.timestamp - b.timestamp);

  // Track state for each skill
  const skillStates = new Map<
    string,
    {
      pKnown: number;
      opportunities: number;
      successCount: number;
      lastPracticedAt: Date | null;
      params: BktParams;
    }
  >();

  // Initialize and update for each problem
  for (const result of sorted) {
    const skillIds = result.problem.skillIds ?? [];
    if (skillIds.length === 0) continue;

    // Ensure all skills have state
    for (const skillId of skillIds) {
      if (!skillStates.has(skillId)) {
        const params = getDefaultParams(skillId);
        skillStates.set(skillId, {
          pKnown: params.pInit,
          opportunities: 0,
          successCount: 0,
          lastPracticedAt: null,
          params,
        });
      }
    }

    // Build skill records for BKT update
    const skillRecords = skillIds.map((skillId) => {
      const state = skillStates.get(skillId)!;
      return {
        skillId,
        pKnown: state.pKnown,
        params: state.params,
      };
    });

    // Calculate evidence weight
    const helpWeight = helpLevelWeight(result.helpLevel);
    const rtWeight = responseTimeWeight(
      result.responseTimeMs,
      result.isCorrect,
    );
    const evidenceWeight = helpWeight * rtWeight;

    // Compute updates
    const updates = result.isCorrect
      ? updateOnCorrect(skillRecords)
      : updateOnIncorrect(skillRecords);

    // Apply updates with evidence weighting
    for (const update of updates) {
      const state = skillStates.get(update.skillId)!;

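      // Blend old and new pKnown by evidence quality. evidenceWeight can
      // exceed 1 (a fast, unhelped correct answer gives 1.2), which pushes
      // the blend past the update, so clamp the result to [0, 1].
      const blended =
        state.pKnown * (1 - evidenceWeight) +
        update.updatedPKnown * evidenceWeight;
      const newPKnown = Math.min(1, Math.max(0, blended));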

      state.pKnown = newPKnown;
      state.opportunities += 1;
      if (result.isCorrect) state.successCount += 1;
      state.lastPracticedAt = new Date(result.timestamp);
    }
  }

  // Convert to results
  const skills: SkillBktResult[] = [];

  for (const [skillId, state] of skillStates) {
    const successRate =
      state.opportunities > 0 ? state.successCount / state.opportunities : 0.5;
    const confidence = calculateConfidence(state.opportunities, successRate);
    const uncertaintyRange = getUncertaintyRange(state.pKnown, confidence);

    // Classify mastery
    let masteryClassification: "mastered" | "learning" | "struggling";
    if (state.pKnown >= 0.8 && confidence >= options.confidenceThreshold) {
      masteryClassification = "mastered";
    } else if (
      state.pKnown < 0.5 &&
      confidence >= options.confidenceThreshold
    ) {
      masteryClassification = "struggling";
    } else {
      masteryClassification = "learning";
    }

    skills.push({
      skillId,
      pKnown: state.pKnown,
      confidence,
      uncertaintyRange,
      opportunities: state.opportunities,
      successCount: state.successCount,
      lastPracticedAt: state.lastPracticedAt,
      masteryClassification,
    });
  }

  // Sort by pKnown ascending (struggling skills first)
  skills.sort((a, b) => a.pKnown - b.pKnown);

  // Identify intervention needed (low pKnown with high confidence)
  const interventionNeeded = skills.filter(
    (s) => s.masteryClassification === "struggling",
  );

  // Identify strengths (high pKnown with high confidence)
  const strengths = skills.filter(
    (s) => s.masteryClassification === "mastered",
  );

  return { skills, interventionNeeded, strengths };
}
```
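
The classification rule inside `computeBktFromHistory` can be pulled out and tested on its own. The sketch below is one way to do that. The name `classifyMastery` is hypothetical; the thresholds (0.8 and 0.5 on `pKnown`) are copied from the function above.

```typescript
type MasteryClassification = "mastered" | "learning" | "struggling";

// A skill is only labeled "mastered" or "struggling" when the estimate is
// backed by enough confidence; everything else stays "learning".
function classifyMastery(
  pKnown: number,
  confidence: number,
  confidenceThreshold: number = 0.5,
): MasteryClassification {
  const confidentEnough = confidence >= confidenceThreshold;
  if (pKnown >= 0.8 && confidentEnough) return "mastered";
  if (pKnown < 0.5 && confidentEnough) return "struggling";
  return "learning";
}

console.log(classifyMastery(0.9, 0.7)); // high estimate, enough confidence
console.log(classifyMastery(0.9, 0.3)); // high estimate, too little evidence
console.log(classifyMastery(0.2, 0.7)); // low estimate, enough confidence
console.log(classifyMastery(0.6, 0.9)); // middling estimate
```

Note that "learning" covers two different situations: a middling estimate, and an extreme estimate with too little evidence behind it.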

---

## 4. UI Display Updates

### 4.1 Honest Language Guidelines

**DON'T say:**

- "85% accuracy" (misleading - implies binary success tracking)
- "Mastery: 85%" (implies certainty we don't have)
- "You know this skill" (we can't know for sure)

**DO say:**

- "~73% mastered (moderate confidence)"
- "Estimated: 73% ± 15%"
- "Appears mastered (based on 12 problems)"
- "Needs attention (5 recent errors)"

### 4.2 Skill Card Display

```typescript
interface SkillDisplayData {
  skillId: string;
  displayName: string;

  // BKT metrics
  pKnown: number; // 0-1, the main estimate
  confidence: number; // 0-1, how certain we are
  uncertaintyRange: { low: number; high: number };

  // Raw evidence
  opportunities: number; // Total problems
  successCount: number;
  errorCount: number; // opportunities - successCount

  // Staleness
  lastPracticedAt: Date | null;
  daysSinceLastPractice: number | null;
}

// Display:
// "~73% mastered (moderate confidence)"
// "Based on 15 problems (12 correct, 3 with errors)"
// "Last practiced 3 days ago"
```
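
A small formatter keeps the wording consistent across cards. The sketch below produces the first two display lines shown above. `formatSkillSummary` and the "high / moderate / low confidence" phrases are assumptions for illustration; the cut-offs follow `getConfidenceLabel`.

```typescript
// Assumed display phrases for the three confidence bands.
function confidencePhrase(confidence: number): string {
  if (confidence > 0.7) return "high confidence";
  if (confidence > 0.4) return "moderate confidence";
  return "low confidence";
}

// Builds the headline and the evidence line for a skill card.
function formatSkillSummary(
  pKnown: number,
  confidence: number,
  opportunities: number,
  successCount: number,
): string[] {
  const percent = Math.round(pKnown * 100);
  const errorCount = opportunities - successCount;
  return [
    `~${percent}% mastered (${confidencePhrase(confidence)})`,
    `Based on ${opportunities} problems (${successCount} correct, ${errorCount} with errors)`,
  ];
}

const cardLines = formatSkillSummary(0.73, 0.5, 15, 12);
console.log(cardLines.join("\n"));
```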

### 4.3 Staleness Indicator

Show staleness separately from P(known) - don't apply decay to the estimate.

```typescript
function getStalenessWarning(
  daysSinceLastPractice: number | null,
): string | null {
  if (daysSinceLastPractice === null) return null;
  if (daysSinceLastPractice < 7) return null;
  if (daysSinceLastPractice < 14) return "Not practiced recently";
  if (daysSinceLastPractice < 30) return "Getting rusty";
  return "Very stale - may need review";
}
```
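
`getStalenessWarning` takes a day count, which has to be derived from `lastPracticedAt`. The sketch below is one way to do that. `daysSince` is a hypothetical helper, and passing `now` explicitly is a design choice made here to keep the function pure and easy to test.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between the last practice and `now`.
// Returns null when the skill has never been practiced.
function daysSince(lastPracticedAt: Date | null, now: Date): number | null {
  if (lastPracticedAt === null) return null;
  const elapsedMs = now.getTime() - lastPracticedAt.getTime();
  // Guard against clock skew producing a negative count.
  return Math.max(0, Math.floor(elapsedMs / MS_PER_DAY));
}

const lastPractice = new Date("2024-12-01T00:00:00Z");
const reportTime = new Date("2024-12-11T12:00:00Z");
console.log(daysSince(lastPractice, reportTime)); // ten full days have elapsed
console.log(daysSince(null, reportTime));
```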

### 4.4 UI Controls

**Confidence Threshold Slider:**

- Default: 0.5
- Range: 0.3 to 0.8
- Affects mastery classification: a higher threshold requires more evidence before a skill is labeled "mastered" or "struggling"
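
A stored slider value should be validated before use, since it may be missing or stale. The sketch below parses and clamps a raw string such as one read from localStorage. `parseConfidenceThreshold` is a hypothetical helper; the bounds and default come from the list above.

```typescript
const THRESHOLD_DEFAULT = 0.5;
const THRESHOLD_MIN = 0.3;
const THRESHOLD_MAX = 0.8;

// Turns a raw stored value into a usable threshold.
// Missing or unparseable input falls back to the default;
// out-of-range input is clamped to the slider range.
function parseConfidenceThreshold(raw: string | null): number {
  if (raw === null || raw.trim() === "") return THRESHOLD_DEFAULT;
  const value = Number(raw);
  if (!isFinite(value)) return THRESHOLD_DEFAULT;
  return Math.min(THRESHOLD_MAX, Math.max(THRESHOLD_MIN, value));
}

console.log(parseConfidenceThreshold("0.65"));
console.log(parseConfidenceThreshold("0.95"));
console.log(parseConfidenceThreshold(null));
```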

**Cross-Student Priors Toggle (future):**

- Default: off (use domain-informed priors only)
- When on: adjust priors based on aggregate student data

---

## 5. Implementation Plan

### Phase 1: Core BKT Functions (No DB Changes)

1. Create `src/lib/curriculum/bkt/` directory
2. Implement pure functions: bkt-core.ts, conjunctive-bkt.ts, evidence-quality.ts, skill-priors.ts, confidence.ts
3. Implement main entry point: compute-bkt.ts
4. Write unit tests for BKT math

### Phase 2: Skills Dashboard Update

1. Update `SkillsClient.tsx` to call `computeBktFromHistory()`
2. Replace naive accuracy display with P(known) + confidence
3. Use honest language in all labels
4. Add staleness indicators

### Phase 3: UI Controls

1. Add confidence threshold slider to Skills Dashboard
2. Store preference in localStorage
3. (Future) Add cross-student priors toggle

---

## 6. Open Questions (Deferred)

1. **Cross-student priors**: How do we aggregate data across students to inform priors?
   - Answer: Deferred. Start with domain-informed priors only.

2. **Decay vs Staleness**: Should we eventually add decay?
   - Answer: Show staleness indicator for now. Can add optional decay toggle later.

3. **Parameter estimation**: Should P(T), P(S), P(G) be learned from data?
   - Answer: Start with domain-informed values. Can tune later with A/B testing.

---

## 7. BKT-Driven Problem Generation

**Implemented in December 2024**

### 7.1 Problem Generation Modes

Students can choose between two modes in the "Ready to Practice" modal:

**Adaptive Mode (Default):**

- Uses BKT P(known) estimates for continuous complexity scaling
- Formula: `multiplier = 4 - (pKnown × 3)`
- Requires confidence ≥ 0.5 (about 14 problems with the skill under the Section 2.5 formula, fewer when results are consistently right or wrong)
- Falls back to Classic mode if insufficient data

**Classic Mode:**

- Uses fluency-based discrete multipliers
- `effortless (1×), fluent (2×), rusty (3×), practicing (3×), not_practicing (4×)`
- Fluency requires: ≥5 consecutive correct, ≥10 attempts, ≥85% accuracy
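
The two modes and the fallback rule can be summarized in one function. The sketch below is illustrative only and is not the code in `config/bkt-integration.ts`. `chooseMultiplier`, `FluencyState`, and `CLASSIC_MULTIPLIERS` are hypothetical names, and clamping `pKnown` to [0, 1] is an added safeguard.

```typescript
type FluencyState =
  | "effortless"
  | "fluent"
  | "rusty"
  | "practicing"
  | "not_practicing";

// Discrete multipliers used by Classic mode, as listed above.
const CLASSIC_MULTIPLIERS: Record<FluencyState, number> = {
  effortless: 1,
  fluent: 2,
  rusty: 3,
  practicing: 3,
  not_practicing: 4,
};

interface MultiplierChoice {
  multiplier: number;
  mode: "adaptive" | "classic";
}

// Adaptive mode when the BKT estimate is trusted, Classic mode otherwise.
function chooseMultiplier(
  pKnown: number,
  confidence: number,
  fluency: FluencyState,
  minConfidence: number = 0.5,
): MultiplierChoice {
  if (confidence >= minConfidence) {
    const clamped = Math.min(1, Math.max(0, pKnown));
    // Continuous scale: 4× at pKnown = 0 down to 1× at pKnown = 1.
    return { multiplier: 4 - clamped * 3, mode: "adaptive" };
  }
  return { multiplier: CLASSIC_MULTIPLIERS[fluency], mode: "classic" };
}

console.log(chooseMultiplier(0.75, 0.6, "fluent")); // enough confidence
console.log(chooseMultiplier(0.75, 0.3, "fluent")); // falls back to Classic
```

With `pKnown = 0.75` the adaptive formula yields 1.75×, while the same skill with too little data falls back to the 2× fluency multiplier.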

### 7.2 Implementation Files

| File                        | Purpose                                  |
| --------------------------- | ---------------------------------------- |
| `config/bkt-integration.ts` | BKT config and multiplier calculation    |
| `utils/skillComplexity.ts`  | Cost calculator with BKT support         |
| `session-planner.ts`        | Session planning with BKT loading        |
| `StartPracticeModal.tsx`    | Mode selection UI                        |
| `SkillsClient.tsx`          | Skills dashboard with multiplier display |

### 7.3 User Preference Storage

```sql
-- player_curriculum table
problem_generation_mode TEXT DEFAULT 'adaptive' NOT NULL
-- Values: 'adaptive' | 'classic'
```

### 7.4 Skills Dashboard Consistency

The Skills Dashboard now shows:

1. **P(known) estimate** - Same BKT estimate used for problem generation
2. **Complexity multiplier** - Actual multiplier that will be used (e.g., "1.75×")
3. **Mode indicator** - Whether BKT or fluency is being used for this skill

This ensures complete transparency about what drives problem generation.

---

## References

- Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge.
- Pardos, Z. A., & Heffernan, N. T. (2011). KT-IDEM: Introducing item difficulty to the knowledge tracing model.