feat(bkt): add adaptive-bkt mode with unified BKT architecture

- Add 'adaptive-bkt' mode using BKT for both skill targeting AND cost
  calculation (previously BKT was only used for targeting)
- Make adaptive-bkt the default problem generation mode
- Fix session-planner to include adaptive-bkt in BKT targeting logic
- Add fatigue tracking to journey simulator (sum of skill multipliers)
- Add 3-way comparison test (classic vs adaptive vs adaptive-bkt)

Validation results show both adaptive modes perform identically for
learning rate (25-33% faster than classic). The benefit comes from BKT
targeting, not the cost formula - using BKT for both simplifies the
architecture with no performance cost.

UI changes:
- Simplify Problem Selection to two user-friendly options:
  "Focus on weak spots" (recommended) and "Practice everything"
- Remove jargon like "BKT" and "fluency" from user-facing labels

Blog post updated with 3-way comparison findings and unified BKT
architecture documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:24:14 -06:00 · 2025-12-15 19:24:14 -06:00 · 7085a4b3df
parent 354ada596d
commit 7085a4b3df
10 changed files with 489 additions and 67 deletions
--- a/apps/web/.claude/KEHKASHAN_CONSULTATION.md
+++ b/apps/web/.claude/KEHKASHAN_CONSULTATION.md
@ -0,0 +1,117 @@
+# Consultation with Kehkashan Khan - Student Learning Model
+
+## Context
+
+We are improving the SimulatedStudent model used in journey simulation tests to validate BKT-based adaptive problem generation. The current model uses a Hill function for learning but lacks several realistic phenomena.
+
+## Current Model Limitations
+
+| Phenomenon | Reality | Current Model |
+|------------|---------|---------------|
+| **Forgetting** | Skills decay without practice | Skills never decay |
+| **Transfer** | Learning one complement helps learn others | Skills are independent |
+| **Skill difficulty** | Some skills are inherently harder | All skills have same K |
+| **Within-session fatigue** | Later problems are harder | All problems equal |
+| **Warm-up effect** | First few problems are shakier | No warm-up |
+
+## Email Sent to Kehkashan
+
+**Date:** 2025-12-15
+**From:** Thomas Hallock <hallock@gmail.com>
+**To:** Kehkashan Khan
+**Subject:** (not captured)
+
+---
+
+Hi Ms. Khan,
+
+I hope you and your mother are doing well in Oman. Please don't feel the need to reply to this immediately—whenever you have a spare moment is fine.
+
+I've been updating some abacus practice software and I've been testing on Sonia and Fern, but I only have a sample size of 2, so I have had to make some assumptions that I'd like to improve upon. Specifically I've been trying to make it "smarter" about which problems to generate for them. The goal is for the app to automatically detect when they are struggling with a specific movement (like a 5-complement) and give them just enough practice to fix it without getting boring.
+
+I have a computer simulation running to test this, and have seen some very positive results in learning compared to the method from my books, but I realized my assumptions about how children actually learn might be a bit too simple. Since you have observed this process with many different children, I'd love your take on a few things:
+
+Are some skills inherently harder? In your experience, are certain movements just naturally harder for kids to grasp than others? For example, is a "10-complement" (like +9 = -1 +10) usually harder to master than a "5-complement" (like +4 = +5 -1)? Or are they about the same difficulty once the concept clicks?
+
+Do skills transfer? Once a student truly understands the movement for +4, does that make learning +3 easier? Or do they tend to treat every new number as a completely new skill that needs to be practiced from scratch?
+
+How fast does "rust" set in? If a student masters a specific skill but doesn't use it for two weeks, do they usually retain it? Or do they tend to forget it and need to re-learn it?
+
+Fatigue vs. Warm-up: Do you notice that accuracy drops significantly after 15-20 minutes? Or is there the opposite effect, where they need a "warm-up" period at the start of a lesson before they hit their stride?
+
+Any "gut feeling" or observations you have would be incredibly helpful. I can use that info to make the math behind the app much more realistic.
+
+Hope you are managing everything over there. See you Sunday!
+
+P.S. If you're curious, I have written up a draft about the system on my blog here:
+https://abaci.one/blog/conjunctive-bkt-skill-tracing
+
+Best,
+Thomas
+
+---
+
+## Questions Asked & How to Use Answers
+
+### 1. Skill Difficulty
+**Question:** Are 10-complements harder than 5-complements?
+**How to model:** Add per-skill K values (half-max exposure) in SimulatedStudent
+```typescript
+const SKILL_DIFFICULTY: Record<string, number> = {
+  'basic.directAddition': 5,
+  'fiveComplements.*': 10,      // If she says 5-comp is medium
+  'tenComplements.*': 18,       // If she says 10-comp is harder
+}
+```
+
+### 2. Transfer Effects
+**Question:** Does learning +4 help with +3?
+**How to model:** Add transfer weights between related skills
+```typescript
+// If she says yes, skills transfer within categories:
+function getEffectiveExposure(skillId: string): number {
+  const direct = exposures.get(skillId) ?? 0
+  const transferred = getRelatedSkills(skillId)
+    .reduce((sum, related) => sum + (exposures.get(related) ?? 0) * TRANSFER_WEIGHT, 0)
+  return direct + transferred
+}
+```
+
+### 3. Forgetting/Rust
+**Question:** How fast do skills decay without practice?
+**How to model:** Multiply probability by retention factor
+```typescript
+// If she says 2 weeks causes noticeable rust:
+const HALF_LIFE_DAYS = 14  // Tune based on her answer
+const retention = Math.pow(0.5, daysSinceLastPractice / HALF_LIFE_DAYS)  // 0.5 after one half-life
+const P_effective = P_base * retention
+```
+
+### 4. Fatigue & Warm-up
+**Question:** Does accuracy drop after 15-20 min? Is there warm-up?
+**How to model:** Add session position effects
+```typescript
+// If she says both exist:
+function sessionPositionMultiplier(problemIndex: number, totalProblems: number): number {
+  const warmupBoost = Math.min(1, (problemIndex + 1) / 4)  // First 3 problems are warm-up (0.25, 0.5, 0.75); never 0 at index 0
+  const fatiguePenalty = problemIndex / totalProblems * 0.1  // 10% drop by end
+  return warmupBoost * (1 - fatiguePenalty)
+}
+```
+
+## Background on Kehkashan
+
+- Abacus coach for Sonia and Fern (Thomas's kids)
+- Teaches 1 hour each Sunday
+- Getting PhD in something related to academic rigor in children
+- Expert in soroban pedagogy
+- Currently in Oman caring for her mother
+- Not deeply technical/statistical, so answers will be qualitative observations
+
+## When Reply Arrives
+
+1. Extract her observations for each question
+2. Translate qualitative answers to model parameters
+3. Implement changes to SimulatedStudent.ts
+4. Re-run 3-way comparison to see if results change
+5. Update blog post if findings are significant
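
Review note: the skill-difficulty and forgetting ideas in the consultation file can be combined with the Hill function that the blog post gives for the simulator, `P(correct) = exposure^n / (K^n + exposure^n)`. The following is a minimal sketch under stated assumptions, not the code in `SimulatedStudent.ts`: the function names, the parameter order, and the half-life form of the decay are all hypothetical.

```typescript
// Hypothetical sketch: names and parameters are illustrative, not taken from SimulatedStudent.ts.

/** Hill function: P(correct) = exposure^n / (K^n + exposure^n). */
function hillProbability(exposure: number, k: number, n: number): number {
  if (exposure <= 0) return 0
  const en = Math.pow(exposure, n)
  return en / (Math.pow(k, n) + en)
}

/** Half-life decay: retention halves for every `halfLifeDays` without practice. */
function retention(daysSinceLastPractice: number, halfLifeDays: number): number {
  return Math.pow(0.5, Math.max(0, daysSinceLastPractice) / halfLifeDays)
}

/** Effective probability for one skill: learned probability scaled by retention. */
function effectiveProbability(
  exposure: number,
  k: number, // per-skill half-max exposure (a harder skill gets a larger K)
  n: number, // Hill coefficient (steepness of the learning curve)
  daysSinceLastPractice: number,
  halfLifeDays: number
): number {
  return hillProbability(exposure, k, n) * retention(daysSinceLastPractice, halfLifeDays)
}
```

With `k = 10` and `n = 2`, ten exposures give a probability of 0.5, and a 14-day gap at a 14-day half-life halves that to 0.25. A larger `k`, such as the 18 suggested for ten-complements, shifts the whole curve to the right.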
--- a/apps/web/content/blog/conjunctive-bkt-skill-tracing.md
+++ b/apps/web/content/blog/conjunctive-bkt-skill-tracing.md
@ -3,14 +3,14 @@ title: "Binary Outcomes, Granular Insights: How We Know Which Abacus Skill Needs
 description: "How we use conjunctive Bayesian Knowledge Tracing to infer which visual-motor patterns a student has automated when all we observe is 'problem correct' or 'problem incorrect'."
 author: "Abaci.one Team"
 publishedAt: "2025-12-14"
-updatedAt: "2025-12-15"
+updatedAt: "2025-12-16"
 tags: ["education", "machine-learning", "bayesian", "soroban", "knowledge-tracing", "adaptive-learning"]
 featured: true
 ---

 # Binary Outcomes, Granular Insights: How We Know Which Abacus Skill Needs Work

-> **Abstract:** Soroban (Japanese abacus) pedagogy treats arithmetic as a sequence of visual-motor patterns to be drilled to automaticity. Each numeral operation (adding 1, adding 2, ...) in each column context is a distinct pattern; curricula explicitly sequence these patterns, requiring mastery of each before introducing the next. This creates a well-defined skill hierarchy of ~30 discrete patterns. We apply conjunctive Bayesian Knowledge Tracing to infer pattern mastery from binary problem outcomes. At problem-generation time, we simulate the abacus to tag each term with the specific patterns it exercises. Correct answers provide evidence for all tagged patterns; incorrect answers distribute blame proportionally to each pattern's estimated weakness. The discrete, sequenced nature of soroban skills makes this inference tractable—each pattern is independently trainable and assessable. We describe the skill hierarchy, simulation-based tagging, evidence weighting for help usage and response time, and complexity-aware problem generation that respects the student's current mastery profile. Simulation studies validate that this adaptive targeting reaches mastery thresholds significantly faster than uniform skill distribution—in controlled tests, adaptive mode reached 80% mastery faster in 9 out of 9 comparable scenarios.
+> **Abstract:** Soroban (Japanese abacus) pedagogy treats arithmetic as a sequence of visual-motor patterns to be drilled to automaticity. Each numeral operation (adding 1, adding 2, ...) in each column context is a distinct pattern; curricula explicitly sequence these patterns, requiring mastery of each before introducing the next. This creates a well-defined skill hierarchy of ~30 discrete patterns. We apply conjunctive Bayesian Knowledge Tracing to infer pattern mastery from binary problem outcomes. At problem-generation time, we simulate the abacus to tag each term with the specific patterns it exercises. Correct answers provide evidence for all tagged patterns; incorrect answers distribute blame proportionally to each pattern's estimated weakness. BKT drives both skill targeting (prioritizing weak skills for practice) and difficulty adjustment (scaling problem complexity to mastery level). Simulation studies validate that adaptive targeting reaches mastery 25-33% faster than uniform skill distribution. Our 3-way comparison found that the benefit comes from BKT *targeting*, not the specific cost formula—using BKT for both concerns simplifies the architecture with no performance cost.

 ---

@ -237,16 +237,16 @@ const newPKnown = oldPKnown * (1 - evidenceWeight) + bktUpdate * evidenceWeight

 ## Automaticity-Aware Problem Generation

-Problem generation involves two independent concerns:
+Problem generation involves two concerns:

-1. **Cost calculation** (fluency-based): Controls problem difficulty by budgeting cognitive load
-2. **Skill targeting** (BKT-based): Identifies which skills need practice and prioritizes them
+1. **Skill targeting** (BKT-based): Identifies which skills need practice and prioritizes them
+2. **Cost calculation**: Controls problem difficulty by budgeting cognitive load

-This section describes cost calculation. The next section covers skill targeting.
+Both concerns now use BKT. We experimented with separating them—using BKT only for targeting while using fluency (recent streak consistency) for cost calculation—but found that using BKT for both produces equivalent results while simplifying the architecture.

 ### Complexity Budgeting

-We budget problem complexity based on the student's **fluency state** for each pattern. This is separate from BKT—fluency tracks recent performance consistency, while BKT estimates overall mastery.
+We budget problem complexity based on the student's estimated mastery from BKT. When BKT confidence is low (< 30%), we fall back to fluency-based estimates.

 ### Complexity Costing

@ -258,15 +258,18 @@ Each pattern has a **base complexity cost**:

 ### Automaticity Multipliers

-The cost is scaled by the student's automaticity of each pattern:
+The cost is scaled by the student's estimated mastery from BKT. The multiplier uses a non-linear (squared) mapping from P(known) to provide better differentiation at high mastery levels:

-| Fluency State | Multiplier | Meaning |
-|---------------|------------|---------|
-| `effortless` | 1× | Recently demonstrated automaticity |
-| `fluent` | 2× | Solid but needs warmup |
-| `rusty` | 3× | Was fluent, needs rebuilding |
-| `practicing` | 3× | Still learning |
-| `not_practicing` | 4× | Not in active rotation |
+| P(known) | Multiplier | Meaning |
+|----------|------------|---------|
+| 1.00 | 1.0× | Fully automated |
+| 0.95 | 1.3× | Nearly automated |
+| 0.90 | 1.6× | Solid |
+| 0.80 | 2.1× | Good but not automatic |
+| 0.50 | 3.3× | Halfway there |
+| 0.00 | 4.0× | Just starting |
+
+When BKT confidence is insufficient (< 30%), we fall back to discrete fluency states based on recent streaks.

 ### Adaptive Session Planning

@ -285,16 +288,16 @@ This creates natural adaptation:
 // Same problem, different complexity for different students:
 const problem = [7, 6]  // 7 + 6 = 13, requires tenComplements.6

-// Student A (ten-complements automated)
-complexity_A = 2 × 1 = 2  // Easy for this student
+// Student A: BKT P(known) = 0.95 for ten-complements
+complexity_A = 2 × 1.3 = 2.6  // Easy for this student

-// Student B (still practicing ten-complements)
-complexity_B = 2 × 3 = 6  // Challenging for this student
+// Student B: BKT P(known) = 0.50 for ten-complements
+complexity_B = 2 × 3.3 = 6.6  // Challenging for this student
 ```

 ## Adaptive Skill Targeting

-While fluency multipliers control *how difficult* problems should be, BKT serves a different purpose: identifying *which skills need practice*.
+Beyond controlling difficulty, BKT identifies *which skills need practice*.

 ### Identifying Weak Skills

@ -330,15 +333,13 @@ for (const slot of focusSlots) {
 }
 ```

-### Why Separate Cost from Targeting?
+### The Budget Trap (and How We Avoided It)

-Early in development, we experimented with using BKT P(known) directly as a cost multiplier. This was backwards: skills with low P(known) got high multipliers, making them expensive, so the budget filter excluded them. Students never practiced what they needed most.
+When we first tried using BKT P(known) as a cost multiplier, we hit a problem: skills with low P(known) got high multipliers, making them expensive. If we only used cost filtering, the budget would exclude weak skills—students would never practice what they needed most.

-The correct architecture separates concerns:
- **Cost calculation**: Uses fluency state to prevent cognitive overload
- **Skill targeting**: Uses BKT to prioritize practice on weak skills
+The solution was **skill targeting**: BKT identifies weak skills and adds them to the problem generator's required targets. This ensures weak skills appear in problems *regardless* of their cost. The complexity budget still applies, but it filters problem *structure* (number of terms, digit ranges), not which skills can appear.

-A student struggling with ten-complements gets problems that *include* ten-complements (targeting), but those problems still respect their complexity budget (costing). This ensures practice without overwhelming.
+A student struggling with ten-complements gets problems that *include* ten-complements (targeting), while the problem complexity stays within their budget (fewer terms, simpler starting values).

 ## Honest Uncertainty Reporting

@ -417,7 +418,10 @@ The confidence threshold is user-adjustable (default 50%), allowing teachers to

 ## Validation: Does Adaptive Targeting Actually Work?

-We built a journey simulator to compare adaptive (BKT-driven) vs. classic (uniform) problem generation across controlled scenarios.
+We built a journey simulator to compare three modes across controlled scenarios:
+- **Classic**: Uniform skill distribution, fluency-based difficulty
+- **Adaptive (fluency)**: BKT skill targeting, fluency-based difficulty
+- **Adaptive (full BKT)**: BKT skill targeting, BKT-based difficulty

 ### Simulation Framework

@ -426,7 +430,7 @@ The simulator models student learning using:
 - **Hill function learning model**: `P(correct) = exposure^n / (K^n + exposure^n)`, where exposure is the number of times the student has practiced a skill
 - **Conjunctive model**: Multi-skill problems require all skills to succeed—P(correct) is the product of individual skill probabilities
 - **Per-skill deficiency profiles**: Each test case starts one skill at zero exposure, with all prerequisites mastered
- **Test matrix**: 32 skills × 3 learner types (fast, average, slow) = 96 scenarios
+- **Cognitive fatigue tracking**: Sum of difficulty multipliers for each skill in each problem—measures the mental effort required per session

 The Hill function creates realistic learning curves: early practice yields slow improvement (building foundation), then understanding "clicks" (rapid gains), then asymptotic approach to mastery.

@ -471,6 +475,21 @@ The key question: How fast does each mode bring a weak skill to mastery?

 "Never" entries indicate the mode didn't reach that threshold within 12 sessions.

+### 3-Way Comparison: BKT vs Fluency Multipliers
+
+We also compared whether using BKT for cost calculation (in addition to targeting) provides additional benefit over fluency-based cost calculation:
+
+| Skill | Mode | →50% | →80% | Fatigue/Session |
+|-------|------|------|------|-----------------|
+| fiveComplements.3=5-2 | Classic | 5 | 9 | 120.3 |
+| fiveComplements.3=5-2 | Adaptive (fluency) | 3 | 6 | 122.8 |
+| fiveComplements.3=5-2 | Adaptive (full BKT) | 3 | 6 | 122.8 |
+| fiveComplementsSub.-3 | Classic | 4 | 8 | 131.9 |
+| fiveComplementsSub.-3 | Adaptive (fluency) | 3 | 6 | 133.6 |
+| fiveComplementsSub.-3 | Adaptive (full BKT) | 3 | 6 | 133.0 |
+
+**Finding**: Both adaptive modes perform identically for learning rate—the benefit comes from BKT *targeting*, not from BKT-based cost calculation. However, using BKT for costs simplifies the architecture (one model instead of two) with no measurable downside.
+
 ### Example Trajectory

 For a fast learner deficient in `fiveComplements.3=5-2`:
@ -515,11 +534,13 @@ Our approach combines:
 1. **Simulation-based pattern tagging** at problem-generation time
 2. **Conjunctive BKT** with probabilistic blame distribution
 3. **Evidence quality weighting** based on help level and response time
-4. **Separate concerns**: Fluency-based complexity budgeting (controls difficulty) and BKT-based skill targeting (prioritizes practice)
+4. **Unified BKT architecture**: BKT drives both difficulty adjustment and skill targeting
 5. **Honest uncertainty reporting** with confidence intervals
-6. **Validated adaptive targeting** that reaches mastery thresholds significantly faster than uniform practice
+6. **Validated adaptive targeting** that reaches mastery 25-33% faster than uniform practice

-The result is a system that adapts to each student's actual pattern automaticity, not just their overall accuracy—targeting weak skills for accelerated mastery while honestly communicating what it knows and doesn't know.
+The key insight from our validation: the benefit of adaptive practice comes from *targeting weak skills*, not from the specific formula used for difficulty adjustment. BKT targeting ensures students practice what they need; the complexity budget ensures they're not overwhelmed.
+
+The result is a system that adapts to each student's actual pattern automaticity, not just their overall accuracy—focusing practice where it matters most while honestly communicating what it knows and doesn't know.

 ---
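
Review note: the blog post describes the BKT cost multiplier as a non-linear (squared) mapping from P(known) but does not state the formula. One formula that reproduces every row of the table after rounding to one decimal is `multiplier = 4 - 3 * P(known)^2`. The sketch below assumes that formula; it has not been checked against the implementation, and `bktMultiplier` and `problemComplexity` are hypothetical names.

```typescript
// Assumed formula (not confirmed against the implementation): multiplier = 4 - 3 * pKnown^2.
// It matches the blog table's rounded values: 1.0, 1.3, 1.6, 2.1, 3.3, 4.0.
function bktMultiplier(pKnown: number): number {
  const p = Math.min(1, Math.max(0, pKnown)) // clamp to a valid probability
  return 4 - 3 * p * p
}

/** Complexity of a single-pattern problem: base cost scaled by the mastery multiplier. */
function problemComplexity(baseCost: number, pKnown: number): number {
  return baseCost * bktMultiplier(pKnown)
}
```

Under this assumption, a base cost of 2 at P(known) = 0.50 gives 2 × 3.25 = 6.5. The blog's worked example shows 6.6 because it multiplies by the already-rounded 3.3.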

--- a/apps/web/src/components/practice/StartPracticeModal.tsx
+++ b/apps/web/src/components/practice/StartPracticeModal.tsx
@ -80,7 +80,7 @@ export function StartPracticeModal({
    DEFAULT_PLAN_CONFIG.abacusTermCount?.max ?? 5
  )
  const [problemGenerationMode, setProblemGenerationMode] =
-    useState<ProblemGenerationMode>('adaptive')
+    useState<ProblemGenerationMode>('adaptive-bkt')

  const togglePart = useCallback((partType: keyof EnabledParts) => {
    setEnabledParts((prev) => {
@ -732,14 +732,14 @@ export function StartPracticeModal({
                    >
                      {[
                        {
-                          mode: 'adaptive' as const,
-                          label: 'Adaptive',
-                          desc: 'Bayesian inference (recommended)',
+                          mode: 'adaptive-bkt' as const,
+                          label: 'Focus on weak spots',
+                          desc: 'Practices what you need most (recommended)',
                        },
                        {
                          mode: 'classic' as const,
-                          label: 'Classic',
-                          desc: 'Streak-based thresholds',
+                          label: 'Practice everything',
+                          desc: 'Equal time on all skills',
                        },
                      ].map(({ mode, label, desc }) => {
                        const isSelected = problemGenerationMode === mode
--- a/apps/web/src/lib/curriculum/config/bkt-integration.ts
+++ b/apps/web/src/lib/curriculum/config/bkt-integration.ts
@ -19,15 +19,19 @@

 /**
 * Problem generation algorithm selection
- * - 'adaptive': BKT-based continuous scaling (recommended)
- * - 'classic': Fluency-based discrete states
+ * - 'classic': No BKT targeting, fluency-based cost multipliers
+ * - 'adaptive': BKT skill targeting, fluency-based cost multipliers
+ * - 'adaptive-bkt': BKT skill targeting, BKT-based cost multipliers (default)
 */
-export type ProblemGenerationMode = 'adaptive' | 'classic'
+export type ProblemGenerationMode = 'classic' | 'adaptive' | 'adaptive-bkt'

 /**
 * Default problem generation mode for new sessions
+ *
+ * 'adaptive-bkt' uses BKT for both skill targeting AND cost multipliers.
+ * This is the most data-driven approach, using full learning history.
 */
-export const DEFAULT_PROBLEM_GENERATION_MODE: ProblemGenerationMode = 'adaptive'
+export const DEFAULT_PROBLEM_GENERATION_MODE: ProblemGenerationMode = 'adaptive-bkt'

 // =============================================================================
 // BKT Confidence Thresholds
--- a/apps/web/src/lib/curriculum/session-planner.ts
+++ b/apps/web/src/lib/curriculum/session-planner.ts
@ -161,15 +161,17 @@ export async function generateSessionPlan(
    getPlayerCurriculum(playerId),
    getAllSkillMastery(playerId),
    getRecentSessions(playerId, 10),
-    // Only load problem history for BKT in adaptive mode
-    problemGenerationMode === 'adaptive'
+    // Only load problem history for BKT in adaptive modes
+    problemGenerationMode === 'adaptive' || problemGenerationMode === 'adaptive-bkt'
      ? getRecentSessionResults(playerId, BKT_INTEGRATION_CONFIG.sessionHistoryDepth)
      : Promise.resolve([]),
  ])

-  // Compute BKT if in adaptive mode and we have problem history
+  // Compute BKT if in an adaptive mode and we have problem history
  let bktResults: Map<string, SkillBktResult> | undefined
-  if (problemGenerationMode === 'adaptive' && problemHistory.length > 0) {
+  const usesBktTargeting =
+    problemGenerationMode === 'adaptive' || problemGenerationMode === 'adaptive-bkt'
+  if (usesBktTargeting && problemHistory.length > 0) {
    const bktResult = computeBktFromHistory(problemHistory)
    bktResults = new Map(bktResult.skills.map((s) => [s.skillId, s]))

@ -187,7 +189,7 @@ export async function generateSessionPlan(
    }
  } else if (process.env.DEBUG_SESSION_PLANNER === 'true') {
    console.log(
-      `[SessionPlanner] Mode: ${problemGenerationMode}, no BKT (history=${problemHistory.length})`
+      `[SessionPlanner] Mode: ${problemGenerationMode}, no BKT (usesBktTargeting=${usesBktTargeting}, history=${problemHistory.length})`
    )
  }

@ -240,8 +242,8 @@ export async function generateSessionPlan(
  const struggling = findStrugglingSkills(skillMastery)
  const needsReview = findSkillsNeedingReview(skillMastery, config.reviewIntervalDays)

-  // Identify weak skills from BKT for targeting (adaptive mode only)
-  const weakSkills = problemGenerationMode === 'adaptive' ? identifyWeakSkills(bktResults) : []
+  // Identify weak skills from BKT for targeting (adaptive modes only)
+  const weakSkills = usesBktTargeting ? identifyWeakSkills(bktResults) : []

  if (process.env.DEBUG_SESSION_PLANNER === 'true' && weakSkills.length > 0) {
    console.log(`[SessionPlanner] Identified ${weakSkills.length} weak skills for targeting:`)
--- a/apps/web/src/test/journey-simulator/JourneyRunner.ts
+++ b/apps/web/src/test/journey-simulator/JourneyRunner.ts
@ -127,6 +127,7 @@ export class JourneyRunner {
    const sessionExposures = new Map<string, number>()
    let correctCount = 0
    let totalProblems = 0
+    let sessionFatigue = 0

    // Process all parts and slots
    for (const part of startedPlan.parts) {
@ -137,9 +138,11 @@ export class JourneyRunner {

        // Simulate the student answering this problem
        // Note: This also increments exposure counts in the student model
+        // and calculates fatigue based on true probabilities BEFORE learning
        const answer = this.student.answerProblem(slot.problem)

        if (answer.isCorrect) correctCount++
+        sessionFatigue += answer.fatigue

        // Track session-specific skill exposures
        for (const skillId of answer.skillsChallenged) {
@ -203,6 +206,8 @@ export class JourneyRunner {
      accuracy: totalProblems > 0 ? correctCount / totalProblems : 0,
      problemsAttempted: totalProblems,
      sessionPlanId: plan.id,
+      // Cognitive fatigue accumulated during this session
+      sessionFatigue,
    }
  }

@ -225,11 +230,17 @@ export class JourneyRunner {
    // Build skill trajectories
    const skillTrajectories = this.buildSkillTrajectories(snapshots)

+    // Calculate total fatigue across all sessions
+    const totalFatigue = snapshots.reduce((sum, s) => sum + s.sessionFatigue, 0)
+    const avgFatiguePerSession = snapshots.length > 0 ? totalFatigue / snapshots.length : 0
+
    return {
      bktCorrelation,
      weakSkillSurfacing,
      accuracyImprovement,
      skillTrajectories,
+      totalFatigue,
+      avgFatiguePerSession,
    }
  }

--- a/apps/web/src/test/journey-simulator/SimulatedStudent.ts
+++ b/apps/web/src/test/journey-simulator/SimulatedStudent.ts
@ -24,6 +24,23 @@ import type { GeneratedProblem, HelpLevel } from '@/db/schema/session-plans'
 import type { SeededRandom } from './SeededRandom'
 import type { SimulatedAnswer, StudentProfile } from './types'

+/**
+ * Convert true probability to a cognitive load multiplier.
+ *
+ * Higher P(correct) → lower multiplier (more automated, less fatiguing)
+ * Lower P(correct) → higher multiplier (less automated, more fatiguing)
+ *
+ * This is the "ground truth" multiplier based on actual skill mastery,
+ * used to measure fatigue independently of what budgeting system was used.
+ */
+export function getTrueMultiplier(trueP: number): number {
+  if (trueP >= 0.9) return 1.0 // Automated
+  if (trueP >= 0.7) return 1.5 // Nearly automated
+  if (trueP >= 0.5) return 2.0 // Halfway
+  if (trueP >= 0.3) return 3.0 // Struggling
+  return 4.0 // Very weak
+}
+
 /**
 * Simulates a learning student using exposure-based Hill function model.
 */
@ -76,10 +93,22 @@ export class SimulatedStudent {
   * simulating that the student is learning from the attempt itself.
   * This matches real learning where engaging with material teaches you,
   * regardless of whether you get it right.
+   *
+   * Fatigue is calculated BEFORE exposure increment, representing the
+   * cognitive load of the problem based on the student's state when
+   * they first see it.
   */
  answerProblem(problem: GeneratedProblem): SimulatedAnswer {
    const skillsChallenged = problem.skillsRequired ?? []

+    // Calculate fatigue BEFORE incrementing exposure
+    // This represents cognitive load at the moment of problem presentation
+    let fatigue = 0
+    for (const skillId of skillsChallenged) {
+      const trueP = this.getTrueProbability(skillId)
+      fatigue += getTrueMultiplier(trueP)
+    }
+
    // Increment exposure for all skills BEFORE calculating probability
    // (Learning happens from the attempt, not from success)
    for (const skillId of skillsChallenged) {
@ -103,6 +132,7 @@ export class SimulatedStudent {
      responseTimeMs,
      helpLevelUsed: helpLevel,
      skillsChallenged,
+      fatigue,
    }
  }
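
Review note: the fatigue metric and the conjunctive success model can be checked by hand on a small example. The sketch below copies `getTrueMultiplier` from this diff and adds a hypothetical helper, `problemLoad`, which takes the true per-skill probabilities directly. As a simplification it uses the same probabilities for both quantities, whereas `answerProblem` computes fatigue before the exposure increment and correctness after it.

```typescript
// Copied from the diff above (SimulatedStudent.ts).
function getTrueMultiplier(trueP: number): number {
  if (trueP >= 0.9) return 1.0 // Automated
  if (trueP >= 0.7) return 1.5 // Nearly automated
  if (trueP >= 0.5) return 2.0 // Halfway
  if (trueP >= 0.3) return 3.0 // Struggling
  return 4.0 // Very weak
}

// Hypothetical helper, not part of the commit.
function problemLoad(trueProbabilities: number[]): { fatigue: number; pCorrect: number } {
  let fatigue = 0 // sum of per-skill multipliers
  let pCorrect = 1 // conjunctive model: every skill must succeed
  for (const p of trueProbabilities) {
    fatigue += getTrueMultiplier(p)
    pCorrect *= p
  }
  return { fatigue, pCorrect }
}
```

For a problem that exercises one automated skill (0.95) and one struggling skill (0.40), this gives a fatigue of 1.0 + 3.0 = 4.0 and a success probability of 0.95 × 0.40 = 0.38.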

--- a/apps/web/src/test/journey-simulator/comprehensive-ab-test.test.ts
+++ b/apps/web/src/test/journey-simulator/comprehensive-ab-test.test.ts
@ -669,6 +669,212 @@ describe('Comprehensive A/B Test: Per-Skill Deficiency', () => {
      )
    }, 1800000) // 30 min timeout
  })
+
+  describe('3-Way Comparison: Learning Rate vs Fatigue', () => {
+    /**
+     * Compares three modes:
+     * - classic: No BKT targeting, fluency-based cost multipliers
+     * - adaptive: BKT skill targeting, fluency-based cost multipliers
+     * - adaptive-bkt: BKT skill targeting, BKT-based cost multipliers
+     *
+     * Metrics:
+     * - Learning rate: Sessions to reach 50%/80% mastery on deficient skill
+     * - Fatigue: Total cognitive load during practice (lower = better)
+     *
+     * Hypothesis:
+     * - adaptive-bkt should have similar learning rate to adaptive
+     * - adaptive-bkt should have LOWER fatigue (more accurate budgeting)
+     */
+    it('should compare learning rate and fatigue across 3 modes', async () => {
+      const testProfiles = getRepresentativeProfilesAllLearners().filter(
+        (p) => p.skillId.includes('fiveComplements') && p.learnerType === 'fast'
+      )
+
+      type ModeType = 'classic' | 'adaptive' | 'adaptive-bkt'
+      const modes: ModeType[] = ['classic', 'adaptive', 'adaptive-bkt']
+
+      const results: Array<{
+        skillId: string
+        learnerType: LearnerType
+        mode: ModeType
+        sessionsTo50: number | null
+        sessionsTo80: number | null
+        finalMastery: number
+        totalFatigue: number
+        avgFatiguePerSession: number
+      }> = []
+
+      for (const { skillId, learnerType, profile, practicingSkills } of testProfiles.slice(0, 3)) {
+        for (const mode of modes) {
+          const skillSlug = skillId.replace(/[^a-zA-Z0-9]/g, '-')
+          const { playerId } = await createTestStudent(
+            ephemeralDb.db,
+            `3way-${mode}-${learnerType}-${skillSlug}`
+          )
+
+          const rng = new SeededRandom(QUICK_CONFIG.seed)
+          const student = new SimulatedStudent(profile, rng)
+          student.ensureSkillsTracked(practicingSkills)
+
+          // Track per-session mastery
+          const masteryPerSession: number[] = [student.getTrueProbability(skillId)]
+
+          // Run sessions one at a time to track trajectory
+          for (let s = 0; s < QUICK_CONFIG.sessionCount; s++) {
+            const sessionConfig: JourneyConfig = {
+              ...QUICK_CONFIG,
+              sessionCount: 1,
+              practicingSkills,
+              profile,
+              mode,
+            }
+            const runner = new JourneyRunner(ephemeralDb.db, student, sessionConfig, rng, playerId)
+            await runner.run()
+            masteryPerSession.push(student.getTrueProbability(skillId))
+          }
+
+          // Run full journey for fatigue metrics
+          const { playerId: fullPlayerId } = await createTestStudent(
+            ephemeralDb.db,
+            `3way-full-${mode}-${learnerType}-${skillSlug}`
+          )
+          const fullRng = new SeededRandom(QUICK_CONFIG.seed)
+          const fullStudent = new SimulatedStudent(profile, fullRng)
+          fullStudent.ensureSkillsTracked(practicingSkills)
+
+          const fullConfig: JourneyConfig = {
+            ...QUICK_CONFIG,
+            practicingSkills,
+            profile,
+            mode,
+          }
+          const fullRunner = new JourneyRunner(
+            ephemeralDb.db,
+            fullStudent,
+            fullConfig,
+            fullRng,
+            fullPlayerId
+          )
+          const fullResult = await fullRunner.run()
+
+          // Find sessions to reach thresholds
+          let sessionsTo50: number | null = null
+          let sessionsTo80: number | null = null
+          for (let i = 0; i < masteryPerSession.length; i++) {
+            if (sessionsTo50 === null && masteryPerSession[i] >= 0.5) {
+              sessionsTo50 = i
+            }
+            if (sessionsTo80 === null && masteryPerSession[i] >= 0.8) {
+              sessionsTo80 = i
+            }
+          }
+
+          results.push({
+            skillId,
+            learnerType,
+            mode,
+            sessionsTo50,
+            sessionsTo80,
+            finalMastery: masteryPerSession[masteryPerSession.length - 1],
+            totalFatigue: fullResult.finalMetrics.totalFatigue,
+            avgFatiguePerSession: fullResult.finalMetrics.avgFatiguePerSession,
+          })
+        }
+      }
+
+      // Output results table
+      console.log(`\n${'='.repeat(140)}`)
+      console.log('3-WAY COMPARISON: Learning Rate vs Fatigue')
+      console.log('='.repeat(140))
+      console.log(
+        '\n| Skill                          | Mode         | →50% | →80% | Final | TotalFatigue | AvgFatigue |'
+      )
+      console.log(
+        '|--------------------------------|--------------|------|------|-------|--------------|------------|'
+      )
+
+      for (const r of results) {
+        console.log(
+          `| ${r.skillId.padEnd(30)} | ${r.mode.padEnd(12)} | ${(r.sessionsTo50?.toString() ?? '-').padStart(4)} | ${(r.sessionsTo80?.toString() ?? '-').padStart(4)} | ${(r.finalMastery * 100).toFixed(0).padStart(4)}% | ${r.totalFatigue.toFixed(1).padStart(12)} | ${r.avgFatiguePerSession.toFixed(1).padStart(10)} |`
+        )
+      }
+
+      // Group by skill and compare modes
+      const skills = [...new Set(results.map((r) => r.skillId))]
+
+      console.log(`\n${'='.repeat(80)}`)
+      console.log('COMPARISON BY SKILL')
+      console.log('='.repeat(80))
+
+      for (const skillId of skills) {
+        const skillResults = results.filter((r) => r.skillId === skillId)
+        const classic = skillResults.find((r) => r.mode === 'classic')!
+        const adaptive = skillResults.find((r) => r.mode === 'adaptive')!
+        const adaptiveBkt = skillResults.find((r) => r.mode === 'adaptive-bkt')!
+
+        console.log(`\n${skillId}:`)
+        console.log(`  Learning (sessions to 80%):`)
+        console.log(`    classic:      ${classic.sessionsTo80 ?? 'never'}`)
+        console.log(`    adaptive:     ${adaptive.sessionsTo80 ?? 'never'}`)
+        console.log(`    adaptive-bkt: ${adaptiveBkt.sessionsTo80 ?? 'never'}`)
+        console.log(`  Fatigue (avg per session):`)
+        console.log(`    classic:      ${classic.avgFatiguePerSession.toFixed(1)}`)
+        console.log(`    adaptive:     ${adaptive.avgFatiguePerSession.toFixed(1)}`)
+        console.log(`    adaptive-bkt: ${adaptiveBkt.avgFatiguePerSession.toFixed(1)}`)
+
+        // Calculate improvement
+        const learningImprovementAdaptive =
+          classic.sessionsTo80 && adaptive.sessionsTo80
+            ? ((classic.sessionsTo80 - adaptive.sessionsTo80) / classic.sessionsTo80) * 100
+            : null
+        const learningImprovementBkt =
+          classic.sessionsTo80 && adaptiveBkt.sessionsTo80
+            ? ((classic.sessionsTo80 - adaptiveBkt.sessionsTo80) / classic.sessionsTo80) * 100
+            : null
+        const fatigueReductionAdaptive =
+          ((classic.avgFatiguePerSession - adaptive.avgFatiguePerSession) /
+            classic.avgFatiguePerSession) *
+          100
+        const fatigueReductionBkt =
+          ((classic.avgFatiguePerSession - adaptiveBkt.avgFatiguePerSession) /
+            classic.avgFatiguePerSession) *
+          100
+
+        console.log(`  vs classic:`)
+        console.log(
+          `    adaptive:     ${learningImprovementAdaptive === null ? 'N/A' : `${learningImprovementAdaptive.toFixed(0)}%`} faster learning, ${fatigueReductionAdaptive.toFixed(1)}% fatigue reduction`
+        )
+        console.log(
+          `    adaptive-bkt: ${learningImprovementBkt === null ? 'N/A' : `${learningImprovementBkt.toFixed(0)}%`} faster learning, ${fatigueReductionBkt.toFixed(1)}% fatigue reduction`
+        )
+      }
+
+      // Summary statistics
+      const classicResults = results.filter((r) => r.mode === 'classic')
+      const adaptiveResults = results.filter((r) => r.mode === 'adaptive')
+      const adaptiveBktResults = results.filter((r) => r.mode === 'adaptive-bkt')
+
+      const avgFatigueClassic =
+        classicResults.reduce((sum, r) => sum + r.avgFatiguePerSession, 0) / classicResults.length
+      const avgFatigueAdaptive =
+        adaptiveResults.reduce((sum, r) => sum + r.avgFatiguePerSession, 0) / adaptiveResults.length
+      const avgFatigueBkt =
+        adaptiveBktResults.reduce((sum, r) => sum + r.avgFatiguePerSession, 0) /
+        adaptiveBktResults.length
+
+      console.log(`\n${'='.repeat(60)}`)
+      console.log('SUMMARY')
+      console.log('='.repeat(60))
+      console.log(`\nAverage Fatigue Per Session:`)
+      console.log(`  classic:      ${avgFatigueClassic.toFixed(1)}`)
+      console.log(`  adaptive:     ${avgFatigueAdaptive.toFixed(1)}`)
+      console.log(`  adaptive-bkt: ${avgFatigueBkt.toFixed(1)}`)
+
+      // Learning rates are printed above for inspection only; they are not asserted here.
+      // adaptive-bkt fatigue must not exceed adaptive fatigue by more than 10%.
+      expect(avgFatigueBkt).toBeLessThanOrEqual(avgFatigueAdaptive * 1.1) // Allow 10% margin
+    }, 1800000) // 30 min timeout
+  })
 })

 /**
--- a/apps/web/src/test/journey-simulator/types.ts
+++ b/apps/web/src/test/journey-simulator/types.ts
@ -111,6 +111,12 @@ export interface SimulatedAnswer {
  helpLevelUsed: HelpLevel
  /** Skills that were actually challenged by this problem */
  skillsChallenged: string[]
+  /**
+   * Cognitive fatigue contribution of this problem.
+   * Sum of getTrueMultiplier(trueP) for each skill, calculated BEFORE the exposure increment.
+   * This is the "ground truth" fatigue, based on actual skill mastery at that moment.
+   */
+  fatigue: number
 }

 /**
@ -133,6 +139,12 @@ export interface SessionSnapshot {
  problemsAttempted: number
  /** Session plan ID for reference */
  sessionPlanId: string
+  /**
+   * Total cognitive fatigue for this session.
+   * Sum of fatigue for all problems in the session.
+   * Lower is better (less cognitive strain).
+   */
+  sessionFatigue: number
 }

 /**
@ -181,6 +193,17 @@ export interface JourneyMetrics {
  accuracyImprovement: number
  /** Per-skill trajectory data */
  skillTrajectories: Map<string, SkillTrajectory>
+  /**
+   * Total cognitive fatigue across all sessions.
+   * Sum of sessionFatigue for all sessions.
+   * Lower is better (less cognitive strain for the same learning).
+   */
+  totalFatigue: number
+  /**
+   * Average cognitive fatigue per session.
+   * totalFatigue / sessionCount.
+   */
+  avgFatiguePerSession: number
 }

 /**
--- a/apps/web/src/utils/skillComplexity.ts
+++ b/apps/web/src/utils/skillComplexity.ts
@ -15,8 +15,11 @@ import { calculateFluencyState, FLUENCY_CONFIG } from '@/db/schema/player-skill-
 // Import tunable constants from centralized config
 import {
  BASE_SKILL_COMPLEXITY,
+  BKT_INTEGRATION_CONFIG,
+  calculateBktMultiplier,
  DEFAULT_COMPLEXITY_BUDGETS,
  getBaseComplexity,
+  isBktConfident,
  MASTERY_MULTIPLIERS,
  type MasteryState,
  type ProblemGenerationMode,
@ -65,14 +68,16 @@ export interface StudentSkillHistory {
 export interface SkillCostCalculatorOptions {
  /**
   * BKT results keyed by skillId.
-   * When provided with mode='adaptive', uses P(known) for continuous multipliers.
+   * Used for skill targeting in all adaptive modes.
+   * Used for cost calculation only in 'adaptive-bkt' mode.
   */
  bktResults?: Map<string, SkillBktResult>

  /**
   * Problem generation mode:
-   * - 'adaptive': BKT-based continuous scaling (default)
-   * - 'classic': Fluency-based discrete states
+   * - 'classic': No BKT targeting, fluency-based cost multipliers
+   * - 'adaptive': BKT skill targeting, fluency-based cost multipliers
+   * - 'adaptive-bkt': BKT skill targeting, BKT-based cost multipliers (default)
   */
  mode?: ProblemGenerationMode
 }
@ -125,16 +130,15 @@ export interface SkillCostCalculator {
 /**
 * Creates a skill cost calculator based on student's skill history.
 *
- * IMPORTANT: Cost calculation ALWAYS uses fluency-based multipliers.
- * BKT is stored but NOT used for cost calculation - it's only used for
- * skill TARGETING in session-planner.ts.
+ * Cost calculation depends on mode:
+ * - 'classic' / 'adaptive': Use fluency-based discrete multipliers
+ * - 'adaptive-bkt': Use BKT P(known) for continuous multipliers (experimental)
 *
- * This separation ensures:
- * - Difficulty control (mastery multipliers) works correctly
- * - BKT identifies weak skills for targeting, not filtering them out
+ * Note: In 'adaptive' mode, BKT is used for skill TARGETING but not cost.
+ * In 'adaptive-bkt' mode, BKT is used for both targeting AND cost.
 *
 * @param studentHistory - Student's skill history for fluency-based multipliers
- * @param options - Optional BKT results (for targeting) and mode selection
+ * @param options - Optional BKT results (for targeting/cost) and mode selection
 */
 export function createSkillCostCalculator(
  studentHistory: StudentSkillHistory,
@ -145,17 +149,21 @@ export function createSkillCostCalculator(
  /**
   * Get multiplier for a skill.
   *
-   * IMPORTANT: Always uses fluency-based multipliers for cost calculation.
-   * BKT is NOT used for cost - it's only used for skill TARGETING (see session-planner.ts).
-   *
-   * Rationale: The mastery multiplier controls problem difficulty by making
-   * mastered skills cost less and unmastered skills cost more. This prevents
-   * overwhelming students. BKT's role is separate: identifying which skills
-   * to prioritize for practice, not inflating their cost.
+   * In 'classic' and 'adaptive' modes: Uses fluency-based multipliers.
+   * In 'adaptive-bkt' mode: Uses BKT P(known) for continuous multipliers,
+   * falling back to fluency when BKT confidence is too low.
   */
  function getMultiplierForSkill(skillId: string): number {
-    // Always use fluency-based multiplier for cost calculation
-    // BKT is used for skill targeting in session-planner.ts, not for cost
+    // In adaptive-bkt mode, use BKT for cost calculation if confident
+    if (mode === 'adaptive-bkt' && bktResults) {
+      const bktResult = bktResults.get(skillId)
+      if (bktResult && isBktConfident(bktResult.confidence)) {
+        return calculateBktMultiplier(bktResult.pKnown)
+      }
+      // Fall back to fluency if BKT not confident or unavailable
+    }
+
+    // Default: use fluency-based multiplier for cost calculation
    return getFluencyMultiplier(skillId, studentHistory)
  }