LiteFS replicas are read-only, so migrations fail with a "read only replica"
error. Check the LITEFS_CANDIDATE env var and skip migrations on replicas.
The primary (pod-0) will run migrations and replicate the changes.
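A minimal sketch of the check, assuming LITEFS_CANDIDATE is set to "true" only on the pod that may become primary and that a missing variable means the app is not running under LiteFS. The function names and the `runMigrations` callback are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: "true" marks the primary candidate; anything else is a replica.
function shouldRunMigrations(env: Env): boolean {
  const candidate = env.LITEFS_CANDIDATE;
  // No variable at all: treat as a non-LiteFS environment and migrate as usual.
  if (candidate === undefined) return true;
  return candidate.toLowerCase() === "true";
}

// Runs the supplied migration callback only where writes are possible.
async function migrateIfPrimary(
  env: Env,
  runMigrations: () => Promise<void>
): Promise<boolean> {
  if (!shouldRunMigrations(env)) {
    console.log("Skipping migrations: LiteFS replica is read-only");
    return false;
  }
  await runMigrations();
  return true;
}
```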
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track where time is spent during worksheet page render:
- loadWorksheetSettings (DB query + getViewerId)
- generateWorksheetPreview (problem generation + Typst compilation)
- Total page render time
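One way to collect these timings is a small wrapper around each async step. This is a sketch only; the `timed` helper and the `timings` map are hypothetical and not taken from the page code.

```typescript
// Wraps an async step and records its elapsed wall-clock time in milliseconds.
async function timed<T>(
  label: string,
  timings: Map<string, number>,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    // Recorded even when the step throws, so failures still show up in the totals.
    timings.set(label, Date.now() - start);
  }
}
```

Usage would look like `await timed("loadWorksheetSettings", timings, () => loadWorksheetSettings())`, with the total computed the same way around the whole render.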
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add openai_api_key variable to terraform configuration for AI-powered
features like flowchart generation. The key is stored as a k8s secret
and exposed to pods as LLM_OPENAI_API_KEY environment variable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Explain why LiteFS proxy fly-replay doesn't work outside Fly.io
- Document the primary service and IngressRoute solution
- Add troubleshooting symptoms for broken write routing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LiteFS proxy on replica pods returns a fly-replay header, expecting Fly.io's
infrastructure to re-route requests to the primary. Since we're on k8s,
Traefik doesn't understand this header and returns empty responses.
Solution:
- Add abaci-app-primary service targeting only pod-0 (the LiteFS primary)
- Add Traefik IngressRoute matching POST/PUT/DELETE/PATCH methods
- Route these write requests directly to the primary service
- GET requests still load-balance across all replicas for reads
This fixes the intermittent empty PDF responses where ~60-80% of POST
requests were failing due to hitting replica pods.
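The routing rule can be modeled in a few lines. This only illustrates the decision the IngressRoute encodes and is not the Traefik configuration itself; the `abaci-app` name for the load-balanced service is an assumption.

```typescript
// HTTP methods treated as writes by the IngressRoute described above.
const WRITE_METHODS = new Set(["POST", "PUT", "DELETE", "PATCH"]);

// "abaci-app-primary" targets pod-0 only; "abaci-app" (assumed name) spans all pods.
type Backend = "abaci-app-primary" | "abaci-app";

function pickBackend(method: string): Backend {
  return WRITE_METHODS.has(method.toUpperCase()) ? "abaci-app-primary" : "abaci-app";
}
```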
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add flowchart_version_history table to store snapshots after generate/refine
- Create versions API endpoint (GET list, POST restore)
- Add History tab with version list showing source, validation status, timestamp
- Implement inline preview mode to view historical versions without restoring
- Preview mode shows amber banner and updates diagram, examples, worksheet, tests
- Hide structure/input tabs (not useful currently)
- Add preview notice in refinement panel clarifying behavior
- Update React Query documentation with comprehensive patterns
- Add versionHistoryKeys to central query key factory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix race condition where watch endpoint couldn't find active generation
  because generate hadn't registered yet. Workshop page now triggers
  /generate before connecting to /watch.
- Add polling fallback in watch endpoint (up to 3s) for edge cases where
  generate route is still starting up.
- Add progress panel for regeneration - was missing because the panel
  was only shown when !hasDraft.
- Add comprehensive logging throughout generation pipeline for debugging.
- Improve generation registry with subscriber management and accumulated
  reasoning text for reconnection support.
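The polling fallback could look roughly like this. It is a sketch under assumptions: the `lookup` callback stands in for the registry query, and the 100 ms interval is illustrative (only the 3s ceiling comes from the change).

```typescript
// Polls `lookup` until it returns a value or the timeout elapses.
async function waitForGeneration<T>(
  lookup: () => T | undefined,
  timeoutMs = 3000,
  intervalMs = 100
): Promise<T | undefined> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = lookup();
    if (found !== undefined) return found;
    // Give up once the deadline has passed; the caller decides how to respond.
    if (Date.now() >= deadline) return undefined;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```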
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Gatus UI only shows hostnames, not full URLs. Include the path
directly in the endpoint name for clarity.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Organize endpoints into logical groups: Website, Arcade, Worksheets, Flowcharts, Core API, Infrastructure
- Add hide-url: false to show actual URLs on status page
- Use user-friendly names like "Games Hub", "Worksheet Builder", "Flashcard Generator"
- Remove confusing internal service endpoints
- Check database and Redis via infrastructure group
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update architecture diagram to show NAS Traefik as entry point
- Add "Adding New Subdomains" guide with DNS, NAS Traefik, and k3s steps
- Document network architecture in CLAUDE.md for agents
- Note services.yaml location on NAS
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment monitoring homepage, health API, Redis, DB
- Simplified ingress (plain HTTP; NAS Traefik handles SSL)
- Updated NAS Traefik services.yaml with status subdomain routes
Access: https://status.abaci.one
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment with SQLite persistence
- ConfigMap with endpoint monitors (homepage, health API, Redis, DB)
- Ingress with SSL via cert-manager
- DNS CNAME record already configured
Deploy with: terraform apply
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Keel helm release that polls ghcr.io every 2 minutes
- Add keel.sh annotations to app StatefulSet for auto-updates
- Create comprehensive README.md documenting k3s architecture
- Update CLAUDE.md with automatic deployment workflow
After terraform apply, deployments are fully automatic:
push to main → build → Keel detects new image → rolling update
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rewrite DebugMermaidDiagram edge matching to use BFS graph traversal
- Build graph from SVG edges (L_FROM_TO_INDEX format) for path finding
- Handle phase boundary disconnections with bidirectional BFS:
  - Forward BFS finds all nodes reachable from start
  - Backward BFS finds all nodes that can reach end
  - Combines both to highlight intermediate nodes across phase gaps
- Remove complex pattern matching in favor of graph-based approach
- Auto-compute edge IDs as {nodeId}_{optionValue} in loader.ts
- Add computeEdgeId() helper to schema.ts for consistent edge ID generation
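A sketch of the bidirectional traversal on a plain edge list. How the two result sets are combined is not spelled out here, so this version assumes an intersection when start and end are connected and a union when a phase gap leaves the intersection empty. All names are hypothetical.

```typescript
type Edge = [from: string, to: string];

// Builds an adjacency map, optionally with every edge reversed.
function buildAdjacency(edges: Edge[], reversed: boolean): Map<string, string[]> {
  const adjacency = new Map<string, string[]>();
  for (const [from, to] of edges) {
    const src = reversed ? to : from;
    const dst = reversed ? from : to;
    const list = adjacency.get(src) ?? [];
    list.push(dst);
    adjacency.set(src, list);
  }
  return adjacency;
}

// Standard breadth-first search returning every node reachable from `start`.
function bfs(adjacency: Map<string, string[]>, start: string): Set<string> {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const node = queue.shift()!;
    for (const next of adjacency.get(node) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

function nodesOnPath(edges: Edge[], start: string, end: string): Set<string> {
  const forward = bfs(buildAdjacency(edges, false), start); // reachable from start
  const backward = bfs(buildAdjacency(edges, true), end); // can reach end
  const both = Array.from(forward).filter((node) => backward.has(node));
  if (both.length > 0) return new Set(both);
  // Assumed fallback for a phase gap: highlight both sides of the gap.
  return new Set(Array.from(forward).concat(Array.from(backward)));
}
```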
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pod-0 remains LiteFS primary (handles writes), pod-1 and pod-2 are
replicas that serve reads and forward writes to primary.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of warning about missing edge IDs in the doctor, automatically
assign computed edge IDs ({nodeId}_{optionValue}) to decision edges
that have auto-generated IDs (edge_N) during flowchart loading.
This makes edge highlighting work for legacy flowcharts without
requiring regeneration.
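Roughly, the load-time fix-up replaces placeholder IDs while leaving explicit ones alone. A sketch, assuming a simplified edge shape; `DecisionEdge` and `assignEdgeIds` are hypothetical names.

```typescript
interface DecisionEdge {
  id: string;
  from: string; // ID of the decision node the edge leaves
  optionValue?: string; // set only for edges that belong to a decision option
}

// Matches auto-generated IDs of the form edge_N.
const AUTO_GENERATED_ID = /^edge_\d+$/;

function assignEdgeIds(edges: DecisionEdge[]): DecisionEdge[] {
  return edges.map((edge) =>
    AUTO_GENERATED_ID.test(edge.id) && edge.optionValue !== undefined
      ? { ...edge, id: `${edge.from}_${edge.optionValue}` }
      : edge
  );
}
```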
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add computeEdgeId() helper that generates edge IDs as {nodeId}_{optionValue}
- Update loader.ts to compute edge IDs automatically from decision options
- Update parser.ts to extract edge IDs from mermaid id@--> syntax
- Add MERM-003 diagnostic in doctor.ts to detect missing edge IDs
- Update LLM schemas to document the required edge ID pattern
- Update DebugMermaidDiagram to match edges by ID (with index fallback)
Edge IDs enable reliable highlighting of decision edges during visualization.
The pattern is deterministic: for a decision node "COMPARE" with option
value "direct", the expected edge ID is "COMPARE_direct".
Mermaid content must use: COMPARE COMPARE_direct@-->|"DIRECT"| NEXT_NODE
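The pattern is small enough to show in full. The signature of computeEdgeId() is an assumption; `mermaidEdgeLine` is a hypothetical helper added only to show how the ID appears in mermaid content.

```typescript
// Deterministic edge ID: {nodeId}_{optionValue}.
function computeEdgeId(nodeId: string, optionValue: string): string {
  return `${nodeId}_${optionValue}`;
}

// Formats one mermaid edge line using the id@--> syntax.
function mermaidEdgeLine(from: string, optionValue: string, label: string, to: string): string {
  return `${from} ${computeEdgeId(from, optionValue)}@-->|"${label}"| ${to}`;
}
```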
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 3 - Mermaid Highlighting:
- Add highlightedNodeId prop to DebugMermaidDiagram for trace hover highlighting
- Cyan dashed border distinguishes trace hover from walker progress (amber)
Phase 4 - Problem Trace Component:
- Create ProblemTrace.tsx displaying step-by-step computation trace
- Shows node title, transforms applied, working problem evolution
- Timeline UI with expand/collapse for each step
- Integrate into WorksheetDebugPanel expanded details
Phase 5 - Unified Answer Computation:
- Update WorksheetDebugPanel to use simulateWalk + extractAnswer
- Update worksheet-generator.ts to use unified computation path
- Update test-case-validator.ts runTestCaseWithFlowchart to use simulateWalk
- All places with full ExecutableFlowchart now use single code path
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Transform the flowchart system from "compute everything upfront" to
"walk IS the computation". This is the foundation for the new unified
computation model.
Phase 1 - Schema & Core Runtime:
- Add TransformExpression, StateSnapshot, DisplayTemplate, AnswerDefinition
- Add StructuredTestCase for primitive-based test validation
- Update FlowchartState with values, snapshots, hasError fields
- Mark variables as deprecated (optional) for transition period
- Add interpolateTemplate() for {{name}} and {{=expr}} syntax
- Add applyTransforms(), extractAnswer(), simulateWalk() to loader
- Add createContextFromValues() for transform execution
Phase 2 - Walker Integration:
- Apply transforms when entering each node during walk
- Initialize entry node transforms on state creation
- Snapshots now accumulate as nodes are visited
All existing flowcharts continue to work via backwards compatibility
with the legacy variables section.
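A sketch of how the two template forms might be resolved. The signature is an assumption, and expression evaluation is delegated to a caller-supplied `evalExpr` because the real evaluator is not shown here.

```typescript
type Values = Record<string, string | number | boolean>;

function interpolateTemplate(
  template: string,
  values: Values,
  evalExpr: (expr: string, values: Values) => unknown
): string {
  // Group 1 is "=" for {{=expr}}; group 2 is the name or expression body.
  return template.replace(
    /\{\{(=?)\s*([^}]+?)\s*\}\}/g,
    (match: string, isExpr: string, body: string) => {
      if (isExpr === "=") return String(evalExpr(body, values));
      // Unknown names are left untouched so the problem stays visible.
      return body in values ? String(values[body]) : match;
    }
  );
}
```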
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a flexible logging system to the llm-client package that can be
enabled/disabled without rebuilding:
- Add Logger class with configurable enable/disable and custom logger support
- Add LogLevel, LoggerFn, LoggingConfig types
- Add `debug` option to LLMStreamRequest for per-request logging override
- Add setLogging() method for runtime enable/disable
- Replace hardcoded console.log in openai-responses provider with logger
- Add ?debug=true query param to flowchart generate endpoint
Usage:
- Per-request: llm.stream({ ..., debug: true })
- Global: llm.setLogging({ enabled: true })
- Custom logger: new LLMClient({ logging: { enabled: true, logger: fn } })
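For illustration, a Logger with this behavior could be as small as the following. The type names come from this change, but their shapes and the `log` signature are assumptions.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LoggerFn = (level: LogLevel, message: string, data?: unknown) => void;
interface LoggingConfig {
  enabled: boolean;
  logger?: LoggerFn;
}

class Logger {
  private config: LoggingConfig;

  constructor(config: LoggingConfig = { enabled: false }) {
    this.config = config;
  }

  // Runtime toggle; no rebuild needed.
  setLogging(config: Partial<LoggingConfig>): void {
    this.config = { ...this.config, ...config };
  }

  // `override` models the per-request `debug` flag and wins over the global setting.
  log(level: LogLevel, message: string, data?: unknown, override?: boolean): void {
    const enabled = override ?? this.config.enabled;
    if (!enabled) return;
    const sink: LoggerFn =
      this.config.logger ?? ((lvl, msg, extra) => console.log(`[llm-client] ${lvl}: ${msg}`, extra));
    sink(level, message, data);
  }
}
```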
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hardcoded flowcharts are now "seeds" that can be manually populated
into the database via a debug UI. This provides a single source of
truth (database) while keeping canonical definitions in version control.
Changes:
- Add /api/flowcharts/seeds endpoint for seed management
- Add SeedManagerPanel component (visible in debug mode on /flowchart)
- Rename FLOWCHARTS -> FLOWCHART_SEEDS in definitions/index.ts
- Remove hardcoded fallbacks from getFlowchartByIdAsync/getFlowchartListAsync
- Update browse API to only load from database
- Update all dependent files to use database-only loading
- Seeds are owned by the user who initiates seeding
To use: Enable debug mode on /flowchart, use Seed Manager panel to
populate the database with built-in flowcharts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server components read from request headers, not response headers.
This fixes the "No valid viewer session found" error for new visitors
on pages like /practice that need guest identification on first load.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove legacy schema-specific formatting fallbacks in formatting.ts and example-generator.ts
- All flowcharts now require explicit display.problem and display.answer expressions
- Add DISP-003 diagnostic for missing display.problem expressions
- Update doctor to treat missing display.answer as error (was warning)
Also includes:
- Terraform: generate LiteFS config at runtime, add AUTH_TRUST_HOST, add volume mounts for vision-training and uploads data
- Terraform: add storage.tf for persistent volume claims
- Add Claude instructions for terraform directory
- Various UI component formatting updates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add expectedAnswer field to ProblemExample schema for test validation
- Create test-case-validator.ts with functions to evaluate display.answer
  and compare against expected answers
- Add TestsTab.tsx component showing test results and path coverage
- Integrate validation into generate/refine routes with SSE events
- Add coverage diagnostics to flowchart doctor (TEST-001/002/003)
- Fix LLM output normalization: strip wrapper quotes from strings
  (e.g., "'+'" -> "+") and convert numeric strings to numbers
- Use formatAnswerDisplay for test evaluation (same as worksheet)
- Update LLM prompts with clearer excludeFromExampleStructure guidance
  for result-formatting decisions vs problem-type decisions
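The normalization step can be sketched as follows. `normalizeLlmValue` is a hypothetical name, and the exact rules (which quote characters, which numeric formats) are assumptions.

```typescript
function normalizeLlmValue(value: unknown): unknown {
  if (typeof value !== "string") return value;
  let text = value.trim();
  // Strip one layer of matching wrapper quotes: "'+'" becomes "+".
  const quoted = /^(['"])([\s\S]*)\1$/.exec(text);
  if (quoted) text = quoted[2] ?? text;
  // Convert plain decimal strings such as "42" or "-3.5" to numbers.
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  return text;
}
```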
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LiteFS needs the actual pod hostname for cluster communication,
but HOSTNAME=0.0.0.0 was being set in both the Dockerfile and
ConfigMap, overriding the pod's hostname.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add LiteFS binary and config to Docker image for SQLite replication
- Convert k8s Deployment to StatefulSet for stable pod identities
- Pod-0 is primary (handles writes), others are replicas
- LiteFS proxy forwards write requests to primary automatically
- Add headless service for pod-to-pod communication
- Increase Node.js heap size to 4GB for Next.js build
- Exclude large Python venvs from Docker context
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Panda CSS token values in shorthand strings (e.g., `padding: '2 4'`)
silently fail. Convert all 84+ occurrences to paddingX/paddingY and
marginX/marginY properties which correctly resolve design tokens.
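The conversion follows the usual CSS two-value order (vertical, then horizontal). A sketch of the mapping; `expandShorthand` is a hypothetical helper, not part of Panda CSS, and it only covers the two-value case.

```typescript
// Turns { padding: '2 4' } style shorthands into axis properties.
function expandShorthand(prop: "padding" | "margin", value: string): Record<string, string> {
  const parts = value.trim().split(/\s+/);
  if (parts.length !== 2) {
    throw new Error(`Only two-value shorthands are handled in this sketch: "${value}"`);
  }
  // First value is the vertical axis, second the horizontal axis.
  return { [`${prop}Y`]: parts[0], [`${prop}X`]: parts[1] };
}
```

For example, `padding: '2 4'` becomes `paddingY: '2', paddingX: '4'`.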
Affected areas:
- Flowchart pages and components
- Know Your World game components
- KidNumberInput component
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Terraform now deploys a complete k8s environment:
- cert-manager with Let's Encrypt (staging + prod issuers)
- Redis deployment with persistent storage
- App deployment (2 replicas, rolling updates)
- Traefik ingress with SSL, HSTS, HTTP→HTTPS redirect
Ready for switchover by forwarding ports 80/443 to k3s VM.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set up Terraform to manage k3s resources on the NAS VM:
- Kubernetes and Helm providers configured
- Created 'abaci' namespace for workloads
- Ready for BullMQ workers and future scalable services
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Avoid terminology confusion - "skills" refers to invocable commands
like /fix-css and /porkbun-dns. The documentation files are
step-by-step procedures, not invocable skills.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sequential example generation with a proper task queue system that
correctly handles concurrent requests to the Web Worker pool.
Root cause of previous issues: Each worker stored only ONE resolve/reject
callback, so concurrent requests would overwrite each other's callbacks,
causing promises to never resolve or resolve with wrong data.
Solution:
- Add unique requestId to all worker messages for request/response matching
- Implement task queue with dispatch logic for pending work
- Track pending requests in a Map keyed by requestId
- Workers echo back requestId so responses match their originating requests
- Both /flowchart page and workshop page now generate concurrently
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When both WorksheetDebugPanel and FlowchartExampleGrid try to generate
examples simultaneously using the shared web worker pool, the workers'
resolve/reject callbacks get overwritten, causing one request to never
complete.
This fix sequences the generation:
- WorksheetDebugPanel generates first (when worksheet tab is active)
- FlowchartExampleGrid waits until WorksheetDebugPanel signals completion
- Added onGenerationStart/onGenerationComplete callbacks to WorksheetDebugPanel
- Added waitForReady prop to FlowchartExampleGrid to defer generation
- Workshop page coordinates the sequence using isDebugPanelGenerating state
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MERM-002 doctor diagnostic to detect when JSON node IDs don't
  match mermaid node IDs
- Update loader to throw an error when the entry node is missing or >50% of
  nodes are missing from mermaid (prevents crash loops)
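The loader check amounts to something like this sketch. The function name, parameters, and error messages are hypothetical; only the entry-node and >50% rules come from the change.

```typescript
function validateNodeCoverage(
  entryNodeId: string,
  jsonNodeIds: string[],
  mermaidNodeIds: Set<string>
): void {
  if (!mermaidNodeIds.has(entryNodeId)) {
    throw new Error(`Entry node "${entryNodeId}" is missing from the mermaid diagram`);
  }
  const missing = jsonNodeIds.filter((id) => !mermaidNodeIds.has(id));
  // Strictly more than half missing is treated as a fatal mismatch.
  if (jsonNodeIds.length > 0 && missing.length / jsonNodeIds.length > 0.5) {
    throw new Error(
      `${missing.length} of ${jsonNodeIds.length} nodes are missing from mermaid: ${missing.join(", ")}`
    );
  }
}
```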
- Add flowchartLoadError state and UI display in workshop page
- Improve LLM schema documentation for display.answer vs generation.target
- Add context-aware division-by-zero suggestions in doctor
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Combine published and draft example generation into single unified effect
- Fix race condition where worker pool was cancelling requests when
  drafts and published flowcharts competed for the same workers
- Add draftMermaidContent to sessions API response (was missing)
- Remove redundant draftCardExamples state in favor of unified cardExamples
- Process all flowcharts sequentially to avoid worker pool cancellation
- Show animated backgrounds on healthy draft flowcharts, not just published
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The worksheet generator was hardcoding answer computation for specific
schemas (two-digit-subtraction, fractions, linear equations) and
returning "?" for any unknown schema like custom flowcharts.
Now uses the centralized formatAnswerDisplay() function which properly
handles:
- Custom display.answer expressions defined in the flowchart
- Computed variables from the flowchart definition
- Schema-specific fallback logic
- The generation.target fallback for custom schemas
This fixes PDF worksheets showing "?" answers for teacher-created
flowcharts like "math duck maker" while the debug panel showed
correct answers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Worksheet improvements:
- Add orderByDifficulty option to sort problems easy→medium→hard
- Add typstAnswer field for proper fraction rendering in answer key
- Add WorksheetDebugPanel to preview generated examples
- Move PDF creation to modal in workshop, make worksheet tab default
- Support worksheet generation from workshop sessions
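The orderByDifficulty option boils down to a stable sort on a difficulty rank. A sketch, assuming each problem carries a `difficulty` field with exactly these three values.

```typescript
type Difficulty = "easy" | "medium" | "hard";

const DIFFICULTY_RANK: Record<Difficulty, number> = { easy: 0, medium: 1, hard: 2 };

// Returns a new array; the input order is preserved within each difficulty.
function orderByDifficulty<T extends { difficulty: Difficulty }>(problems: T[]): T[] {
  return [...problems].sort(
    (a, b) => DIFFICULTY_RANK[a.difficulty] - DIFFICULTY_RANK[b.difficulty]
  );
}
```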
Flowchart diagnostics:
- Add doctor.ts with validation checks (DISP-002 for missing answer handlers)
- Add FlowchartDiagnostics component for displaying warnings
- Add display.answer config to fraction and linear equation flowcharts
UI refinements:
- Improve AnimatedProblemTile styling
- Enhance FlowchartCard and FlowchartModal components
- Add derived field validation tests for LLM schemas
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add AnimatedProblemTile component with MathDisplay for proper math rendering
- Add AnimatedBackgroundTiles grid component for card backgrounds
- Update FlowchartCard to accept flowchart + examples props
- Generate examples client-side for both hardcoded and database flowcharts
- Use same formatting system (formatProblemDisplay + MathDisplay) as modal
Also includes:
- Fix migration 0076 timestamp ordering issue (linkedPublishedId column)
- Add migration-timestamp-fix skill documenting common drizzle-kit issue
- Update CLAUDE.md with migration timestamp ordering guidance
- Various flowchart workshop and vision training improvements
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>