Problem: The worksheet page had a 1.7-2.3s TTFB because the 1.25MB SVG
preview was being serialized into the initial HTML response, blocking
first paint.
Solution: Use React Suspense to stream the preview separately:
- Page shell renders immediately with settings (~200ms TTFB)
- Preview generates async and streams in when ready (~1.5s later)
- User sees the UI instantly, preview appears with loading skeleton
New components:
- StreamedPreview: async server component that generates preview
- PreviewSkeleton: loading placeholder while streaming
- StreamedPreviewContext: shares streamed data with PreviewCenter
- PreviewDataInjector: bridges server-streamed data to client context
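A sketch of the streaming structure, assuming a Next.js App Router page
(WorksheetSettingsPanel, the WorksheetSettings type, and the exact props are
assumptions; the other names appear in these commits):

    // page.tsx (sketch, not the exact implementation)
    import { Suspense } from 'react';

    export default async function WorksheetPage() {
      const settings = await loadWorksheetSettings(); // fast: settings only, no preview
      return (
        <>
          <WorksheetSettingsPanel settings={settings} />
          {/* The preview streams in after the shell has already been sent */}
          <Suspense fallback={<PreviewSkeleton />}>
            <StreamedPreview settings={settings} />
          </Suspense>
        </>
      );
    }

    // StreamedPreview: async server component that does the slow work
    async function StreamedPreview({ settings }: { settings: WorksheetSettings }) {
      const preview = await generateWorksheetPreview(settings); // ~1.5s: problems + Typst compile
      return <PreviewDataInjector data={preview} />; // bridges the streamed data into the client context
    }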
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add watchAllNamespaces=true to the Keel helm config so it monitors
workloads in the abaci namespace (not just the keel namespace).
Update documentation to clarify that Keel annotations must be on
the workload metadata, not the pod template.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keel reads annotations from the workload's metadata, not the pod template.
Moving annotations from spec.template.metadata to metadata fixes auto-updates.
Also:
- Set NAMESPACE="" on Keel deployment to watch all namespaces
- Keep ghcr credentials config (optional, for private registries)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keel needs to authenticate with ghcr.io to poll for new image digests
(ghcr.io requires authentication for the manifest API even for public images).
- Add ghcr_token and ghcr_username variables
- Create docker-registry secret for ghcr.io
- Add imagePullSecrets to StatefulSet (Keel reads these for auth)
- Document the setup in keel.tf
To enable auto-updates:
1. Create GitHub PAT with read:packages scope
2. Set ghcr_token in terraform.tfvars
3. terraform apply
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LiteFS replicas are read-only, so migrations fail with a "read only replica"
error. Check the LITEFS_CANDIDATE env var and skip migrations on replicas.
The primary (pod-0) will run migrations and replicate the changes.
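A sketch of the guard, assuming migrations run from the app's startup code
(runMigrations is a hypothetical helper; the env var name comes from the LiteFS config):

    // Only the LiteFS primary candidate runs migrations; replicas are read-only.
    const isCandidate = process.env.LITEFS_CANDIDATE === 'true';

    if (isCandidate) {
      await runMigrations(); // primary (pod-0) applies migrations; LiteFS replicates the result
    } else {
      console.log('Read-only LiteFS replica (LITEFS_CANDIDATE != true); skipping migrations');
    }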
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track where time is spent during worksheet page render:
- loadWorksheetSettings (DB query + getViewerId)
- generateWorksheetPreview (problem generation + Typst compilation)
- Total page render time
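An illustrative timing wrapper (not the exact instrumentation code):

    async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
      const start = performance.now();
      try {
        return await fn();
      } finally {
        console.log(`[worksheet-timing] ${label}: ${Math.round(performance.now() - start)}ms`);
      }
    }

    // e.g. const settings = await timed('loadWorksheetSettings', () => loadWorksheetSettings());
    //      const preview  = await timed('generateWorksheetPreview', () => generateWorksheetPreview(settings));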
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add openai_api_key variable to terraform configuration for AI-powered
features like flowchart generation. The key is stored as a k8s secret
and exposed to pods as LLM_OPENAI_API_KEY environment variable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Explain why LiteFS proxy fly-replay doesn't work outside Fly.io
- Document the primary service and IngressRoute solution
- Add troubleshooting symptoms for broken write routing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The LiteFS proxy on replica pods returns a fly-replay header, expecting Fly.io's
infrastructure to re-route the request to the primary. Since we're on k8s,
Traefik doesn't understand this header and returns empty responses.
Solution:
- Add abaci-app-primary service targeting only pod-0 (the LiteFS primary)
- Add Traefik IngressRoute matching POST/PUT/DELETE/PATCH methods
- Route these write requests directly to the primary service
- GET requests still load-balance across all replicas for reads
This fixes the intermittent empty PDF responses where ~60-80% of POST
requests were failing due to hitting replica pods.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add flowchart_version_history table to store snapshots after generate/refine
- Create versions API endpoint (GET list, POST restore)
- Add History tab with version list showing source, validation status, timestamp
- Implement inline preview mode to view historical versions without restoring
- Preview mode shows amber banner and updates diagram, examples, worksheet, tests
- Hide structure/input tabs (not useful currently)
- Add preview notice in refinement panel clarifying behavior
- Update React Query documentation with comprehensive patterns
- Add versionHistoryKeys to central query key factory
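A sketch of the query key factory entry (the exact shape is an assumption):

    export const versionHistoryKeys = {
      all: ['flowchart-version-history'] as const,
      list: (flowchartId: string) => [...versionHistoryKeys.all, flowchartId] as const,
    };

    // Used by the History tab, e.g.:
    // useQuery({ queryKey: versionHistoryKeys.list(flowchartId), queryFn: () => fetchVersions(flowchartId) })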
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix race condition where watch endpoint couldn't find active generation
because generate hadn't registered yet. Workshop page now triggers
/generate before connecting to /watch.
- Add polling fallback in the watch endpoint (up to 3s) for edge cases where
  the generate route is still starting up (sketched below).
- Add progress panel for regeneration - was missing because the panel
was only shown when !hasDraft.
- Add comprehensive logging throughout generation pipeline for debugging.
- Improve generation registry with subscriber management and accumulated
reasoning text for reconnection support.
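A minimal sketch of the polling fallback mentioned above (the generation-registry
lookup signature is an assumption):

    async function waitForGeneration<T>(
      lookup: (id: string) => T | undefined,
      id: string,
      timeoutMs = 3000,
    ): Promise<T | null> {
      const deadline = Date.now() + timeoutMs;
      while (Date.now() < deadline) {
        const generation = lookup(id);
        if (generation !== undefined) return generation; // generate route has registered
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
      return null; // caller responds with "no active generation"
    }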
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Gatus UI only shows hostnames, not full URLs. Include the path
directly in the endpoint name for clarity.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Organize endpoints into logical groups: Website, Arcade, Worksheets, Flowcharts, Core API, Infrastructure
- Add hide-url: false to show actual URLs on status page
- Use user-friendly names like "Games Hub", "Worksheet Builder", "Flashcard Generator"
- Remove confusing internal service endpoints
- Check database and Redis via infrastructure group
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update architecture diagram to show NAS Traefik as entry point
- Add "Adding New Subdomains" guide with DNS, NAS Traefik, and k3s steps
- Document network architecture in CLAUDE.md for agents
- Note services.yaml location on NAS
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment monitoring homepage, health API, Redis, DB
- Simplified ingress (HTTP only; SSL is handled by NAS Traefik)
- Updated NAS Traefik services.yaml with status subdomain routes
Access: https://status.abaci.one
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment with SQLite persistence
- ConfigMap with endpoint monitors (homepage, health API, Redis, DB)
- Ingress with SSL via cert-manager
- DNS CNAME record already configured
Deploy with: terraform apply
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Keel helm release that polls ghcr.io every 2 minutes
- Add keel.sh annotations to app StatefulSet for auto-updates
- Create comprehensive README.md documenting k3s architecture
- Update CLAUDE.md with automatic deployment workflow
After terraform apply, deployments are fully automatic:
push to main → build → Keel detects new image → rolling update
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rewrite DebugMermaidDiagram edge matching to use BFS graph traversal
- Build graph from SVG edges (L_FROM_TO_INDEX format) for path finding
- Handle phase boundary disconnections with bidirectional BFS (sketched below):
  - Forward BFS finds all nodes reachable from start
  - Backward BFS finds all nodes that can reach end
  - Combines both to highlight intermediate nodes across phase gaps
- Remove complex pattern matching in favor of graph-based approach
- Auto-compute edge IDs as {nodeId}_{optionValue} in loader.ts
- Add computeEdgeId() helper to schema.ts for consistent edge ID generation
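A condensed sketch of the bidirectional BFS (building the graph from the SVG
edge IDs is omitted; the adjacency-map shape is an assumption):

    type Graph = Map<string, string[]>; // nodeId -> neighbouring nodeIds

    function bfs(graph: Graph, start: string): Set<string> {
      const visited = new Set<string>([start]);
      const queue = [start];
      while (queue.length > 0) {
        const node = queue.shift()!;
        for (const next of graph.get(node) ?? []) {
          if (!visited.has(next)) {
            visited.add(next);
            queue.push(next);
          }
        }
      }
      return visited;
    }

    // Highlight nodes reachable from `start` going forward AND able to reach `end` going backward.
    function nodesOnPath(forward: Graph, backward: Graph, start: string, end: string): string[] {
      const fromStart = bfs(forward, start);
      const toEnd = bfs(backward, end);
      return [...fromStart].filter((n) => toEnd.has(n));
    }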
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pod-0 remains LiteFS primary (handles writes), pod-1 and pod-2 are
replicas that serve reads and forward writes to primary.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of warning about missing edge IDs in the doctor, automatically
assign computed edge IDs ({nodeId}_{optionValue}) to decision edges
that have auto-generated IDs (edge_N) during flowchart loading.
This makes edge highlighting work for legacy flowcharts without
requiring regeneration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add computeEdgeId() helper that generates edge IDs as {nodeId}_{optionValue}
- Update loader.ts to compute edge IDs automatically from decision options
- Update parser.ts to extract edge IDs from mermaid id@--> syntax
- Add MERM-003 diagnostic in doctor.ts to detect missing edge IDs
- Update LLM schemas to document the required edge ID pattern
- Update DebugMermaidDiagram to match edges by ID (with index fallback)
Edge IDs enable reliable highlighting of decision edges during visualization.
The pattern is deterministic: for a decision node "COMPARE" with option
value "direct", the expected edge ID is "COMPARE_direct".
Mermaid content must use: COMPARE COMPARE_direct@-->|"DIRECT"| NEXT_NODE
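A sketch of the helper, following the pattern stated above (any sanitization of
IDs is an assumption):

    export function computeEdgeId(nodeId: string, optionValue: string): string {
      return `${nodeId}_${optionValue}`; // computeEdgeId('COMPARE', 'direct') === 'COMPARE_direct'
    }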
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 3 - Mermaid Highlighting:
- Add highlightedNodeId prop to DebugMermaidDiagram for trace hover highlighting
- Cyan dashed border distinguishes trace hover from walker progress (amber)
Phase 4 - Problem Trace Component:
- Create ProblemTrace.tsx displaying step-by-step computation trace
- Shows node title, transforms applied, working problem evolution
- Timeline UI with expand/collapse for each step
- Integrate into WorksheetDebugPanel expanded details
Phase 5 - Unified Answer Computation:
- Update WorksheetDebugPanel to use simulateWalk + extractAnswer
- Update worksheet-generator.ts to use unified computation path
- Update test-case-validator.ts runTestCaseWithFlowchart to use simulateWalk
- All places with full ExecutableFlowchart now use single code path
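A sketch of the unified path (signatures are assumptions inferred from the names above):

    function computeAnswer(flowchart: ExecutableFlowchart, problemValues: Record<string, unknown>) {
      const finalState = simulateWalk(flowchart, problemValues); // transforms applied node by node
      return extractAnswer(flowchart, finalState);               // evaluates display.answer from the final state
    }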
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Transform the flowchart system from "compute everything upfront" to
"walk IS the computation". This is the foundation for the new unified
computation model.
Phase 1 - Schema & Core Runtime:
- Add TransformExpression, StateSnapshot, DisplayTemplate, AnswerDefinition
- Add StructuredTestCase for primitive-based test validation
- Update FlowchartState with values, snapshots, hasError fields
- Mark variables as deprecated (optional) for transition period
- Add interpolateTemplate() for {{name}} and {{=expr}} syntax (sketched below)
- Add applyTransforms(), extractAnswer(), simulateWalk() to loader
- Add createContextFromValues() for transform execution
Phase 2 - Walker Integration:
- Apply transforms when entering each node during walk
- Initialize entry node transforms on state creation
- Snapshots now accumulate as nodes are visited
All existing flowcharts continue to work via backwards compatibility
with the legacy variables section.
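A sketch of interpolateTemplate() for the two placeholder forms (the expression
evaluator is assumed to be the same one used for transforms):

    function interpolateTemplate(
      template: string,
      values: Record<string, unknown>,
      evaluate: (expr: string, values: Record<string, unknown>) => unknown,
    ): string {
      return template.replace(/\{\{(=?)([^}]+)\}\}/g, (_match, isExpr, body) => {
        // {{=expr}} is evaluated as an expression; {{name}} is a direct value lookup
        const result = isExpr === '=' ? evaluate(body.trim(), values) : values[body.trim()];
        return String(result ?? '');
      });
    }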
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a flexible logging system to the llm-client package that can be
enabled/disabled without rebuilding:
- Add Logger class with configurable enable/disable and custom logger support
- Add LogLevel, LoggerFn, LoggingConfig types
- Add `debug` option to LLMStreamRequest for per-request logging override
- Add setLogging() method for runtime enable/disable
- Replace hardcoded console.log in openai-responses provider with logger
- Add ?debug=true query param to flowchart generate endpoint
Usage:
- Per-request: llm.stream({ ..., debug: true })
- Global: llm.setLogging({ enabled: true })
- Custom logger: new LLMClient({ logging: { enabled: true, logger: fn } })
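For context, a compressed sketch of the Logger shape (anything beyond the names
listed above is an assumption):

    type LogLevel = 'debug' | 'info' | 'warn' | 'error';
    type LoggerFn = (level: LogLevel, message: string, data?: unknown) => void;
    interface LoggingConfig { enabled: boolean; logger?: LoggerFn; }

    class Logger {
      constructor(private config: LoggingConfig = { enabled: false }) {}

      setLogging(config: LoggingConfig) {
        this.config = config;
      }

      log(level: LogLevel, message: string, data?: unknown) {
        if (!this.config.enabled) return;
        const logger: LoggerFn =
          this.config.logger ?? ((lvl, msg, extra) => console.log(`[llm-client:${lvl}] ${msg}`, extra ?? ''));
        logger(level, message, data);
      }
    }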
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hardcoded flowcharts are now "seeds" that can be manually populated
into the database via a debug UI. This provides a single source of
truth (database) while keeping canonical definitions in version control.
Changes:
- Add /api/flowcharts/seeds endpoint for seed management
- Add SeedManagerPanel component (visible in debug mode on /flowchart)
- Rename FLOWCHARTS -> FLOWCHART_SEEDS in definitions/index.ts
- Remove hardcoded fallbacks from getFlowchartByIdAsync/getFlowchartListAsync
- Update browse API to only load from database
- Update all dependent files to use database-only loading
- Seeds are owned by the user who initiates seeding
To use: Enable debug mode on /flowchart, use Seed Manager panel to
populate the database with built-in flowcharts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server components read from request headers, not response headers.
This fixes the "No valid viewer session found" error for new visitors
on pages like /practice that need guest identification on first load.
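A sketch of the pattern this relies on, assuming the guest ID is attached to the
forwarded request in middleware (the x-viewer-id header name is hypothetical):

    // middleware.ts -- put the guest ID on the *request*, where server components can read it
    import { NextResponse, type NextRequest } from 'next/server';

    export function middleware(request: NextRequest) {
      const requestHeaders = new Headers(request.headers);
      if (!requestHeaders.has('x-viewer-id')) {
        requestHeaders.set('x-viewer-id', crypto.randomUUID()); // new visitor gets a guest ID
      }
      // Setting a cookie only on the response would not help this request:
      // the server component rendering the page reads request headers, not response headers.
      return NextResponse.next({ request: { headers: requestHeaders } });
    }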
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove legacy schema-specific formatting fallbacks in formatting.ts and example-generator.ts
- All flowcharts now require explicit display.problem and display.answer expressions
- Add DISP-003 diagnostic for missing display.problem expressions
- Update doctor to treat missing display.answer as error (was warning)
Also includes:
- Terraform: generate LiteFS config at runtime, add AUTH_TRUST_HOST, add volume mounts for vision-training and uploads data
- Terraform: add storage.tf for persistent volume claims
- Add Claude instructions for terraform directory
- Various UI component formatting updates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add expectedAnswer field to ProblemExample schema for test validation
- Create test-case-validator.ts with functions to evaluate display.answer
and compare against expected answers
- Add TestsTab.tsx component showing test results and path coverage
- Integrate validation into generate/refine routes with SSE events
- Add coverage diagnostics to flowchart doctor (TEST-001/002/003)
- Fix LLM output normalization: strip wrapper quotes from strings
  (e.g., "'+'" -> "+") and convert numeric strings to numbers (see the sketch below)
- Use formatAnswerDisplay for test evaluation (same as worksheet)
- Update LLM prompts with clearer excludeFromExampleStructure guidance
for result-formatting decisions vs problem-type decisions
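A sketch of the normalization described above (behaviour beyond the listed
examples is an assumption):

    function normalizeLLMValue(value: unknown): unknown {
      if (typeof value !== 'string') return value;
      // Strip wrapper quotes: "'+'" -> "+", '"7"' -> "7"
      const unquoted = value.replace(/^(['"])(.*)\1$/, '$2');
      // Convert numeric strings to numbers: "7" -> 7; non-numeric strings pass through
      return unquoted !== '' && !Number.isNaN(Number(unquoted)) ? Number(unquoted) : unquoted;
    }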
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LiteFS needs the actual pod hostname for cluster communication,
but HOSTNAME=0.0.0.0 was being set in both the Dockerfile and
ConfigMap, overriding the pod's hostname.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add LiteFS binary and config to Docker image for SQLite replication
- Convert k8s Deployment to StatefulSet for stable pod identities
- Pod-0 is primary (handles writes), others are replicas
- LiteFS proxy forwards write requests to primary automatically
- Add headless service for pod-to-pod communication
- Increase Node.js heap size to 4GB for Next.js build
- Exclude large Python venvs from Docker context
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Panda CSS token values in shorthand strings (e.g., `padding: '2 4'`)
silently fail to resolve. Convert all 84+ occurrences to paddingX/paddingY and
marginX/marginY properties, which correctly resolve design tokens.
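Before/after sketch (the css import path depends on the Panda output directory):

    import { css } from 'styled-system/css';

    // Before: shorthand string -- the tokens are not resolved, so no spacing is applied
    const broken = css({ padding: '2 4' });

    // After: axis properties resolve each design token correctly
    const fixed = css({ paddingY: '2', paddingX: '4' });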
Affected areas:
- Flowchart pages and components
- Know Your World game components
- KidNumberInput component
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Terraform now deploys a complete k8s environment:
- cert-manager with Let's Encrypt (staging + prod issuers)
- Redis deployment with persistent storage
- App deployment (2 replicas, rolling updates)
- Traefik ingress with SSL, HSTS, HTTP→HTTPS redirect
Ready for switchover by forwarding ports 80/443 to k3s VM.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set up Terraform to manage k3s resources on the NAS VM:
- Kubernetes and Helm providers configured
- Created 'abaci' namespace for workloads
- Ready for BullMQ workers and future scalable services
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Avoid terminology confusion - "skills" refers to invocable commands
like /fix-css and /porkbun-dns. The documentation files are
step-by-step procedures, not invocable skills.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sequential example generation with a proper task queue system that
correctly handles concurrent requests to the Web Worker pool.
Root cause of previous issues: each worker stored only ONE resolve/reject
callback, so concurrent requests would overwrite each other's callbacks,
causing promises to never resolve or to resolve with the wrong data.
Solution:
- Add unique requestId to all worker messages for request/response matching
- Implement task queue with dispatch logic for pending work
- Track pending requests in a Map keyed by requestId
- Workers echo back requestId so responses match their originating requests
- Both /flowchart page and workshop page now generate concurrently
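A condensed sketch of the requestId matching (message shapes are assumptions):

    type Pending = { resolve: (value: unknown) => void; reject: (reason?: unknown) => void };

    const pending = new Map<number, Pending>();
    let nextRequestId = 0;

    function request(worker: Worker, payload: unknown): Promise<unknown> {
      return new Promise((resolve, reject) => {
        const requestId = nextRequestId++;
        pending.set(requestId, { resolve, reject }); // tracked per request, never overwritten
        worker.postMessage({ requestId, payload });
      });
    }

    // The worker echoes requestId back, so each response finds its originating caller:
    function handleMessage(event: MessageEvent<{ requestId: number; result?: unknown; error?: string }>) {
      const entry = pending.get(event.data.requestId);
      if (!entry) return;
      pending.delete(event.data.requestId);
      event.data.error ? entry.reject(new Error(event.data.error)) : entry.resolve(event.data.result);
    }

    // worker.onmessage = handleMessage;  // plus dispatch logic that pulls the next queued task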
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When both WorksheetDebugPanel and FlowchartExampleGrid try to generate
examples simultaneously using the shared web worker pool, the workers'
resolve/reject callbacks get overwritten, causing one request to never
complete.
This fix sequences the generation:
- WorksheetDebugPanel generates first (when worksheet tab is active)
- FlowchartExampleGrid waits until WorksheetDebugPanel signals completion
- Added onGenerationStart/onGenerationComplete callbacks to WorksheetDebugPanel
- Added waitForReady prop to FlowchartExampleGrid to defer generation
- Workshop page coordinates the sequence using isDebugPanelGenerating state
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MERM-002 doctor diagnostic to detect when JSON node IDs don't
match mermaid node IDs
- Update loader to throw error when entry node is missing or >50% of
nodes are missing from mermaid (prevents crash loops)
- Add flowchartLoadError state and UI display in workshop page
- Improve LLM schema documentation for display.answer vs generation.target
- Add context-aware division-by-zero suggestions in doctor
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Combine published and draft example generation into single unified effect
- Fix race condition where worker pool was cancelling requests when
drafts and published flowcharts competed for the same workers
- Add draftMermaidContent to sessions API response (was missing)
- Remove redundant draftCardExamples state in favor of unified cardExamples
- Process all flowcharts sequentially to avoid worker pool cancellation
- Show animated backgrounds on healthy draft flowcharts, not just published
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>