- Add self-hosted Gitea server at git.dev.abaci.one
- Configure Gitea Actions runner with Docker-in-Docker
- Set up push mirror to GitHub for backup
- Add Storybook deployment workflow publishing to dev.abaci.one/storybook/
- Update nginx config to serve Storybook from local storage
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go's pure-Go DNS resolver is incompatible with k3s's CoreDNS: lookups
after the initial one intermittently fail with "server misbehaving" errors.
This prevented Keel from polling ghcr.io for new image digests.
Setting GODEBUG=netdns=cgo forces Go to use the cgo-based resolver, which
delegates to the system resolver and works correctly with k3s.
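For reference, a minimal sketch of injecting the variable on a
Terraform-managed workload (resource and image names here are illustrative,
not the actual config):

    resource "kubernetes_deployment" "dns_example" {
      metadata {
        name      = "dns-example"   # hypothetical name, for illustration
        namespace = "keel"
      }
      spec {
        selector {
          match_labels = { app = "dns-example" }
        }
        template {
          metadata {
            labels = { app = "dns-example" }
          }
          spec {
            container {
              name  = "app"
              image = "keelhq/keel:latest"   # illustrative image
              env {
                # Force the cgo-based resolver so lookups go through the
                # system resolver instead of Go's pure-Go DNS client.
                name  = "GODEBUG"
                value = "netdns=cgo"
              }
            }
          }
        }
      }
    }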
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Deploy kube-prometheus-stack to k3s cluster via Terraform
- Add Prometheus metrics endpoint (/api/metrics) using prom-client
- Track Socket.IO connections, HTTP requests, and Node.js runtime metrics
- Configure ServiceMonitor for auto-discovery by Prometheus
- Expose Grafana at grafana.dev.abaci.one
- Expose Prometheus at prometheus.dev.abaci.one
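A minimal sketch of the ServiceMonitor in Terraform (names and the scrape
interval are assumptions, not the exact config):

    resource "kubernetes_manifest" "app_servicemonitor" {
      manifest = {
        apiVersion = "monitoring.coreos.com/v1"
        kind       = "ServiceMonitor"
        metadata = {
          name      = "abaci-app"   # hypothetical name
          namespace = "abaci"
          # Must carry a label matched by the Prometheus instance's
          # serviceMonitorSelector for auto-discovery to work.
          labels = { release = "kube-prometheus-stack" }
        }
        spec = {
          selector = { matchLabels = { app = "abaci-app" } }
          endpoints = [{
            port     = "http"          # assumed service port name
            path     = "/api/metrics"
            interval = "30s"           # assumed scrape interval
          }]
        }
      }
    }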
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add nginx static server at dev.abaci.one for serving:
  - Playwright HTML reports at /smoke-reports/
  - Storybook (future) at /storybook/
  - Coverage reports (future) at /coverage/
- NFS-backed PVC shared between artifact producers and nginx
- Smoke tests now save HTML reports with automatic cleanup (keeps the 20 most recent)
- Reports accessible at dev.abaci.one/smoke-reports/latest/
Infrastructure:
- infra/terraform/dev-artifacts.tf: nginx deployment, PVC, ingress
- Updated smoke-tests.tf to mount shared PVC
- Updated smoke-test-runner.ts to generate and save HTML reports
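The shared volume is the key piece; a minimal sketch (storage class and
size are assumptions):

    resource "kubernetes_persistent_volume_claim" "dev_artifacts" {
      metadata {
        name      = "dev-artifacts"   # hypothetical name
        namespace = "abaci"
      }
      spec {
        # ReadWriteMany so smoke-test jobs can write reports while
        # nginx mounts the same volume for serving.
        access_modes       = ["ReadWriteMany"]
        storage_class_name = "nfs-client"   # assumed NFS provisioner class
        resources {
          requests = { storage = "5Gi" }    # assumed size
        }
      }
    }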
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ensures the latest smoke-tests image is always pulled, avoiding
stale cached images when updates are pushed.
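Illustrative sketch of the policy on a Terraform-managed job (schedule and
image are placeholders):

    resource "kubernetes_cron_job_v1" "smoke_tests" {
      metadata {
        name      = "smoke-tests"   # hypothetical name
        namespace = "abaci"
      }
      spec {
        schedule = "*/30 * * * *"   # assumed cadence
        job_template {
          metadata {}
          spec {
            template {
              metadata {}
              spec {
                restart_policy = "Never"
                container {
                  name  = "smoke-tests"
                  image = "ghcr.io/example/smoke-tests:latest"   # placeholder
                  # Re-resolve the tag on every run so a freshly pushed
                  # :latest isn't shadowed by a stale cached image.
                  image_pull_policy = "Always"
                }
              }
            }
          }
        }
      }
    }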
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add watchAllNamespaces=true to Keel helm config so it monitors
workloads in the abaci namespace (not just the keel namespace).
Update documentation to clarify that Keel annotations must be on
the workload metadata, not the pod template.
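Sketch of the Terraform side (chart repo URL as published by the Keel
project; verify against the actual keel.tf):

    resource "helm_release" "keel" {
      name       = "keel"
      repository = "https://charts.keel.sh"
      chart      = "keel"
      namespace  = "keel"

      set {
        # Watch workloads cluster-wide instead of only Keel's namespace.
        name  = "watchAllNamespaces"
        value = "true"
      }
    }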
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keel reads annotations from the workload's metadata, not the pod template.
Moving annotations from spec.template.metadata to metadata fixes auto-updates.
Also:
- Set NAMESPACE="" on Keel deployment to watch all namespaces
- Keep ghcr credentials config (optional, for private registries)
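Sketch of the corrected placement in Terraform (policy values and names
are illustrative):

    resource "kubernetes_stateful_set" "app" {
      metadata {
        name      = "abaci-app"   # hypothetical name
        namespace = "abaci"
        # Correct: on the workload's own metadata. Keel ignores
        # annotations placed under spec.template.metadata.
        annotations = {
          "keel.sh/policy"       = "force"       # illustrative policy
          "keel.sh/trigger"      = "poll"
          "keel.sh/pollSchedule" = "@every 2m"
        }
      }
      spec {
        service_name = "abaci-app"
        selector {
          match_labels = { app = "abaci-app" }
        }
        template {
          metadata {
            labels = { app = "abaci-app" }
          }
          spec {
            container {
              name  = "app"
              image = "ghcr.io/example/abaci:latest"   # placeholder image
            }
          }
        }
      }
    }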
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keel needs to authenticate with ghcr.io to poll for new image digests
(ghcr.io requires auth for the manifest API even for public images).
- Add ghcr_token and ghcr_username variables
- Create docker-registry secret for ghcr.io
- Add imagePullSecrets to StatefulSet (Keel reads these for auth)
- Document the setup in keel.tf
To enable auto-updates:
1. Create GitHub PAT with read:packages scope
2. Set ghcr_token in terraform.tfvars
3. terraform apply
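Sketch of the variables and secret (secret name is illustrative):

    variable "ghcr_username" {
      type = string
    }

    variable "ghcr_token" {
      type      = string
      sensitive = true
    }

    resource "kubernetes_secret" "ghcr" {
      metadata {
        name      = "ghcr-credentials"   # hypothetical name
        namespace = "abaci"
      }
      type = "kubernetes.io/dockerconfigjson"
      data = {
        ".dockerconfigjson" = jsonencode({
          auths = {
            "ghcr.io" = {
              username = var.ghcr_username
              password = var.ghcr_token
              auth     = base64encode("${var.ghcr_username}:${var.ghcr_token}")
            }
          }
        })
      }
    }

The StatefulSet then references this secret via imagePullSecrets, which
Keel reads for registry auth.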
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add openai_api_key variable to terraform configuration for AI-powered
features like flowchart generation. The key is stored as a k8s secret
and exposed to pods as LLM_OPENAI_API_KEY environment variable.
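Sketch (secret name is illustrative; the env wiring is shown as a comment):

    variable "openai_api_key" {
      type      = string
      sensitive = true
    }

    resource "kubernetes_secret" "openai" {
      metadata {
        name      = "openai-api-key"   # hypothetical name
        namespace = "abaci"
      }
      data = {
        LLM_OPENAI_API_KEY = var.openai_api_key
      }
    }

    # In the app container, something like:
    #   env_from {
    #     secret_ref { name = kubernetes_secret.openai.metadata[0].name }
    #   }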
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Explain why LiteFS proxy fly-replay doesn't work outside Fly.io
- Document the primary service and IngressRoute solution
- Add troubleshooting symptoms for broken write routing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The LiteFS proxy on replica pods returns a fly-replay header, expecting
Fly.io's infrastructure to re-route the request to the primary. Since we're
on k8s, Traefik doesn't understand this header and returns empty responses.
Solution:
- Add abaci-app-primary service targeting only pod-0 (the LiteFS primary)
- Add Traefik IngressRoute matching POST/PUT/DELETE/PATCH methods
- Route these write requests directly to the primary service
- GET requests still load-balance across all replicas for reads
This fixes the intermittent empty PDF responses: roughly 60-80% of POST
requests were failing because they hit replica pods.
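Sketch of the two pieces in Terraform (hostnames and ports are assumptions;
matcher syntax shown for Traefik v2):

    resource "kubernetes_service" "app_primary" {
      metadata {
        name      = "abaci-app-primary"
        namespace = "abaci"
      }
      spec {
        selector = {
          # StatefulSet pods carry this label automatically, so the
          # service targets only the LiteFS primary.
          "statefulset.kubernetes.io/pod-name" = "abaci-app-0"   # assumed pod name
        }
        port {
          port        = 80
          target_port = 3000   # assumed app port
        }
      }
    }

    resource "kubernetes_manifest" "write_route" {
      manifest = {
        apiVersion = "traefik.io/v1alpha1"   # traefik.containo.us/v1alpha1 on older Traefik
        kind       = "IngressRoute"
        metadata   = { name = "abaci-writes", namespace = "abaci" }   # hypothetical name
        spec = {
          entryPoints = ["websecure"]
          routes = [{
            # Longer rule = higher default priority in Traefik, so writes
            # win over the general load-balanced route for the same host.
            match    = "Host(`abaci.one`) && Method(`POST`, `PUT`, `DELETE`, `PATCH`)"
            kind     = "Rule"
            services = [{ name = "abaci-app-primary", port = 80 }]
          }]
        }
      }
    }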
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Gatus UI shows only hostnames, not full URLs, so include the path
directly in the endpoint name for clarity.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Organize endpoints into logical groups: Website, Arcade, Worksheets, Flowcharts, Core API, Infrastructure
- Add hide-url: false to show actual URLs on status page
- Use user-friendly names like "Games Hub", "Worksheet Builder", "Flashcard Generator"
- Remove confusing internal service endpoints
- Check database and Redis via infrastructure group
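Sketch of one grouped endpoint in the Gatus ConfigMap (URL and group are
illustrative):

    resource "kubernetes_config_map" "gatus_config" {
      metadata {
        name      = "gatus-config"   # hypothetical name
        namespace = "abaci"
      }
      data = {
        "config.yaml" = <<-EOT
          endpoints:
            - name: "Games Hub"
              group: Arcade
              url: https://abaci.one/arcade   # assumed URL
              ui:
                hide-url: false   # show the full URL on the status page
              conditions:
                - "[STATUS] == 200"
        EOT
      }
    }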
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update architecture diagram to show NAS Traefik as entry point
- Add "Adding New Subdomains" guide with DNS, NAS Traefik, and k3s steps
- Document network architecture in CLAUDE.md for agents
- Note services.yaml location on NAS
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment monitoring homepage, health API, Redis, DB
- Simplified ingress to plain HTTP (the NAS Traefik terminates SSL)
- Updated NAS Traefik services.yaml with status subdomain routes
Access: https://status.abaci.one
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Gatus deployment with SQLite persistence
- ConfigMap with endpoint monitors (homepage, health API, Redis, DB)
- Ingress with SSL via cert-manager
- DNS CNAME record already configured
Deploy with: terraform apply
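For reference, the persistence section of the Gatus config looks roughly
like this (path assumes the PVC mount point):

    locals {
      gatus_storage = <<-EOT
        storage:
          type: sqlite
          path: /data/gatus.db   # assumed PVC mount path
      EOT
    }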
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Keel helm release that polls ghcr.io every 2 minutes
- Add keel.sh annotations to app StatefulSet for auto-updates
- Create comprehensive README.md documenting k3s architecture
- Update CLAUDE.md with automatic deployment workflow
After terraform apply, deployments are fully automatic:
push to main → build → Keel detects new image → rolling update
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pod-0 remains the LiteFS primary (handles writes); pod-1 and pod-2 are
replicas that serve reads and forward writes to primary.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove legacy schema-specific formatting fallbacks in formatting.ts and example-generator.ts
- All flowcharts now require explicit display.problem and display.answer expressions
- Add DISP-003 diagnostic for missing display.problem expressions
- Update doctor to treat missing display.answer as error (was warning)
Also includes:
- Terraform: generate LiteFS config at runtime, add AUTH_TRUST_HOST, add volume mounts for vision-training and uploads data
- Terraform: add storage.tf for persistent volume claims
- Add Claude instructions for terraform directory
- Various UI component formatting updates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LiteFS needs the actual pod hostname for cluster communication,
but HOSTNAME=0.0.0.0 was being set in both the Dockerfile and
ConfigMap, overriding the pod's hostname.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add LiteFS binary and config to Docker image for SQLite replication
- Convert k8s Deployment to StatefulSet for stable pod identities
- Pod-0 is primary (handles writes), others are replicas
- LiteFS proxy forwards write requests to primary automatically
- Add headless service for pod-to-pod communication
- Increase Node.js heap size to 4GB for Next.js build
- Exclude large Python venvs from Docker context
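The headless service is what makes stable pod-to-pod addressing work; a
sketch (names are illustrative, port per LiteFS defaults):

    resource "kubernetes_service" "app_headless" {
      metadata {
        name      = "abaci-app-headless"   # hypothetical name
        namespace = "abaci"
      }
      spec {
        # Headless: DNS returns individual pod IPs, giving each pod a
        # stable name like abaci-app-0.abaci-app-headless.abaci.svc.
        cluster_ip = "None"
        selector   = { app = "abaci-app" }
        port {
          name = "litefs"
          port = 20202   # LiteFS's default port
        }
      }
    }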
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Terraform now deploys a complete k8s environment:
- cert-manager with Let's Encrypt (staging + prod issuers)
- Redis deployment with persistent storage
- App deployment (2 replicas, rolling updates)
- Traefik ingress with SSL, HSTS, HTTP→HTTPS redirect
Ready for switchover once ports 80/443 are forwarded to the k3s VM.
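Sketch of one of the issuers (email and names are placeholders):

    resource "kubernetes_manifest" "letsencrypt_prod" {
      manifest = {
        apiVersion = "cert-manager.io/v1"
        kind       = "ClusterIssuer"
        metadata   = { name = "letsencrypt-prod" }   # assumed name
        spec = {
          acme = {
            server              = "https://acme-v02.api.letsencrypt.org/directory"
            email               = "admin@abaci.one"   # placeholder contact
            privateKeySecretRef = { name = "letsencrypt-prod-key" }
            solvers = [{
              http01 = { ingress = { class = "traefik" } }
            }]
          }
        }
      }
    }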
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set up Terraform to manage k3s resources on the NAS VM:
- Kubernetes and Helm providers configured
- Created 'abaci' namespace for workloads
- Ready for BullMQ workers and future scalable services
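The starting point is small; a sketch (kubeconfig path is an assumption):

    provider "kubernetes" {
      config_path = "~/.kube/config"   # assumed kubeconfig for the k3s VM
    }

    resource "kubernetes_namespace" "abaci" {
      metadata {
        name = "abaci"
      }
    }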
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>