Commit Graph

8 Commits

Author SHA1 Message Date
Thomas Hallock 2082710ab2 fix: add retry middleware for zero-downtime deployments
The problem: During deployments, users pinned via sticky session to
the restarting container experienced ~60s of downtime because:
1. Health checks were too slow (10s interval)
2. No retry on failure - requests just failed

The fix:
- Add retry middleware: 3 attempts with 100ms initial interval
- Reduce health check interval from 10s to 3s
- Add health check timeout of 2s

Now when your pinned server restarts:
1. Request fails
2. Traefik retries on the OTHER healthy server
3. You get a response (possibly with a new server_id cookie)

Combined with Redis for session state, this should give true
zero-downtime deployments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 05:25:19 -06:00
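
The retry flow described in commit 2082710ab2 can be sketched in TypeScript roughly as follows. This is an illustration of the idea only, not Traefik's implementation and not the project's configuration. The names `Backend`, `RetryOptions`, and `retryAcrossBackends` are hypothetical, and the doubling backoff is an assumption; only the 3 attempts and the 100ms initial interval come from the commit message.

```typescript
// Hypothetical types for illustration; none of these names come from the repository.
type Backend = { id: string; handle: (path: string) => Promise<string> };

interface RetryOptions {
  attempts: number; // total tries, e.g. 3 as in the commit message
  initialIntervalMs: number; // wait before the first retry, e.g. 100
  sleep?: (ms: number) => Promise<void>; // injectable so callers can skip real waiting
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try the pinned backend first, then fall through to the other backends.
// Assumption: the wait doubles after each failed attempt.
async function retryAcrossBackends(
  pinned: Backend,
  others: Backend[],
  path: string,
  opts: RetryOptions,
): Promise<{ body: string; servedBy: string }> {
  const sleep = opts.sleep ?? defaultSleep;
  const order = [pinned, ...others];
  let delay = opts.initialIntervalMs;
  let lastError: unknown = new Error("no attempts made");

  for (let i = 0; i < opts.attempts; i++) {
    const backend = order[i % order.length];
    try {
      const body = await backend.handle(path);
      return { body, servedBy: backend.id };
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is still allowed.
      if (i < opts.attempts - 1) {
        await sleep(delay);
        delay *= 2;
      }
    }
  }
  throw lastError;
}
```

In this sketch, `servedBy` plays the role of the backend that would end up in a new server_id cookie when the pinned backend is restarting.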
Thomas Hallock 57781a9ecc feat: enhance deployment info with health checks and refactor keypad
- Add useDeploymentInfo hook with live health/build info fetching
- Refactor DeploymentInfoContent with server health status, WebSocket
  connectivity, and database status displays
- Add Storybook stories and tests for DeploymentInfoContent
- Extract NumericKeypad styles to CSS file and config to separate module
- Add debug page index
- Update NAS deployment configs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 17:22:52 -06:00
Thomas Hallock 4fbdb3fe50 feat(debug): add debugging tools for cross-instance issues
1. Enhanced /api/build-info endpoint:
   - Shows instance hostname and container ID
   - Shows Redis connection status
   - Shows Socket.IO adapter type (redis/memory)

2. Instance-specific subdomain routes:
   - blue.abaci.one routes to blue container only
   - green.abaci.one routes to green container only
   - Useful for testing cross-instance communication

3. Socket.IO debug page (/debug/socket):
   - Shows connection status and socket ID
   - Join/leave rooms (remote-camera, arcade, game)
   - Send custom events with JSON data
   - Real-time event log with direction arrows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 11:42:54 -06:00
Thomas Hallock 2e77b46ca1 fix(deploy): remove depends_on from blue/green compose files
compose-updater can't resolve depends_on references to services
defined only in the main docker-compose.yaml. Remove depends_on
and rely on REDIS_URL environment variable instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 11:39:32 -06:00
Thomas Hallock 0346455b3e feat(remote-camera): add Redis for cross-instance session sharing
Production blue/green deployment caused the remote camera to fail because
the desktop and phone could hit different instances, each with separate
in-memory session storage and Socket.IO rooms.

Changes:
- Add Redis service to docker-compose (production only)
- Create Redis client utility with optional connection
- Update session manager to use Redis when REDIS_URL is set
- Add Socket.IO Redis adapter for cross-instance room broadcasts
- Convert session manager functions to async
- Update tests for async functions

In development (no REDIS_URL), falls back to in-memory storage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 11:18:22 -06:00
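
The "Redis when REDIS_URL is set, in-memory otherwise" pattern from commit 0346455b3e might look something like the sketch below. All names here (`CameraSession`, `SessionStore`, `KeyValueClient`, `createSessionStore`) are hypothetical and chosen for the example; the repository's actual session manager and Redis client utility may be structured differently. The `KeyValueClient` interface stands in for a real Redis client so the sketch stays self-contained.

```typescript
// Hypothetical session shape and store interface; the real session manager may differ.
interface CameraSession {
  id: string;
  createdAt: number;
}

interface SessionStore {
  get(id: string): Promise<CameraSession | undefined>;
  set(session: CameraSession): Promise<void>;
  delete(id: string): Promise<void>;
}

// Minimal Redis-like surface (an assumption). A real client library would be
// adapted to this shape rather than used directly.
interface KeyValueClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
  del(key: string): Promise<unknown>;
}

// Development fallback: state lives in this process only.
class MemorySessionStore implements SessionStore {
  private sessions = new Map<string, CameraSession>();
  async get(id: string): Promise<CameraSession | undefined> {
    return this.sessions.get(id);
  }
  async set(session: CameraSession): Promise<void> {
    this.sessions.set(session.id, session);
  }
  async delete(id: string): Promise<void> {
    this.sessions.delete(id);
  }
}

// Shared store: every instance reads and writes the same keys.
class RedisSessionStore implements SessionStore {
  private client: KeyValueClient;
  private prefix: string;
  constructor(client: KeyValueClient, prefix: string = "session:") {
    this.client = client;
    this.prefix = prefix;
  }
  async get(id: string): Promise<CameraSession | undefined> {
    const raw = await this.client.get(this.prefix + id);
    return raw === null ? undefined : (JSON.parse(raw) as CameraSession);
  }
  async set(session: CameraSession): Promise<void> {
    await this.client.set(this.prefix + session.id, JSON.stringify(session));
  }
  async delete(id: string): Promise<void> {
    await this.client.del(this.prefix + id);
  }
}

// Pick the store from the environment: Redis only when REDIS_URL is set.
function createSessionStore(
  env: Record<string, string | undefined>,
  connect: (url: string) => KeyValueClient,
): SessionStore {
  const url = env["REDIS_URL"];
  return url ? new RedisSessionStore(connect(url)) : new MemorySessionStore();
}
```

Because both stores return promises, callers use the same `await store.get(id)` code path in development and production, which is consistent with the commit's note about converting the session manager functions to async.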
Thomas Hallock 8e4338bbbe fix(deploy): add sticky sessions for Socket.IO and remote camera
Remote camera sessions are stored in-memory per instance. Without sticky
sessions, Traefik could route desktop to Blue and phone to Green, causing
"session expired" errors and failed connections.

Sticky sessions ensure the same client always hits the same backend instance,
which is required for:
- Socket.IO connections (rooms are per-instance)
- Remote camera session state (in-memory Map)
- Any stateful WebSocket communication

Note: Sessions will still be lost on container restart/deployment. For full
robustness, sessions should be persisted to a database and Socket.IO should
use the Redis adapter. This is a workaround for the immediate issue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 11:08:32 -06:00
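
As a rough model of the sticky-session behavior in commit 8e4338bbbe, the sketch below pins a client to a backend through a cookie. It is a simplified in-process illustration, not Traefik's load balancer: `StickyRouter`, `Cookies`, and `RoutingDecision` are hypothetical names, round-robin assignment for unpinned clients is an assumption, and the `server_id` cookie name is borrowed from the later retry commit.

```typescript
// Hypothetical in-process model of cookie-based sticky routing.
type Cookies = Record<string, string | undefined>;

interface RoutingDecision {
  backend: string;
  setCookie?: string; // cookie value to send back when the client gets (re)pinned
}

class StickyRouter {
  private next = 0;
  private backends: string[];
  private cookieName: string;

  constructor(backends: string[], cookieName: string = "server_id") {
    this.backends = backends;
    this.cookieName = cookieName;
  }

  // `healthy` defaults to all backends; pass a smaller set to model a restart.
  route(cookies: Cookies, healthy?: Set<string>): RoutingDecision {
    const up = healthy ?? new Set(this.backends);
    const pinned = cookies[this.cookieName];

    // A pinned client keeps hitting the same backend while it is healthy.
    if (pinned !== undefined && up.has(pinned)) {
      return { backend: pinned };
    }

    // Unpinned, or the pinned backend is down: pick round-robin and re-pin.
    const candidates = this.backends.filter((b) => up.has(b));
    if (candidates.length === 0) {
      throw new Error("no healthy backend");
    }
    const backend = candidates[this.next % candidates.length];
    this.next++;
    return { backend, setCookie: backend };
  }
}
```

With this model, a desktop and a phone that receive different cookies still land on different backends, which is the limitation the note in the commit message points out.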
Thomas Hallock e703e90875 chore: cleanup unused imports and apply formatting
- Remove unused `and` import from VisionRecorder.ts
- Remove unused `IncomingMessage` and `ws` imports from socket-server.ts
- Add `muted` attribute to video element in ProblemVideoPlayer
- Apply code formatting across vision and practice components
- Update documentation formatting in DEPLOYMENT.md and README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 18:49:31 -06:00
Thomas Hallock b47992f770 feat(deploy): add blue-green deployment with health endpoint
- Add /api/health endpoint that checks database connectivity
- Set up blue-green deployment with two containers (abaci-blue, abaci-green)
- Add docker-compose.yaml with YAML anchors for DRY config
- Add generate-compose.sh to create blue/green compose files from main
- Update deploy.sh with NAS-specific fixes (scp -O, PATH for docker)
- Fix deploy.sh to not overwrite production .env by default

The blue-green setup allows zero-downtime deployments via compose-updater,
which watches separate compose files and restarts containers independently.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 17:04:01 -06:00
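
A health endpoint that checks database connectivity, as described for /api/health in commit b47992f770, could be structured along these lines. This is a framework-agnostic sketch under stated assumptions: `checkHealth`, `HealthResult`, and the injected `pingDatabase` callback are hypothetical, as are the response body shape, the 503 status for failures, and the 1000ms default timeout. The actual route handler in the repository is not shown in the log.

```typescript
// Hypothetical response shape; the real /api/health payload may differ.
interface HealthResult {
  status: number; // HTTP status a route handler would send
  body: {
    status: "healthy" | "unhealthy";
    database: "ok" | "error";
    error?: string;
  };
}

// `pingDatabase` is injected so the sketch does not depend on a specific driver.
// It should resolve when a trivial query succeeds and reject otherwise.
async function checkHealth(
  pingDatabase: () => Promise<void>,
  timeoutMs: number = 1000,
): Promise<HealthResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error("database ping timed out")), timeoutMs);
  });

  try {
    // Whichever settles first wins: the ping or the timeout.
    await Promise.race([pingDatabase(), timeout]);
    return { status: 200, body: { status: "healthy", database: "ok" } };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      status: 503,
      body: { status: "unhealthy", database: "error", error: message },
    };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Returning a non-2xx status when the database is unreachable lets a load balancer's health check take the container out of rotation, which is what makes the independent blue and green restarts safe.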