Deploy System

Technical deep-dive into Dictumal's production MVP streaming deployment path.

Current Architecture

Every deployment creates one DigitalOcean Ubuntu droplet and configures a browser-streamed Linux desktop via cloud-init.

  • Cloud provider: DigitalOcean Droplets via src/lib/deploy/digitalocean.ts.
  • Bootstrap: src/lib/deploy/cloud-init.ts installs XFCE, TigerVNC, Docker, Guacamole, Nginx, and UFW.
  • Desktop runtime: TigerVNC on display :1 (TCP 5901), plus an XFCE startup service.
  • Gateway: Guacamole containers (guacd + guacamole), launched with JSON-auth tokens.
  • HTTPS: Nginx reverse proxy on 443 with a self-signed certificate and HTTP->HTTPS redirect.
  • Firewall: UFW allows 22, 80, 443; everything else is denied.

End-To-End Request Flow

  1. User starts deployment from the constitution UI (src/components/constitution/deploy-dialog.tsx).
  2. Frontend runs a 3-step wizard: Vigilance -> Machine -> Pre-installed Apps.
  3. Frontend calls POST /api/deployments with { constitutionId, region?, size?, vigilance? }. When selected, the Vigilance options (screenshots, local model tag) and app preinstalls are included in the request.
  4. API generates an 8-char VNC password, derives the Guacamole JSON key, generates cloud-init, and calls the DigitalOcean droplet-create API.
  5. Deployment row is persisted in Prisma with status PROVISIONING.
  6. Client polls GET /api/deployments/[id] every 5s.
  7. Status route transitions through PROVISIONING -> CONFIGURING -> ACTIVE based on droplet state and readiness probes.
  8. Launching an ACTIVE deployment uses GET /api/deployments/[id]/launch (optionally with ?target=ip), which returns a 302 redirect to Guacamole carrying encrypted, short-lived token data.
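
The polling in steps 6 and 7 can be sketched as follows. This is a minimal illustration, not the project's client code: pollDeployment, StatusFetcher, and the injected sleep function are hypothetical names, and treating ACTIVE, ERROR, and DESTROYED as the states that end polling is an assumption. Only the status values and the 5-second interval come from the flow above.

```typescript
// Status values as listed in this document.
type DeploymentStatus = "PROVISIONING" | "CONFIGURING" | "ACTIVE" | "ERROR" | "DESTROYED";

// Hypothetical: performs GET /api/deployments/[id] and resolves to the current status.
type StatusFetcher = (id: string) => Promise<DeploymentStatus>;

const POLL_INTERVAL_MS = 5000; // polling interval stated in step 6

// Assumption: polling ends once the deployment can no longer progress.
function isTerminal(status: DeploymentStatus): boolean {
  return status === "ACTIVE" || status === "ERROR" || status === "DESTROYED";
}

// Polls until a terminal status is observed. The sleep function is injectable
// so the loop can be exercised without real delays.
async function pollDeployment(
  id: string,
  fetchStatus: StatusFetcher,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<DeploymentStatus> {
  for (;;) {
    const status = await fetchStatus(id);
    if (isTerminal(status)) return status;
    await sleep(POLL_INTERVAL_MS);
  }
}
```

Injecting fetchStatus keeps the loop independent of any particular HTTP client.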

Deploy Wizard (Current UI)

  • Step 1: Vigilance - enable/disable Coding Practice, choose the local clipboard classifier model, and optionally save session screenshots for testing.
  • Step 2: Machine - select VM size and region. If Vigilance is enabled, the UI simplifies this step to a fixed Performance machine (s-4vcpu-8gb) and only asks for region.
  • Step 3: Pre-installed Apps - choose apps to install during cloud-init (currently VS Code), review a short summary, and deploy.
  • Regions are currently US-only: sfo3, nyc1, and nyc3 (enforced in both the UI and API). Default is sfo3.
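
The region and machine rules above could be expressed roughly like this. It is a sketch under stated assumptions: resolveMachine and its input shape are hypothetical, the non-Vigilance default size is not given in this document and is therefore passed in by the caller, and whether the API (rather than only the UI) forces the Vigilance machine size is not confirmed here.

```typescript
// Regions and default taken from the wizard description.
const ALLOWED_REGIONS = ["sfo3", "nyc1", "nyc3"] as const;
type Region = (typeof ALLOWED_REGIONS)[number];

const DEFAULT_REGION: Region = "sfo3";
const VIGILANCE_SIZE = "s-4vcpu-8gb"; // fixed Performance machine when Vigilance is enabled

function isAllowedRegion(value: string): value is Region {
  return ALLOWED_REGIONS.some((region) => region === value);
}

interface MachineChoice {
  region: Region;
  size: string;
}

// Hypothetical helper: turns wizard input into the region/size to deploy with.
function resolveMachine(
  input: { region?: string; size?: string; vigilanceEnabled: boolean },
  defaultSize: string, // assumption: the caller supplies the non-Vigilance default
): MachineChoice {
  const region = input.region ?? DEFAULT_REGION;
  if (!isAllowedRegion(region)) {
    throw new Error(`Unsupported region: ${region}`);
  }
  // With Vigilance enabled, any user-selected size is ignored.
  const size = input.vigilanceEnabled ? VIGILANCE_SIZE : input.size ?? defaultSize;
  return { region, size };
}
```

Running the same check in the UI and in the API handler matches the statement that the region restriction is enforced in both places.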

Deployment State Machine

PROVISIONING -> CONFIGURING -> ACTIVE
PROVISIONING/CONFIGURING -> ERROR
Any status -> DESTROYED (on DELETE)
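
One way to encode these transitions is a lookup table with a guard function. The names TRANSITIONS and canTransition are illustrative, not taken from the codebase, and treating DESTROYED as a final state with no outgoing transitions is an assumption.

```typescript
type DeploymentStatus = "PROVISIONING" | "CONFIGURING" | "ACTIVE" | "ERROR" | "DESTROYED";

// Allowed next states for each status, following the three rules above.
const TRANSITIONS: Record<DeploymentStatus, readonly DeploymentStatus[]> = {
  PROVISIONING: ["CONFIGURING", "ERROR", "DESTROYED"],
  CONFIGURING: ["ACTIVE", "ERROR", "DESTROYED"],
  ACTIVE: ["DESTROYED"],
  ERROR: ["DESTROYED"],
  DESTROYED: [], // assumption: nothing follows a destroyed deployment
};

function canTransition(from: DeploymentStatus, to: DeploymentStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

For example, canTransition("PROVISIONING", "ACTIVE") is false because activation must pass through CONFIGURING.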

Vigilance (When Enabled)

Deployments created with Vigilance install a local dictumal-vigilance-agent daemon in the VM. The daemon runs as a systemd service and manages Coding Practice sessions automatically.

  • Session starts when the browser desktop session connects (detected via the local VNC connection on 5901).
  • Session stops automatically after desktop disconnect, with a short grace period to avoid reconnect flapping.
  • Screenshots are captured for local OCR/vision analysis and only persisted when the deploy-time screenshot testing toggle is enabled.
  • The selected local clipboard classifier model is passed into the VM config and preloaded through an Ollama background service during setup.
  • VS Code can be preinstalled as part of the deployment flow and is shown as a selected setup task in both the deploy dialog and instances progress UI.
  • Manual dictumal-vigilance commands still exist for debugging, but normal users should not need to start/stop sessions themselves.
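
The start/stop behavior with a grace period can be illustrated with a small tracker. This is only a sketch of the logic: the agent's real implementation and language are not described here, SessionTracker and observe are hypothetical names, and the grace period length is a parameter because this document does not state it.

```typescript
type SessionEvent = "started" | "stopped" | "unchanged";

class SessionTracker {
  private readonly graceMs: number;
  private active = false;
  private disconnectedAt: number | null = null;

  constructor(graceMs: number) {
    this.graceMs = graceMs; // hypothetical value chosen by the caller
  }

  // Call on every check of the local VNC connection (port 5901).
  observe(connected: boolean, nowMs: number): SessionEvent {
    if (connected) {
      // A reconnect inside the grace period cancels the pending stop.
      this.disconnectedAt = null;
      if (!this.active) {
        this.active = true;
        return "started";
      }
      return "unchanged";
    }
    if (!this.active) return "unchanged";
    if (this.disconnectedAt === null) {
      this.disconnectedAt = nowMs; // first observation of the disconnect
      return "unchanged";
    }
    if (nowMs - this.disconnectedAt >= this.graceMs) {
      this.active = false;
      this.disconnectedAt = null;
      return "stopped";
    }
    return "unchanged";
  }
}
```

With a 10-second grace period, a desktop that drops at t=1s and reconnects at t=5s never produces a "stopped" event, which is the flapping the grace period is meant to avoid.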

Current default clipboard classifier model: qwen3:0.6b (lightweight, CPU-friendly). The agent falls back to local heuristics if the local model runtime is unavailable or still loading.

Deploy dialog failures are surfaced inline to the user (for example quota/region/provider errors) instead of failing silently.

Readiness Probes During CONFIGURING

GET /api/deployments/[id] computes progress from these checks against deployment IP:

  • TCP 5901 (VNC socket)
  • HTTP http://[ip]/
  • HTTP http://[ip]:8080/guacamole/
  • HTTPS https://[ip]/guacamole/ with insecure cert probe

Deployment is promoted to ACTIVE only when HTTPS Guacamole is reachable. A failed VNC check on its own does not block activation.
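
A minimal sketch of the probe targets and the promotion rule follows. The field names vnc5901, guacHttp8080, and guacHttps443 match the progress check names used in the troubleshooting section; http80, probeTargets, and shouldPromoteToActive are hypothetical names introduced for illustration.

```typescript
interface ProbeResults {
  vnc5901: boolean;
  http80: boolean; // hypothetical name for the plain HTTP probe
  guacHttp8080: boolean;
  guacHttps443: boolean;
}

// Builds the four probe targets for a deployment IP.
function probeTargets(ip: string): Record<keyof ProbeResults, string> {
  return {
    vnc5901: `tcp://${ip}:5901`,
    http80: `http://${ip}/`,
    guacHttp8080: `http://${ip}:8080/guacamole/`,
    // Probed without certificate verification because the cert is self-signed.
    guacHttps443: `https://${ip}/guacamole/`,
  };
}

// Only the HTTPS Guacamole probe gates activation; the VNC probe does not.
function shouldPromoteToActive(probes: ProbeResults): boolean {
  return probes.guacHttps443;
}
```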

Required Environment Variables

| Variable | Required | Used By | Behavior |
|---|---|---|---|
| DIGITALOCEAN_API_TOKEN | Yes (for deploy) | src/lib/deploy/digitalocean.ts | Authenticates DO API calls for create/get/delete droplet. |
| DIGITALOCEAN_IMAGE_SLUG | No | src/lib/deploy/digitalocean.ts | Explicit image override. If set, latest-image discovery is skipped. |
| DIGITALOCEAN_USE_LATEST_UBUNTU_IMAGE | No | src/lib/deploy/digitalocean.ts | If truthy, discovers newest Ubuntu image slug from DO catalog; otherwise defaults to ubuntu-24-04-x64. |
| DIGITALOCEAN_SSH_KEYS | No | src/lib/deploy/digitalocean.ts | Comma-separated SSH key IDs/fingerprints attached to droplets. |
| NEXT_PUBLIC_STREAM_GATEWAY_HOST | No | /api/deployments/[id]/launch, deployment UI, instances UI | Preferred hostname for launch redirects. Defaults to stream.the-next-lab.com; falls back to IP when blank or when ?target=ip is used. |
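
The precedence between these variables can be sketched as below. This is an illustration rather than the contents of digitalocean.ts: the function names are hypothetical, the set of values treated as "truthy" is an assumption, and reading "blank" as "set to an empty string" for the gateway host is also an assumption.

```typescript
type Env = Record<string, string | undefined>;

type ImageChoice =
  | { kind: "explicit"; slug: string }
  | { kind: "discover-latest" }
  | { kind: "default"; slug: string };

const DEFAULT_IMAGE_SLUG = "ubuntu-24-04-x64";
const DEFAULT_GATEWAY_HOST = "stream.the-next-lab.com";

// Assumption: "truthy" means 1, true, or yes (case-insensitive).
function isTruthy(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ["1", "true", "yes"].indexOf(value.trim().toLowerCase()) !== -1;
}

// An explicit slug wins; otherwise optional discovery; otherwise the default.
function resolveImage(env: Env): ImageChoice {
  const explicit = (env["DIGITALOCEAN_IMAGE_SLUG"] ?? "").trim();
  if (explicit !== "") return { kind: "explicit", slug: explicit };
  if (isTruthy(env["DIGITALOCEAN_USE_LATEST_UBUNTU_IMAGE"])) return { kind: "discover-latest" };
  return { kind: "default", slug: DEFAULT_IMAGE_SLUG };
}

// Splits the comma-separated key list and drops empty entries.
function parseSshKeys(env: Env): string[] {
  return (env["DIGITALOCEAN_SSH_KEYS"] ?? "")
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Unset -> default hostname; blank or ?target=ip -> droplet IP.
function resolveLaunchHost(env: Env, ip: string, target?: string): string {
  if (target === "ip") return ip;
  const configured = env["NEXT_PUBLIC_STREAM_GATEWAY_HOST"];
  if (configured === undefined) return DEFAULT_GATEWAY_HOST;
  const trimmed = configured.trim();
  return trimmed === "" ? ip : trimmed;
}
```

Passing the environment as a plain object instead of reading process.env directly keeps the resolution logic easy to test.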

Provisioning Timeline And User Statuses

Typical timeline (varies by region/image cache/network):

  • 0-2 min: PROVISIONING (DigitalOcean droplet creation).
  • 2-10 min: CONFIGURING (cloud-init package install + container startup + HTTPS gateway setup).
  • Ready: ACTIVE once HTTPS Guacamole responds.

Timeout rule: if CONFIGURING exceeds 20 minutes, API marks deployment ERROR and stores an error message pointing to /var/log/dictumal-init.log.
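
The timeout rule amounts to a simple elapsed-time check. In this sketch, checkConfiguringTimeout and the configuringSince parameter are hypothetical; this document does not say which timestamp the API measures from, and the error message wording is illustrative.

```typescript
const CONFIGURING_TIMEOUT_MS = 20 * 60 * 1000; // 20 minutes
const INIT_LOG_PATH = "/var/log/dictumal-init.log";

type TimeoutCheck =
  | { timedOut: false }
  | { timedOut: true; errorMessage: string };

// Assumption: the clock starts when the deployment entered CONFIGURING.
function checkConfiguringTimeout(configuringSince: Date, now: Date): TimeoutCheck {
  const elapsedMs = now.getTime() - configuringSince.getTime();
  if (elapsedMs <= CONFIGURING_TIMEOUT_MS) return { timedOut: false };
  return {
    timedOut: true,
    errorMessage: `Configuration exceeded 20 minutes. Check ${INIT_LOG_PATH} on the droplet.`,
  };
}
```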

Progress stages emitted by GET /api/deployments/[id]:

| Stage | Percent | When Emitted |
|---|---|---|
| provisioning-droplet | 22% | Waiting for provider droplet activation. |
| waiting-for-ip | 35% | Droplet active but public IP missing. |
| installing-packages | 48% | No service probes are up yet. |
| starting-guacamole | 62% | VNC socket is reachable. |
| configuring-https | 78% | Guacamole HTTP endpoint (:8080) responds. |
| final-health-check | 92% | Guacamole HTTPS endpoint responds. |
| active | 100% | Status promoted to ACTIVE. |
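
The stage table can be read as a function from droplet state and probe results to a stage. The sketch below assumes the most advanced passing probe wins, which is consistent with a deployment reaching configuring-https while the VNC probe is still false. The names computeProgress and StageInput are hypothetical, and ERROR and DESTROYED handling is left out.

```typescript
type DeploymentStatus = "PROVISIONING" | "CONFIGURING" | "ACTIVE" | "ERROR" | "DESTROYED";

interface StageInput {
  status: DeploymentStatus;
  dropletActive: boolean; // provider reports the droplet as active
  ip: string | null; // public IP, if assigned yet
  probes: { vnc5901: boolean; guacHttp8080: boolean; guacHttps443: boolean };
}

interface Progress {
  stage: string;
  percent: number;
}

function computeProgress(input: StageInput): Progress {
  if (input.status === "ACTIVE") return { stage: "active", percent: 100 };
  if (!input.dropletActive) return { stage: "provisioning-droplet", percent: 22 };
  if (input.ip === null) return { stage: "waiting-for-ip", percent: 35 };

  // Check the strongest signal first so a missing VNC probe cannot hold progress back.
  const { probes } = input;
  if (probes.guacHttps443) return { stage: "final-health-check", percent: 92 };
  if (probes.guacHttp8080) return { stage: "configuring-https", percent: 78 };
  if (probes.vnc5901) return { stage: "starting-guacamole", percent: 62 };
  return { stage: "installing-packages", percent: 48 };
}
```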

UI behavior: deployment dialogs and instance progress components poll status every 5 seconds, so state/button changes can appear one poll cycle after backend state has changed.

Troubleshooting Playbook

1) Auth Configuration Failures

Symptoms:

  • Deployment endpoints return 401 Unauthorized.
  • Launch endpoint returns not found for a deployment the user expects to own.

Checks:

  • Confirm session exists (re-login).
  • Verify auth env vars are correct for current host (AUTH_URL/NEXTAUTH_URL, AUTH_SECRET/NEXTAUTH_SECRET, Google OAuth credentials).
  • Confirm OAuth callback URL matches the running environment.

2) VNC Connectivity Looks Broken

Symptoms:

  • Progress check shows vnc5901: false.
  • Deployment still advances to configuring-https or ACTIVE.

Interpretation and checks:

  • This can be expected. Port 5901 is not a hard gate to activation.
  • Validate stronger checks first: guacHttp8080 and guacHttps443.
  • On the droplet, inspect /var/log/dictumal-init.log, run systemctl status vncserver vnc-xstartup nginx, and check the Docker logs for the Guacamole containers.

3) ACTIVE Button Refresh Feels Delayed

Symptoms:

  • Backend status is already ACTIVE.
  • UI still shows the deploying state, or the active actions appear a few seconds late.

Why this happens:

  • Status polling interval is 5 seconds.
  • Server-rendered sections (like instances cards) update after client polling detects terminal state and triggers refresh.

Operator action: wait one extra poll cycle and re-open the deploy dialog or refresh the instances page if needed.

Security Notes And MVP Limitations

  • TLS is currently self-signed per droplet; clients may show certificate warnings.
  • Launch tokens are short-lived (5 minutes) but are passed as URL query parameters, so treat redirected URLs as sensitive.
  • The VNC password is stored with deployment metadata and is used to derive the Guacamole JSON key in the current MVP design.
  • Cloud-init creates a default dictumal user/password pair for bootstrap convenience; hardening and credential rotation are pending.
  • Browser-stream readiness is the MVP success condition; deeper policy enforcement and stronger runtime hardening are deferred.
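
To make the 5-minute token lifetime concrete, here is a sketch of the plaintext payload that Guacamole's encrypted JSON authentication extension expects (username, expires in milliseconds since the epoch, and a connections map). The function name, the connection name "desktop", and the vncHost parameter are assumptions, and the signing and encryption of the payload with the JSON key are omitted; the launch route's actual implementation is not shown in this document.

```typescript
const LAUNCH_TOKEN_TTL_MS = 5 * 60 * 1000; // short-lived: 5 minutes

interface GuacamoleJsonPayload {
  username: string;
  expires: number; // milliseconds since the Unix epoch
  connections: Record<
    string,
    { protocol: string; parameters: Record<string, string> }
  >;
}

// Hypothetical builder for the payload that is later signed, encrypted, and
// attached to the redirect URL.
function buildLaunchPayload(opts: {
  username: string;
  vncHost: string; // assumption: address of the VNC server as seen from guacd
  vncPassword: string;
  nowMs: number;
}): GuacamoleJsonPayload {
  return {
    username: opts.username,
    expires: opts.nowMs + LAUNCH_TOKEN_TTL_MS,
    connections: {
      desktop: {
        protocol: "vnc",
        parameters: {
          hostname: opts.vncHost,
          port: "5901", // TigerVNC display :1
          password: opts.vncPassword,
        },
      },
    },
  };
}
```

Because the expiry travels inside the encrypted payload, a leaked launch URL stops working once the five minutes have passed, but it remains usable until then.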
