One gateway in front of an on-prem mesh.
The same agentic demos, routed through LiteLLM — the open-source, self-hostable, OpenAI-compatible gateway — which meters and budgets every call before it reaches a Cascadia mesh of Intel AI PCs. Cascadia is wired in config-only (each model an openai/<id> route); nothing about the gateway is Cascadia-specific.
The four agentic-workflow demos. Orchestrates the pipeline step-by-step and renders per-step node IDs + signed receipts. Holds no secrets.
Serverless functions run every LLM call server-side, so the LiteLLM virtual key never reaches the browser. This is the only thing deployed to the cloud.
Config-only integration: each cascadia/<id> alias is an openai/<id> route at the coordinator — no provider code. Authenticates the virtual key, enforces its dollar budget, and prices + meters every call to a Postgres spend ledger, then passes the response through unchanged — including the cascadia receipt block. Runs self-hosted on the miner, exposed via Tailscale Funnel.
Open-weight 8B-class models split into INT4 OpenVINO shards across a fleet of Intel AI PCs (one model pipeline-parallel across two machines). Every response carries the serving node ID and a signed receipt. Zero bytes leave the premises.
The data plane stops at the mesh — no cloud GPU is touched at any layer. Swap the mesh for any OpenAI-compatible backend and the gateway, keys, budgets, and this UI are unchanged.