Cascadia×LiteLLM
integration · LiteLLM + Cascadia

One gateway in front of an on-prem mesh.

The same agentic demos, routed through LiteLLM — the open-source, self-hostable, OpenAI-compatible gateway — which meters and budgets every call before it reaches a Cascadia mesh of Intel AI PCs. Cascadia is wired in config-only (each model an openai/<id> route); nothing about the gateway is Cascadia-specific.

clientBrowser

The four agentic-workflow demos. Orchestrates the pipeline step-by-step and renders per-step node IDs + signed receipts. Holds no secrets.

step UISSE token streamaudit view
fetch /api/* (same-origin)
edgeVercel — Next.js + /api/* routes

Serverless functions run every LLM call server-side, so the LiteLLM virtual key never reaches the browser. This is the only thing deployed to the cloud.

server-held virtual keySSE relay/api/usage · /api/gateway
OpenAI /v1 · Bearer <virtual key> · model = cascadia/<id>
gatewayLiteLLM proxy· OpenAI-compatible gateway + control plane

Config-only integration: each cascadia/<id> alias is an openai/<id> route at the coordinator — no provider code. Authenticates the virtual key, enforces its dollar budget, and prices + meters every call to a Postgres spend ledger, then passes the response through unchanged — including the cascadia receipt block. Runs self-hosted on the miner, exposed via Tailscale Funnel.

virtual keysbudget enforcementspend tracking (Postgres)config-only routing
openai/<id> · Bearer <partner token> → coordinator /v1
meshCascadia meshCascadia mesh· Community Labs · on-prem inference

Open-weight 8B-class models split into INT4 OpenVINO shards across a fleet of Intel AI PCs (one model pipeline-parallel across two machines). Every response carries the serving node ID and a signed receipt. Zero bytes leave the premises.

INT4 OpenVINO shardsdistributed fleetsigned receipts

The data plane stops at the mesh — no cloud GPU is touched at any layer. Swap the mesh for any OpenAI-compatible backend and the gateway, keys, budgets, and this UI are unchanged.