roadmap

One service for the whole local AI stack.

v0.1.0-alpha ships the OpenAI-compatible inference layer: slots, dispatcher, hardware probe, bundled chat, signed self-update, Proxmox host-pressure segment. v0.2 takes the same lifecycle discipline and extends it into the rest of the local AI stack: agents, persistent memory, MCP, tools, image, voice, third-party extensions. The goal is one place on your hardware that runs the whole stack, instead of seven daemons glued together.

No dates below. Items are direction. The closer a thing is to the left column, the closer it is to running on your box.

Shipped

OpenAI-compatible /v1/* API

shipped

Chat, completions, embeddings, rerank, transcriptions, speech, images. Every OpenAI SDK works unchanged against the local box.

Slot lifecycle state machine

shipped

Real atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed so the dashboard reflects reality, not snapshots.

Hardware-aware probe

shipped

Detects GPU, NPU, and unified memory; surfaces VRAM/RAM fit warnings inline before you load a model that won’t fit.

Bundled OpenWebUI

shipped

Prewired chat on :3001 pointed at the local API. Zero config; the installer wires it for you.

Cosign-signed self-update

shipped

Atomic version swap with rollback. Stable and nightly channels, GitHub OIDC-verified release tarballs.

Five-provider stack

shipped

llama.cpp (Vulkan / ROCm) for chat and embed, FLM for NPU multiplex, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation. All first-class in v1.

Image generation

shipped

POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges; the img slot runs inside the same lifecycle as chat, embed, and audio.

Caddy + auth + HTTPS + mDNS

shipped

install.sh --auth=basic brings up Caddy at the edge with basic_auth, Bearer tokens for the OpenAI API, automatic HTTPS (internal CA for .local, Let’s Encrypt for real domains), and a best-effort Avahi service file so hal0.local resolves on the LAN. Off by default; opt in when you’re ready to expose the box.

Installer overhaul

shipped

ASCII banner + step counter, extracted preflight.sh (disk + ports + python + systemd), hal0 doctor for re-running it post-install, hardware cards + auto-populated slots/primary.toml, contextual ERR-trap recovery hints, and a live-hello + QR + reachability finish.

Capability slots overlay

shipped

Embed, Voice, and Img capability cards plus an NPU backend rollup in the dashboard, backed by /etc/hal0/capabilities.toml and GET|POST /api/capabilities/*. The orchestrator picks one model per child, validates (backend, model) pairs against the catalog, and reconciles drift on every apply.

FLM NPU provider live

shipped

Self-contained ghcr.io/hal0ai/hal0-toolbox-flm:v1 image bundles FLM; chat and embed surfaced in the picker only when XDNA hardware and the local toolbox image are both present. Model namespace comes from flm list -j; STT slice deferred.

Proxmox host-pressure segment

shipped

Drop a read-only PVEAuditor API token into Settings → Proxmox integration and the dashboard memory bar shows the physical DIMM total plus a muted Proxmox host segment for other-tenant + ZFS ARC + kernel pressure. Token sits 0600 at /etc/hal0/proxmox.json and the API redacts it on read; bare-metal installs leave the panel off.

embed-rerank built-in slot

shipped

The capability orchestrator auto-creates an embed-rerank slot on first enable: bge-reranker-v2-m3-q4_k_m on port 8086 with --reranking, so /v1/rerankings stays separate from the chat surface on :8081.

First-run wizard rewrite

shipped

Eight linear steps from password through hardware, primary model, capabilities, HF token, license, install, done. HF-token step is conditional on a gated model selection; legacy 5-step picker and three IA prototypes deleted.

--models-dir install flag

shipped

install.sh --models-dir=PATH (or HAL0_MODELS_DIR, or an interactive tty prompt) points a fresh install at an existing model store. Persisted as [models].pull_root and included in [models].roots so the on-disk tree scans on first boot.

Models scan preview overhaul

shipped

POST /api/models/scan/preview is inspection-only; kind-driven gating (llama / moonshine / kokoro / flm), shared skip rules, per-row select, dedup across HF blob-cache symlinks, GGUF magic-byte detection regardless of extension, GGUF general.name → suggested name, HF-cache repo-name fallback.

Per-slot live metrics

shipped

GET /api/slots/metrics reads docker cgroup memory + ActiveEnterTimestampMonotonic + scraped llama-server /metrics, with a fallback to the systemd unit’s MemoryCurrent for native-host slots.

Orchestrator drift reconcile

shipped

CapabilityOrchestrator.apply() now reconciles capabilities.toml ↔ slots/*.toml on every call rather than only on selection diff, so prior failed applies, manual edits, and install-time seeds can no longer leave the two disagreeing.

Soon

v0.2

Hugging Face model pulls

in-progress

POST /api/models/{id}/pull (streaming HF download) is still 501 today; the picker + first-run wizard wire it for the next 0.1.x alpha. (FLM-aware pull for NPU tags landed in PR #89, so Ollama-style family:size ids already pull from the dashboard.)

Agent management & integration

in-progress

First-class agents stored on hal0 with conversation orchestration and deterministic tool-use loops. A place for agents to live on your box, not only to chat with.

Unified memory system

in-progress

Persistent, federated memory across local slots and (optionally) external models. Mem0-style API with sources, search, and scoping.

MCP support, host and server

in-progress

hal0 speaks Model Context Protocol both directions. Compose tools across local slots and external MCP services; discover them from the dashboard.

Extensions framework

in-progress

Third-party apps packaged as hal0 extensions: systemd unit + dashboard tile + healthcheck. Bring-your-own-app, lifecycle-managed.

Benchmarks & presets UI

in-progress

In-dashboard tok/s + latency runs, plus curated loadout presets you can flash onto a fresh install.

AUR PKGBUILD & Ubuntu PPA

in-progress

Native distro packages on top of the install script: pacman and apt as first-class install paths.

Exploring

post-v1.0

Multi-host federation

later

A slot mesh across LAN boxes: primary on the Strix Halo, embed on the workstation, all behind one /v1/* surface.

Fine-tune & LoRA hot-swap

later

Attach and rotate LoRAs against a warm base model without unloading the underlying weights.

Per-model rate limits & budgets

later

Cost-style accounting for local inference, so a chatty agent can be capped without taking the whole box down.

Voice mode end-to-end

later

Moonshine + agent loop + Kokoro stitched into a hands-free streaming conversation, gated through the slot lifecycle.

ChatOps adapters

later

Slack and Matrix bridges as extensions, so you can talk to hal0 from the rooms you already live in.

Exploring items are bets we believe in, not promises. Scope and order will shift as v0.2 lands and we learn what people actually do with a local platform.

shape it

The roadmap is a conversation.

File issues, open discussions, send patches. hal0 is Apache-2.0; the platform is yours to extend, fork, or steer. The agents / memory / MCP track in particular benefits from real-world use cases more than from our guesses.

Open an issue Browse the repo