Announcing SharedLLM v0.1.0 — public alpha
SharedLLM is a global, community-owned AI compute mesh. Instead of paying a hyperscaler the 50%+ margin that's become standard on proprietary inference APIs, contributors pool idle hardware — a MacBook at home, a gaming PC, a dusty workstation — and run frontier open models across the mesh. The coordinator routes requests, the nodes do the work, and the economics are at-cost.
Today we're cutting v0.1.0, the first public alpha. It is deliberately small in scope: the goal of this release is to let anyone audit the code, run the full system locally in three commands, and cross-verify the distributed inference path against two real machines.
What's in the release
Pre-built llama.cpp RPC binaries for four platforms, built locally and smoke-tested:
- `rpc-binaries-darwin-arm64.tar.gz` — macOS Apple Silicon, Metal + RPC, dylibs relocated via `install_name_tool`
- `rpc-binaries-linux-arm64.tar.gz` — Linux ARM64 (Raspberry Pi 5, Ampere, Graviton)
- `rpc-binaries-linux-x86_64.tar.gz` — Linux x86_64
- `rpc-binaries-windows-x86_64.zip` — Windows x86_64, cross-compiled with mingw-w64
- `sharedllm-0.1.0-py3-none-any.whl` + sdist — the Python coordinator and node daemon
All artifacts ship with a SHA256SUMS checksum file and are published on the GitHub release page.
What actually works
- Coordinator + worker handshake with HMAC-SHA256 cluster secret. Workers can't register with a bad token; the coordinator rejects them at the boundary.
- Cross-container distributed inference via llama.cpp's `GGML_RPC` backend. Our `deploy/docker-compose.test.yml` brings up three containers — coordinator, `rpc-server` worker, and `llama-server` primary — wired together on the Docker bridge. The primary's tensors live in the worker's memory; inference requests traverse TCP between namespaces. This is the same code path that runs across two physical machines.
- Token-bucket rate limiting with per-user and shared anonymous buckets. Under a 100-user load test on a MacBook, the coordinator holds ~580 req/s with p99 = 803 ms and sheds the rest cleanly.
- Trust registry with EMA scoring and Jaccard token overlap, so a single misbehaving worker can't poison results.
- Kudos priority queue with `log1p` scoring + aging, so contributors who give more get higher priority without starving newcomers.
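The handshake's gatekeeping boils down to an HMAC check over the shared cluster secret. Here is a minimal sketch of that idea; the names (`CLUSTER_SECRET`, `sign_registration`, `verify_registration`) and the secret value are illustrative, not SharedLLM's actual API:

```python
import hashlib
import hmac

# Illustrative sketch only: names and the secret below are hypothetical,
# not SharedLLM's actual API or configuration.
CLUSTER_SECRET = b"example-cluster-secret"

def sign_registration(worker_id: str, secret: bytes = CLUSTER_SECRET) -> str:
    """Worker side: derive a registration token from the shared cluster secret."""
    return hmac.new(secret, worker_id.encode(), hashlib.sha256).hexdigest()

def verify_registration(worker_id: str, token: str, secret: bytes = CLUSTER_SECRET) -> bool:
    """Coordinator side: reject bad tokens at the boundary, in constant time."""
    expected = hmac.new(secret, worker_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

print(verify_registration("worker-1", sign_registration("worker-1")))  # True
print(verify_registration("worker-1", "f" * 64))                       # False
```

`hmac.compare_digest` matters here: a naive `==` comparison would leak timing information about how many leading characters of the token were correct.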
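The shedding behavior above comes from the token-bucket discipline: refill at a steady rate up to a cap, and admit a request only if a whole token is available. A generic sketch of the mechanism, not the coordinator's actual limiter:

```python
import time

class TokenBucket:
    """Generic token bucket: refill at `rate` tokens/s up to `capacity`.
    A request is admitted only if a whole token is available; the rest shed."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.stamp = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=5.0)
burst = [bucket.allow() for _ in range(8)]
print(burst.count(True))  # 5: the burst drains the bucket, the rest are shed
```

Per-user and anonymous traffic simply get separate `TokenBucket` instances with different rates and capacities.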
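The trust registry's two ingredients are both one-liners in spirit: Jaccard overlap between workers' token sets to flag an outlier answer, and an exponential moving average to fold each agreement sample into a running score. A toy sketch; the alpha value and function names are assumptions, not the shipped code:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between two outputs' token sets (1.0 = identical sets)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def ema_update(prev: float, sample: float, alpha: float = 0.2) -> float:
    """Fold one agreement sample into a worker's running trust score.
    The alpha of 0.2 is an illustrative assumption."""
    return (1.0 - alpha) * prev + alpha * sample

print(jaccard("once upon a time", "once upon a midnight"))  # 0.6
print(ema_update(0.9, 0.0))  # one bad result dents trust without zeroing it
```

The EMA is what makes a *single* misbehaving result survivable: poisoning the registry requires sustained disagreement, not one lucky response.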
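The kudos scoring rule can be sketched in a few lines: `log1p` rewards contribution with diminishing returns, and a linear aging term guarantees even a zero-kudos request eventually rises to the front. The 60-second aging scale below is an assumption for illustration, not the shipped constant:

```python
import math

def priority(kudos: float, waited_s: float, aging_scale_s: float = 60.0) -> float:
    """Higher is served first. log1p compresses large kudos balances;
    waiting time adds linearly so nobody starves. The 60 s scale is an
    illustrative assumption, not the shipped constant."""
    return math.log1p(max(0.0, kudos)) + waited_s / aging_scale_s

# A heavy contributor jumps the line at first...
print(priority(1000, waited_s=0) > priority(0, waited_s=0))    # True
# ...but a newcomer who has waited ten minutes outranks them.
print(priority(0, waited_s=600) > priority(1000, waited_s=0))  # True
```

Because `log1p(1000)` is only about 6.9, ten minutes of waiting (a score of 10 at this scale) beats even a very large kudos balance, which is exactly the anti-starvation property the bullet describes.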
What doesn't work yet
Alpha means alpha. Specifically:
- No public coordinator. We have a reference deployment but no SLA, no signups, no hosted API. You run your own.
- Only the tiny `stories260K` test model is verified end-to-end cross-container. Larger quantized models hit a known upstream `llama.cpp` RPC assertion on certain tensor shapes — a 4-bit row count that isn't a multiple of 512. Upstream has a fix in flight; we'll track it in the next release.
- Model download is still a single blocking fetch. The node daemon's `model_pull` command probes disk space and refuses to start without headroom, but there's no resumable chunked download yet. Expect a 5+ GB model to tie up the worker for several minutes on first run.
- State is not durable across releases. Expect to wipe `~/.sharedllm` when you upgrade.
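For context on the `model_pull` probe: the check it performs is essentially a free-space comparison with a safety margin. This sketch uses an assumed 20% margin and a hypothetical function name:

```python
import shutil

def has_headroom(path: str, needed_bytes: int, safety_factor: float = 1.2) -> bool:
    """Refuse to start a pull unless free space covers the model plus a margin.
    The 1.2 safety factor and this function's name are illustrative assumptions."""
    free = shutil.disk_usage(path).free
    return free >= int(needed_bytes * safety_factor)

print(has_headroom(".", 1024))      # True unless the disk is nearly full
print(has_headroom(".", 10 ** 18))  # False: nobody has 1.2 EB free
```

Failing fast here is the point: better to refuse before the fetch than to fill the disk halfway through a 5 GB download.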
Try it in five minutes
Path 1 — the Docker integration stack — is the fastest way to see the whole thing end-to-end:
git clone https://github.com/MHASK/sharedllm.git
cd sharedllm
docker build -f Dockerfile.llama -t sharedllm:llama .
docker compose -f deploy/docker-compose.test.yml up -d
Wait ~30 seconds for the test model to download, then verify:
curl http://127.0.0.1:18420/health
# → {"status":"ok"}
curl -s http://127.0.0.1:18000/completion \
-H 'Content-Type: application/json' \
-d '{"prompt":"Once upon a time","n_predict":32}' \
| python -m json.tool
You should see ~32 tokens of generated text, and `RPC[rpc-worker:50052]` lines in the primary container's logs. That's the RPC handshake talking to a different container over the bridge network — the exact same code path as two machines over the open internet.
Full instructions (including a from-source venv path and a single-command installer for node operators) are in `docs/install.md`.
Why AGPL-3.0-or-later
The AGPL's network-use clause is deliberate: anyone who runs a modified SharedLLM as a service must publish their source. This closes the SaaS loophole that has been used to enclose previously-open projects — the "open core" playbook where a company takes MIT/Apache code, wraps it in a hosted service, and never has to share improvements back. We don't want that.
Contributions are accepted under the Developer Certificate of Origin, not a CLA. Contributors retain copyright to their work, which means the project cannot be unilaterally relicensed — not by me, not by anyone. See GOVERNANCE.md for how the project is actually run.
What's next
- Track the upstream `llama.cpp` RPC fix so real quantized models work cross-container.
- Progressive, resumable model downloads in the node daemon.
- A first public reference coordinator for people who just want to poke at the API without running anything.
- More platforms: `.dmg` for macOS and `.msi` for Windows, so non-developers can install without a terminal.
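On the resumable-download item, the standard approach is an HTTP `Range` request that skips the bytes already on disk and appends the rest. A sketch under the assumption that the model host honors `Range`; the names here are hypothetical, not the planned daemon API:

```python
import os
import urllib.request

def range_header(existing_bytes: int) -> dict:
    """Ask the server to resume from the bytes we already have."""
    return {"Range": f"bytes={existing_bytes}-"}

def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Append the remainder of `url` to `dest`, resuming after a partial fetch.
    Assumes the server supports Range; a robust version would also check for
    a 206 status and verify a checksum before trusting the partial file."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while True:
            buf = resp.read(chunk)
            if not buf:
                break
            out.write(buf)

print(range_header(1_048_576))  # {'Range': 'bytes=1048576-'}
```

A production version would layer per-chunk checksums on top, so a corrupted partial file doesn't poison the resume.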
If any of this matters to you — if you think AI infrastructure should be owned by the people who use it — the best thing you can do today is star the repo, try the Docker stack, and file an issue when something breaks. We'll be here.