Announcing SharedLLM v0.1.0 — public alpha
SharedLLM is a global, community-owned AI compute mesh. Instead of paying a hyperscaler the 50%+ margin that's become standard on proprietary inference APIs, contributors pool idle hardware — a MacBook at home, a gaming PC, a dusty workstation — and run frontier open models across the mesh. The coordinator routes requests, the nodes do the work, and the economics are at-cost.
Today we're cutting v0.1.0, the first public alpha. It is deliberately small in scope: the goal of this release is to let anyone audit the code, run the full system locally in three commands, and cross-verify the distributed inference path against two real machines.
What's in the release
Pre-built llama.cpp RPC binaries for four platforms, built locally and smoke-tested:
- `rpc-binaries-darwin-arm64.tar.gz` — macOS Apple Silicon, Metal + RPC, dylibs relocated via `install_name_tool`
- `rpc-binaries-linux-arm64.tar.gz` — Linux ARM64 (Raspberry Pi 5, Ampere, Graviton)
- `rpc-binaries-linux-x86_64.tar.gz` — Linux x86_64
- `rpc-binaries-windows-x86_64.zip` — Windows x86_64, cross-compiled with mingw-w64
- `sharedllm-0.1.0-py3-none-any.whl` + sdist — the Python coordinator and node daemon
All artifacts ship with a SHA256SUMS checksum file and are published on the GitHub release page.
What actually works
- Coordinator + worker handshake with HMAC-SHA256 cluster secret. Workers can't register with a bad token; the coordinator rejects them at the boundary.
- Cross-container distributed inference via llama.cpp's `GGML_RPC` backend. Our `deploy/docker-compose.test.yml` brings up three containers — coordinator, `rpc-server` worker, and `llama-server` primary — wired together on the Docker bridge. The primary's tensors live in the worker's memory; inference requests traverse TCP between namespaces. This is the same code path that runs across two physical machines.
- Token-bucket rate limiting with per-user and shared anonymous buckets. Under a 100-user load test on a MacBook, the coordinator holds ~580 req/s with p99 = 803 ms and sheds the rest cleanly.
- Trust registry with EMA scoring and Jaccard token overlap, so a single misbehaving worker can't poison results.
- Kudos priority queue with `log1p` scoring + aging, so contributors who give more get higher priority without starving newcomers.
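The handshake's gatekeeping boils down to an HMAC check over the shared cluster secret. Here is a minimal sketch of that idea; the names (`CLUSTER_SECRET`, `sign_registration`, `verify_registration`) and the secret value are illustrative, not SharedLLM's actual API:

```python
import hashlib
import hmac

# Illustrative sketch only: names and the secret below are hypothetical,
# not SharedLLM's actual API or configuration.
CLUSTER_SECRET = b"example-cluster-secret"

def sign_registration(worker_id: str, secret: bytes = CLUSTER_SECRET) -> str:
    """Worker side: derive a registration token from the shared cluster secret."""
    return hmac.new(secret, worker_id.encode(), hashlib.sha256).hexdigest()

def verify_registration(worker_id: str, token: str, secret: bytes = CLUSTER_SECRET) -> bool:
    """Coordinator side: reject bad tokens at the boundary, in constant time."""
    expected = hmac.new(secret, worker_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

print(verify_registration("worker-1", sign_registration("worker-1")))  # True
print(verify_registration("worker-1", "f" * 64))                       # False
```

`hmac.compare_digest` matters here: a naive `==` comparison would leak timing information about how many leading characters of the token were correct.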
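The shedding behavior above comes from the token-bucket discipline: refill at a steady rate up to a cap, and admit a request only if a whole token is available. A generic sketch of the mechanism, not the coordinator's actual limiter:

```python
import time

class TokenBucket:
    """Generic token bucket: refill at `rate` tokens/s up to `capacity`.
    A request is admitted only if a whole token is available; the rest shed."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.stamp = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=5.0)
burst = [bucket.allow() for _ in range(8)]
print(burst.count(True))  # 5: the burst drains the bucket, the rest are shed
```

Per-user and anonymous traffic simply get separate `TokenBucket` instances with different rates and capacities.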
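The trust registry's two ingredients are both one-liners in spirit: Jaccard overlap between workers' token sets to flag an outlier answer, and an exponential moving average to fold each agreement sample into a running score. A toy sketch; the alpha value and function names are assumptions, not the shipped code:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between two outputs' token sets (1.0 = identical sets)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def ema_update(prev: float, sample: float, alpha: float = 0.2) -> float:
    """Fold one agreement sample into a worker's running trust score.
    The alpha of 0.2 is an illustrative assumption."""
    return (1.0 - alpha) * prev + alpha * sample

print(jaccard("once upon a time", "once upon a midnight"))  # 0.6
print(ema_update(0.9, 0.0))  # one bad result dents trust without zeroing it
```

The EMA is what makes a *single* misbehaving result survivable: poisoning the registry requires sustained disagreement, not one lucky response.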
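The kudos scoring rule can be sketched in a few lines: `log1p` rewards contribution with diminishing returns, and a linear aging term guarantees even a zero-kudos request eventually rises to the front. The 60-second aging scale below is an assumption for illustration, not the shipped constant:

```python
import math

def priority(kudos: float, waited_s: float, aging_scale_s: float = 60.0) -> float:
    """Higher is served first. log1p compresses large kudos balances;
    waiting time adds linearly so nobody starves. The 60 s scale is an
    illustrative assumption, not the shipped constant."""
    return math.log1p(max(0.0, kudos)) + waited_s / aging_scale_s

# A heavy contributor jumps the line at first...
print(priority(1000, waited_s=0) > priority(0, waited_s=0))    # True
# ...but a newcomer who has waited ten minutes outranks them.
print(priority(0, waited_s=600) > priority(1000, waited_s=0))  # True
```

Because `log1p(1000)` is only about 6.9, ten minutes of waiting (a score of 10 at this scale) beats even a very large kudos balance, which is exactly the anti-starvation property the bullet describes.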
What doesn't work yet
Alpha means alpha. Specifically:
- No public coordinator. We have a reference deployment but no SLA, no signups, no hosted API. You run your own.
- Only the tiny `stories260K` test model is verified end-to-end cross-container. Larger quantized models hit a known upstream `llama.cpp` RPC assertion on certain tensor shapes — a 4-bit row count that isn't a multiple of 512. Upstream has a fix in flight; we'll track it in the next release.
- Model download is still a single blocking fetch. The node daemon's `model_pull` command probes disk space and refuses to start without headroom, but there's no resumable chunked download yet. Expect a 5+ GB model to tie up the worker for several minutes on first run.
- State is not durable across releases. Expect to wipe `~/.sharedllm` when you upgrade.
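For context on the `model_pull` probe: the check it performs is essentially a free-space comparison with a safety margin. This sketch uses an assumed 20% margin and a hypothetical function name:

```python
import shutil

def has_headroom(path: str, needed_bytes: int, safety_factor: float = 1.2) -> bool:
    """Refuse to start a pull unless free space covers the model plus a margin.
    The 1.2 safety factor and this function's name are illustrative assumptions."""
    free = shutil.disk_usage(path).free
    return free >= int(needed_bytes * safety_factor)

print(has_headroom(".", 1024))      # True unless the disk is nearly full
print(has_headroom(".", 10 ** 18))  # False: nobody has 1.2 EB free
```

Failing fast here is the point: better to refuse before the fetch than to fill the disk halfway through a 5 GB download.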
Try it in five minutes
Path 1 — the Docker integration stack — is the fastest way to see the whole thing end-to-end:
git clone https://github.com/MHASK/sharedllm.git
cd sharedllm
docker build -f Dockerfile.llama -t sharedllm:llama .
docker compose -f deploy/docker-compose.test.yml up -d
Wait ~30 seconds for the test model to download, then verify:
curl http://127.0.0.1:18420/health
# → {"status":"ok"}
curl -s http://127.0.0.1:18000/completion \
-H 'Content-Type: application/json' \
-d '{"prompt":"Once upon a time","n_predict":32}' \
| python -m json.tool
You should see ~32 tokens of generated text, and `RPC[rpc-worker:50052]` lines in the primary container's logs. That's the RPC handshake talking to a different container over the bridge network — the exact same code path as two machines over the open internet.
Full instructions (including a from-source venv path and a single-command installer for node operators) are in `docs/install.md`.
Why AGPL-3.0-or-later
The AGPL's network-use clause is deliberate: anyone who runs a modified SharedLLM as a service must publish their source. This closes the SaaS loophole that has been used to enclose previously-open projects — the "open core" playbook where a company takes MIT/Apache code, wraps it in a hosted service, and never has to share improvements back. We don't want that.
Contributions are accepted under the Developer Certificate of Origin, not a CLA. Contributors retain copyright to their work, which means the project cannot be unilaterally relicensed — not by me, not by anyone. See GOVERNANCE.md for how the project is actually run.
What's next
- Track the upstream `llama.cpp` RPC fix so real quantized models work cross-container.
- Progressive, resumable model downloads in the node daemon.
- A first public reference coordinator for people who just want to poke at the API without running anything.
- More platforms: `.dmg` for macOS and `.msi` for Windows, so non-developers can install without a terminal.
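On the resumable-download item, the standard approach is an HTTP `Range` request that skips the bytes already on disk and appends the rest. A sketch under the assumption that the model host honors `Range`; the names here are hypothetical, not the planned daemon API:

```python
import os
import urllib.request

def range_header(existing_bytes: int) -> dict:
    """Ask the server to resume from the bytes we already have."""
    return {"Range": f"bytes={existing_bytes}-"}

def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Append the remainder of `url` to `dest`, resuming after a partial fetch.
    Assumes the server supports Range; a robust version would also check for
    a 206 status and verify a checksum before trusting the partial file."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while True:
            buf = resp.read(chunk)
            if not buf:
                break
            out.write(buf)

print(range_header(1_048_576))  # {'Range': 'bytes=1048576-'}
```

A production version would layer per-chunk checksums on top, so a corrupted partial file doesn't poison the resume.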
If any of this matters to you — if you think AI infrastructure should be owned by the people who use it — the best thing you can do today is star the repo, try the Docker stack, and file an issue when something breaks. We'll be here.