SharedLLM vs Petals vs Exo vs Kalavai: distributed LLM networks in 2026
A real disclosure up front: I'm the person building SharedLLM. If you want an unbiased comparison, close this tab and read someone else's. What I can promise is that everything I say about the other projects here is checkable against their public repos and websites, and everything I say about SharedLLM is checkable against its code. No vibes, no benchmarks I can't reproduce.
The question this post is trying to answer: in April 2026, if you want to run large language models across a mesh of consumer machines you actually own, which of these projects should you use?
What we're comparing
- Petals — the original, from BigScience. "BitTorrent-style" distributed LLM inference over a public peer-to-peer network.
- Exo Labs — Apple-Silicon-flavored mesh networking, famous for viral Twitter demos of Llama 3.1 405B split across iPhones and Macs.
- Kalavai — another volunteer-GPU project, more enterprise-leaning.
- SharedLLM — what I'm building. llama.cpp RPC backend, AGPL-3.0, non-profit governance, India-first contributor base.
I'm deliberately not including Ollama, LM Studio, GPUStack, or Together/Replicate. Ollama and LM Studio are single-machine local stacks: they don't split models across multiple machines, so they're solving a different problem. GPUStack is a company-led cluster manager for GPUs you already control, not a community mesh. Together and Replicate are commercial hosted APIs: they're what community mesh projects are supposed to be an alternative to.
At-a-glance
| | Petals | Exo | Kalavai | SharedLLM |
|---|---|---|---|---|
| Backend | custom PyTorch | tinygrad / MLX | vLLM / Ray | llama.cpp RPC |
| License | MIT | GPL-3.0 | Apache-2.0 | AGPL-3.0-or-later |
| Governance | research project | VC-backed startup | company-led | non-profit, DCO |
| Network model | public P2P swarm | LAN mesh | centralized pool | coordinator + mesh |
| Privacy claim | none | local-only | none explicit | route-sensitive |
| Largest verified model | Llama 70B | Llama 405B (demo) | varies | alpha — tiny test only |
| Production-ready | research | beta | beta | alpha |
Petals
The pitch: run big models at home, BitTorrent-style. Servers hold slices of a model, clients discover them, layers flow over the public network. It was a genuinely important research project from BigScience and it deserves credit for proving the pattern works.
What's good: mature codebase, actually supports real 70B-class models, Colab-friendly, good academic documentation. If you want to write a paper about distributed inference, this is the codebase to read.
What's not: the public network health widget on petals.dev is currently broken (reports "can't load network status"), which is a bad sign for a project whose whole pitch is a live public swarm. The backend is PyTorch + custom hivemind networking, not llama.cpp — which means you can't bring your own quantized GGUF and expect it to work, and you pay the full PyTorch memory overhead. It's MIT-licensed, which is fine until someone wraps it in a hosted service and doesn't share the improvements back — which has happened to projects with this license posture before.
Use it if: you're a researcher, you care about the academic lineage, and you're OK with a project that's in maintenance mode.
Exo Labs
The pitch: "one machine is not enough." Exo went viral in 2024 with a video of Llama 3.1 405B running across two MacBook Pros and an iPhone. The vibes are very good.
What's good: the demos are real, the Apple Silicon focus is well-chosen (MLX is fast, unified memory is a big deal), the GitHub star count is legitimately impressive, and they've pulled off the rare feat of making distributed inference feel cool.
What's not: their website is literally just a logo, a download button, and two embedded videos — no copy, no value proposition, no docs visible from the homepage. That's either a deliberate "vibes over text" play or a sign that they don't know how to convert visitors, and I genuinely can't tell which. They're VC-backed, which in the distributed-compute space historically ends one of two ways: either the venture case closes (the company IPOs a hosted version and the OSS becomes a trailer) or it doesn't (the company pivots and the OSS gets orphaned). Neither is good for a mesh that's supposed to outlast its current maintainers.
Use it if: you have a fleet of Apple Silicon Macs on the same LAN, you want the smoothest demo-grade experience, and you're comfortable with the governance risk.
Kalavai
The pitch: enterprise-ish volunteer-GPU pooling, Apache-licensed, commercially friendly.
What's good: Apache-2.0 is the friendliest license for integrators, and the Ray/vLLM backend choice means it slots into existing MLOps tooling easily.
What's not: at the time of writing, kalavai.net returns a single word — just "kalavai" — with no homepage content. That's either a domain transition or an abandoned marketing site, and again I can't tell which. The GitHub repo is active but the "community" framing is thinner than it looks; this is closer to "company pool with volunteer contributors" than "community-owned mesh."
Use it if: you're an integrator building on top of Ray and you need Apache-compatible licensing.
SharedLLM
The pitch: community-owned, privacy-first, AGPL-3.0, built on llama.cpp's RPC backend so you can bring any GGUF and keep your existing quant workflow. Non-profit governance, with contributions under the Developer Certificate of Origin (no CLA). India-first contributor base.
What's good: the llama.cpp backend choice is load-bearing — it means every quantization format, every supported architecture, and every tokenizer that llama.cpp already handles just works, without us maintaining a parallel model zoo. The AGPL-3.0 network-use clause deliberately closes the SaaS loophole: anyone running a modified SharedLLM as a hosted service must publish their source. The DCO-not-CLA stance means the project can't be unilaterally relicensed, not even by me.
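For concreteness, here is roughly what driving llama.cpp's RPC backend looks like today, independent of SharedLLM. The binaries and flags are upstream llama.cpp's; exact names can shift between releases, and the IPs and model file below are placeholders:

```shell
# On each worker machine: build llama.cpp with the RPC backend enabled,
# then start an rpc-server exposing its local CPU/GPU over TCP.
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the client: point any llama.cpp frontend at the workers. Layers are
# offloaded across the listed RPC backends (-ngl 99 = offload everything).
./build/bin/llama-cli -m model.Q4_K_M.gguf \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    -ngl 99 -p "Hello"
```

This is the substrate SharedLLM builds on; the point of the project is everything around it (discovery, trust, downloads), not reinventing the tensor transport.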
What's not: this is v0.1.0 alpha. Only the tiny stories260K test model is verified end-to-end cross-container; larger quantized models hit a known upstream llama.cpp RPC assertion on certain tensor shapes (tracked, with a fix in flight upstream). There's no public reference coordinator yet, which means you can't "try it in the cloud" — you run your own. Progressive resumable model downloads aren't in yet. If you need a production-grade mesh today, SharedLLM is not that project.
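Since resumable downloads came up: the technique itself is standard HTTP Range requests, nothing exotic. A minimal sketch (my own illustration, not SharedLLM code; assumes the model host honours `Range`, which most do):

```python
import os
import urllib.request

def resume_download(url: str, dest: str, chunk_size: int = 1 << 16) -> int:
    """Download url to dest, resuming from any partial file on disk.

    Returns the byte count on disk afterwards.
    """
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset:
        # Ask the server for only the bytes we are missing.
        req.add_header("Range", f"bytes={offset}-")
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        # 206 means the server resumed; a plain 200 means it ignored the
        # Range header, so restart the file from scratch.
        if offset and resp.status == 200:
            out.seek(0)
            out.truncate()
        while chunk := resp.read(chunk_size):
            out.write(chunk)
    return os.path.getsize(dest)
```

The production version also needs a checksum pass after the last chunk, which is where multi-gigabyte GGUF downloads actually bite.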
Use it if: you care about the governance, you want the llama.cpp backend's broad model support, and you're OK being an early contributor to something small rather than a customer of something polished.
What actually matters (and what doesn't)
Here's what I've come to believe after spending a year building one of these things:
- Backend maturity dominates everything else. The reason SharedLLM chose llama.cpp RPC isn't ideological. It's that llama.cpp has the broadest model coverage, the most active contributor base, and the best quantization formats. Fighting that ecosystem by building a parallel one is a tax you pay forever.
- Governance matters more than it looks. The MIT-vs-Apache-vs-AGPL debate sounds theological until your favorite MIT-licensed project gets forked into a proprietary hosted service and the original maintainers can't do anything about it. AGPL isn't perfect — it's annoying for some integrators — but it solves the exact failure mode distributed-compute projects keep hitting.
- "Public swarm" is a trap in 2026. Petals' approach — a public P2P network where anyone can be a server and anyone can be a client — runs into real abuse problems the moment it gets popular. SharedLLM's coordinator model is less elegant but more defensible: someone has to vouch for new workers (via HMAC cluster secret), someone owns the trust registry, someone can rate-limit a misbehaving client. Decentralization purity isn't worth a network that can't defend itself.
- Demos are not infrastructure. Exo's 405B demo is genuinely impressive. But the gap between "this worked on a LAN with three Macs we owned" and "this is a network you can rely on" is enormous, and it's the gap where most of the engineering actually lives: worker churn, model download resumption, trust scoring, rate limiting, coordinator durability, upgrade paths. Nobody's demos show this, because it's boring.
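The HMAC vouching mentioned above is a small mechanism with a big effect. A sketch of the idea, using illustrative names rather than SharedLLM's actual wire format: the worker signs its join request with a shared cluster secret, and the coordinator rejects anything unsigned or stale.

```python
import hashlib
import hmac
import json
import time

CLUSTER_SECRET = b"replace-with-your-cluster-secret"  # shared out of band

def sign_join_request(worker_id: str, secret: bytes = CLUSTER_SECRET) -> dict:
    """Worker side: build a join request the coordinator can verify."""
    payload = json.dumps(
        {"worker_id": worker_id, "ts": int(time.time())}, sort_keys=True
    )
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify_join_request(req: dict, secret: bytes = CLUSTER_SECRET,
                        max_age_s: int = 300) -> bool:
    """Coordinator side: accept only fresh, correctly signed requests."""
    expected = hmac.new(
        secret, req["payload"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, req["sig"]):
        return False  # wrong or missing cluster secret
    ts = json.loads(req["payload"])["ts"]
    return abs(time.time() - ts) <= max_age_s  # reject stale replays
```

Note the constant-time `hmac.compare_digest` and the timestamp check: without the latter, a captured join request could be replayed forever. It's not fancy, but it gives the network a front door.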
What I'd actually recommend
- You want to run 70B today across your own Macs on your LAN: use Exo. It's the most polished path for that exact workload. Accept the governance risk.
- You want to run a research experiment on a public distributed network: use Petals. It's the most cited and the most mature.
- You want to build on top of a community-owned, AGPL, non-profit mesh that will still exist in five years: SharedLLM — but understand you're joining a project in alpha, not buying a service. Come back in a year and ask me again.
- You want one local machine running open models: none of these. Use Ollama or LM Studio. They're solving a different problem and they're both excellent at it.
If you've read this far and you think the governance question actually matters, the best way to back SharedLLM today is to star the repo, try the cross-machine RPC tutorial, and file an issue when something breaks. That's how small projects become real ones.