2026-04-11 · Comparison

SharedLLM vs Petals vs Exo vs Kalavai: distributed LLM networks in 2026

A real disclosure up front: I'm the person building SharedLLM. If you want an unbiased comparison, close this tab and read someone else's. What I can promise is that everything I say about the other projects here is checkable against their public repos and websites, and everything I say about SharedLLM is checkable against its code. No vibes, no benchmarks I can't reproduce.

The question this post is trying to answer: in April 2026, if you want to run large language models across a mesh of consumer machines you actually own, which of these projects should you use?

What we're comparing

I'm deliberately not including Ollama, LM Studio, GPUStack, or Together/Replicate. Ollama and LM Studio are single-machine local stacks: they don't split models across multiple machines, so they're solving a different problem. GPUStack orchestrates GPU clusters you already administer, not a mesh of volunteered consumer boxes. Together and Replicate are commercial hosted APIs: they're what community mesh projects are supposed to be an alternative to.

At-a-glance

                        Petals             Exo                Kalavai            SharedLLM
Backend                 custom PyTorch     tinygrad / MLX     vLLM / Ray         llama.cpp RPC
License                 MIT                GPL-3.0            Apache-2.0         AGPL-3.0-or-later
Governance              research project   VC-backed startup  company-led        non-profit, DCO
Network model           public P2P swarm   LAN mesh           centralized pool   coordinator + mesh
Privacy claim           none               local-only         none explicit      route-sensitive
Largest verified model  Llama 70B          Llama 405B (demo)  varies             alpha — tiny test only
Production-ready        research           beta               beta               alpha

Petals

The pitch: run big models at home, BitTorrent-style. Servers hold slices of a model, clients discover them, layers flow over the public network. It was a genuinely important research project from BigScience and it deserves credit for proving the pattern works.

What's good: mature codebase, actually supports real 70B-class models, Colab-friendly, good academic documentation. If you want to write a paper about distributed inference, this is the codebase to read.

What's not: the public network health widget on petals.dev is currently broken (reports "can't load network status"), which is a bad sign for a project whose whole pitch is a live public swarm. The backend is PyTorch + custom hivemind networking, not llama.cpp — which means you can't bring your own quantized GGUF and expect it to work, and you pay the full PyTorch memory overhead. It's MIT-licensed, which is fine until someone wraps it in a hosted service and doesn't share the improvements back — which has happened to projects with this license posture before.

Use it if: you're a researcher, you care about the academic lineage, and you're OK with a project that's in maintenance mode.

Exo Labs

The pitch: "one machine is not enough." Exo went viral in 2024 with a video of Llama 3.1 405B running across two MacBook Pros and an iPhone. The vibes are very good.

What's good: the demos are real, the Apple Silicon focus is well-chosen (MLX is fast, unified memory is a big deal), the GitHub star count is legitimately impressive, and they've pulled off the rare feat of making distributed inference feel cool.

What's not: their website is literally just a logo, a download button, and two embedded videos — no copy, no value proposition, no docs visible from the homepage. That's either a deliberate "vibes over text" play or a sign that they don't know how to convert visitors, and I genuinely can't tell which. They're VC-backed, which in the distributed-compute space historically ends one of two ways: either the venture case closes (the company IPOs a hosted version and the OSS becomes a trailer) or it doesn't (the company pivots and the OSS gets orphaned). Neither is good for a mesh that's supposed to outlast its current maintainers.

Use it if: you have a fleet of Apple Silicon Macs on the same LAN, you want the smoothest demo-grade experience, and you're comfortable with the governance risk.

Kalavai

The pitch: enterprise-ish volunteer-GPU pooling, Apache-licensed, commercially friendly.

What's good: Apache-2.0 is the friendliest license for integrators, and the Ray/vLLM backend choice means it slots into existing MLOps tooling easily.

What's not: at the time of writing, kalavai.net returns a single word — just "kalavai" — with no homepage content. That's either a domain transition or an abandoned marketing site, and again I can't tell which. The GitHub repo is active but the "community" framing is thinner than it looks; this is closer to "company pool with volunteer contributors" than "community-owned mesh."

Use it if: you're an integrator building on top of Ray and you need Apache-compatible licensing.

SharedLLM

The pitch: community-owned, privacy-first, AGPL-3.0, built on llama.cpp's RPC backend so you can bring any GGUF and keep your existing quant workflow. Non-profit governance, with contributions accepted under the Developer Certificate of Origin (no CLA). India-first contributor base.

What's good: the llama.cpp backend choice is load-bearing — it means every quantization format, every supported architecture, and every tokenizer that llama.cpp already handles just works, without us maintaining a parallel model zoo. The AGPL-3.0 network-use clause deliberately closes the SaaS loophole: anyone running a modified SharedLLM as a hosted service must publish their source. The DCO-not-CLA stance means the project can't be unilaterally relicensed, not even by me.
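To make the backend claim concrete, here's roughly what the underlying workflow looks like using upstream llama.cpp's RPC pieces directly. The hostnames, port, and model path are placeholders, and SharedLLM's own wrapper commands may differ; this is just the upstream primitive the project builds on:

```shell
# On each worker: build llama.cpp with the RPC backend enabled and expose it.
# GGML_RPC is the upstream build flag; 50052 is an arbitrary port choice.
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the client: point llama-cli at the workers. Layers get offloaded across
# them, and any GGUF quant llama.cpp understands works unmodified.
./build/bin/llama-cli -m ./models/model-q4_k_m.gguf \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    -p "Hello from the mesh"
```

Note the worker end is deliberately dumb: rpc-server executes tensor ops it's sent, which is exactly why the trust layer discussed below has to live in the coordinator.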

What's not: this is v0.1.0 alpha. Only the tiny stories260K test model is verified end-to-end cross-container; larger quantized models hit a known llama.cpp RPC assertion on certain tensor shapes (tracked upstream, with a fix in flight). There's no public reference coordinator yet, which means you can't "try it in the cloud" — you run your own. Progressive resumable model downloads aren't in yet. If you need a production-grade mesh today, SharedLLM is not that project.

Use it if: you care about the governance, you want the llama.cpp backend's broad model support, and you're OK being an early contributor to something small rather than a customer of something polished.

What actually matters (and what doesn't)

Here's what I've come to believe after spending a year building one of these things:

  1. Backend maturity dominates everything else. The reason SharedLLM chose llama.cpp RPC isn't ideological. It's that llama.cpp has the broadest model coverage, the most active contributor base, and the best quantization formats. Fighting that ecosystem by building a parallel one is a tax you pay forever.
  2. Governance matters more than it looks. The MIT-vs-Apache-vs-AGPL debate sounds theological until your favorite MIT-licensed project gets forked into a proprietary hosted service and the original maintainers can't do anything about it. AGPL isn't perfect — it's annoying for some integrators — but it solves the exact failure mode distributed-compute projects keep hitting.
  3. "Public swarm" is a trap in 2026. Petals' approach — a public P2P network where anyone can be a server and anyone can be a client — runs into real abuse problems the moment it gets popular. SharedLLM's coordinator model is less elegant but more defensible: someone has to vouch for new workers (via HMAC cluster secret), someone owns the trust registry, someone can rate-limit a misbehaving client. Decentralization purity isn't worth a network that can't defend itself.
  4. Demos are not infrastructure. Exo's 405B demo is genuinely impressive. But the gap between "this worked on a LAN with three Macs we owned" and "this is a network you can rely on" is enormous, and it's the gap where most of the engineering actually lives: worker churn, model download resumption, trust scoring, rate limiting, coordinator durability, upgrade paths. Nobody's demos show this, because it's boring.
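The vouching step in point 3 fits in a few lines. This is an illustrative sketch, not SharedLLM's actual implementation — the token scheme and the names `join_token` and `verify_join` are hypothetical — but the shape is the idea: coordinator and prospective worker share a cluster secret out of band, and the worker proves it holds the secret by sending an HMAC tag over its identity.

```python
import hashlib
import hmac

def join_token(cluster_secret: bytes, worker_id: str) -> str:
    """Worker side: tag our identity with the shared cluster secret."""
    return hmac.new(cluster_secret, worker_id.encode(), hashlib.sha256).hexdigest()

def verify_join(cluster_secret: bytes, worker_id: str, token: str) -> bool:
    """Coordinator side: recompute and compare in constant time."""
    expected = join_token(cluster_secret, worker_id)
    return hmac.compare_digest(expected, token)

secret = b"example-cluster-secret"  # distributed out of band, never on the wire
tok = join_token(secret, "worker-01")
print(verify_join(secret, "worker-01", tok))        # True: holder of the secret
print(verify_join(secret, "worker-01", "00" * 32))  # False: forged token
```

The useful property is that the coordinator never has to store per-worker credentials before a worker exists; anyone handed the secret can mint a valid join, and rotating the secret evicts everyone at once.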

What I'd actually recommend

If you've read this far and you think the governance question actually matters, the best way to back SharedLLM today is to star the repo, try the cross-machine RPC tutorial, and file an issue when something breaks. That's how small projects become real ones.


See also: Announcing SharedLLM v0.1.0 · Splitting Llama across two MacBook Pros