Shimmy v2.0

The first pure-Rust GGUF inference engine. No C. No Python.

Two 5,200-token runs. Same model. Byte-for-byte identical output, with matching SHA hashes. That's a proof, not a benchmark. Shimmy v2.0 ships Airframe: pure-Rust GPU inference with hand-written WGSL compute shaders. No llama.cpp. No C. No Python. No CUDA. The first production GGUF engine that is Rust all the way down, including the GPU shaders. Run TinyLlama, Llama 3.2, Phi, and DeepSeek from GGUF. Drop-in for AnythingLLM, Open WebUI, Cursor, and Zed via the OpenAI or Ollama API. Windows, macOS, Linux. Install with: cargo install shimmy
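
For anyone who wants to repeat the determinism check, here is a minimal sketch of how two captured outputs could be compared by hash. It assumes SHA-256 (the listing only says "SHA") and uses placeholder bytes in place of real generations; the function names are hypothetical and are not part of Shimmy.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw output bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def runs_match(run_a: bytes, run_b: bytes) -> bool:
    """True when both runs hash to the same digest."""
    return sha256_hex(run_a) == sha256_hex(run_b)


# Placeholder data standing in for two saved generations (assumption:
# in practice these would be the raw bytes captured from each run).
first_run = "The quick brown fox".encode("utf-8")
second_run = "The quick brown fox".encode("utf-8")

print(runs_match(first_run, second_run))
```

Hashing the raw bytes rather than decoded text means any difference in encoding or whitespace also shows up as a mismatch.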
