Two 5,200-token runs. Same model. SHA-identical byte output. That's a proof, not a benchmark.
Shimmy v2.0 ships Airframe: pure-Rust GPU inference with hand-written WGSL compute shaders. No llama.cpp. No C. No Python. No CUDA. First production GGUF engine Rust all the way down — including the GPU shaders.
Run TinyLlama, Llama 3.2, Phi, DeepSeek from GGUF. Drop-in for AnythingLLM, Open WebUI, Cursor, Zed via OpenAI or Ollama API. Windows, macOS, Linux.
cargo install shimmy