UNC

HuggingFace transformer compiler for optimised inference

1 follower

UNC compiles HuggingFace transformer models into optimised native Metal inference binaries. There is no runtime framework and no Python: just a compiled binary that runs your model at near-hardware-limit speed on Apple Silicon. Compared with mlx-lm, UNC is 1.35x faster while using 25% less GPU power, resulting in 1.7x better energy efficiency. It also executes 8.4x fewer CPU instructions than MLX, which means less heat, less power, and more headroom for the GPU.
