If you've ever tried to fine-tune an LLM locally, you know the "CUDA out of memory" heartbreak.
I wanted the convergence speed of second-order optimizers (like Shampoo), but those methods usually destroy consumer GPUs because they require inverting massive preconditioner matrices.
The result is SCAO: a sparse, second-order PyTorch optimizer designed as a high-throughput, drop-in replacement for AdamW, with 54% faster LLM training. Code: whispering3/scao.
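To make the "drop-in replacement" claim concrete, here's a minimal usage sketch. The import path, class name, and constructor parameters below are assumptions modeled on `torch.optim.AdamW`'s interface; check the whispering3/scao repo for the actual API.

```python
import torch
from torch import nn

# Hypothetical import path; the real module layout in whispering3/scao may differ.
from scao import SCAO

model = nn.Linear(512, 512)

# Assumed AdamW-like constructor (lr/weight_decay names are an assumption,
# based on the "drop-in replacement for AdamW" claim).
optimizer = SCAO(model.parameters(), lr=3e-4, weight_decay=0.01)

# Standard PyTorch training step; nothing changes versus AdamW.
x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

If the optimizer really is drop-in, swapping it into an existing training loop should be a one-line change to the optimizer construction, with the rest of the loop untouched.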