Krish Singaria

GraphZero - Zero-copy C++ graph engine to train PyTorch GNNs on larger-than-RAM graphs.

GraphZero fixes PyTorch Geometric OOM crashes by memory-mapping massive graph datasets directly from your SSD. Built with C++20 and nanobind, it hands raw pointers to PyTorch as zero-copy NumPy arrays. Train on 50GB datasets with consumer hardware by letting the OS handle page faults while the GPU focuses entirely on the math.

Replies

Krish Singaria
Maker
Hey Product Hunt! 👋 I’m a third-year CS student, and I built GraphZero to solve a headache I kept running into while working with Graph Neural Networks: the dreaded PyTorch out-of-memory (OOM) crash.

The Problem: Standard libraries like PyTorch Geometric try to load massive datasets (like Papers100M) entirely into RAM before moving them to the GPU. On a standard machine, this causes an instant 24GB+ allocation crash.

The Solution: I built a custom C++ data engine that avoids loading the dataset into RAM up front. It compiles data into optimized .gl and .gd binary formats. It uses POSIX mmap to memory-map the files directly from the SSD. Using nanobind, it hands raw C++ pointers directly to PyTorch as zero-copy NumPy arrays.

During training, PyTorch thinks it has a 50GB tensor sitting in RAM, but the data is actually streamed from the SSD on demand: the OS loads each page the first time it is touched, via page faults. The engine uses OpenMP in C++ to multi-thread neighbor sampling, releasing the Python GIL to fully saturate disk I/O.

I built this to dive deep into low-level memory management, C++/Python bindings, and CI/CD pipelines. The repo includes a self-contained synthetic dataset and a plug-and-play GraphSAGE training script so you can test the zero-copy mounting locally.

I would absolutely love any harsh technical feedback on the C++ architecture, the template dispatching, or the API design!
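For anyone who wants to see the memory-mapping idea in isolation, here is a minimal sketch using only the Python standard library. It is not GraphZero's code: the file layout (a two-value header followed by CSR `indptr` and `indices` arrays, all int64) is an assumption made for illustration and is not the real .gl format, and `write_csr`, `MappedGraph`, and `sample_neighbors` are hypothetical names. The point it illustrates is that the typed views are windows onto the mapped file, so opening the graph copies nothing and only the pages that sampling actually touches get read from disk.

```python
import mmap
import os
import random
import tempfile
from array import array


def write_csr(path, indptr, indices):
    """Write a toy CSR graph as native int64: [num_nodes, num_edges, indptr..., indices...].

    Illustrative layout only; this is NOT the .gl format.
    """
    header = [len(indptr) - 1, len(indices)]
    with open(path, "wb") as f:
        f.write(array("q", header + list(indptr) + list(indices)).tobytes())


class MappedGraph:
    """Read-only CSR graph backed by a memory-mapped file (hypothetical helper)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Map the whole file; no bytes are read until a page is touched.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        # Reinterpret the mapped bytes as int64 without copying.
        self._view = memoryview(self._mm).cast("q")
        self.num_nodes, self.num_edges = self._view[0], self._view[1]
        start = 2 + self.num_nodes + 1
        # Slicing a memoryview yields another view, not a copy.
        self.indptr = self._view[2:start]
        self.indices = self._view[start:start + self.num_edges]

    def sample_neighbors(self, node, k, rng=random):
        """Return up to k neighbors of `node`, reading only the sampled entries."""
        start, end = self.indptr[node], self.indptr[node + 1]
        if end - start <= k:
            return [self.indices[i] for i in range(start, end)]
        return [self.indices[i] for i in rng.sample(range(start, end), k)]

    def close(self):
        # Views must be released before the mapping can be closed.
        for view in (self.indices, self.indptr, self._view):
            view.release()
        self._mm.close()
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_graph.bin")
        # Node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {0, 1}
        write_csr(path, indptr=[0, 2, 3, 5], indices=[1, 2, 0, 0, 1])
        graph = MappedGraph(path)
        print(graph.num_nodes, graph.num_edges)
        print(graph.sample_neighbors(2, k=1))
        graph.close()


if __name__ == "__main__":
    demo()
```

The same mapped buffer could in principle be wrapped with `numpy.frombuffer` to get an array view without copying, which is the Python-side analogue of handing a raw pointer across the binding layer.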