Hey everyone, I’m Emanuele, and I built WebPizza AI to see if browser-based RAG could actually work, and to rethink how much we trust cloud AI tools with our data.

Most “chat with your docs” apps work like this: upload your PDF → processed on their servers → chat with results. That’s fine, but it means your files live somewhere you can’t control. I wanted to flip that idea. What if everything stayed local?

So I built WebPizza AI, a proof of concept that runs fully in your browser, keeps your PDFs on your device, uses WebGPU + WebLLM for local inference, includes a full RAG pipeline (PDF → embeddings → vector search → LLM), and is open source and auditable.

Current state: it works! (mostly 😅) It supports multiple models (Phi-3, Mistral, Llama, Qwen), includes WeInfer optimization (~3.7× faster than plain WebLLM), and can run offline after setup.

Why I built it: I love the convenience of ChatGPT, Claude, and similar tools, but sometimes you just need to keep your stuff private. Think legal docs, medical data, research notes, or internal company material. I wanted to see if local-first AI could be practical, not just theoretical.

What I’d love feedback on: When would local-first AI actually matter to you? Would you accept slower speeds or more setup in exchange for privacy? Who do you think needs this most: lawyers, researchers, journalists?

Try it here → github.com/stramanu/webpizza-ai-poc

It’s still early, but I’d love your thoughts. Let’s see if local AI can actually compete.
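For anyone curious what the "vector search → LLM" part of the pipeline above looks like in practice, here is a minimal sketch in TypeScript. It assumes the PDF has already been split into chunks and embedded; the `Chunk` type and the `cosineSimilarity`, `topK`, and `buildPrompt` helpers are hypothetical names made up for illustration, not code from the WebPizza AI repo, and brute-force cosine similarity is just one simple way to do the search.

```typescript
// Hypothetical types and helpers, for illustration only.
// They are NOT taken from the WebPizza AI codebase.
interface Chunk {
  text: string;        // a passage extracted from the PDF
  embedding: number[]; // its embedding vector, assumed to be precomputed
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as "no similarity"
}

// Brute-force vector search: score every chunk, keep the k best.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Assemble the retrieved passages and the question into a prompt string
// that would then be handed to the local model.
function buildPrompt(question: string, context: Chunk[]): string {
  const joined = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${joined}\n\nQuestion: ${question}`;
}
```

Everything here runs on plain arrays in the browser, so nothing leaves the device. A linear scan like this is usually fine for a single PDF; a larger corpus would probably want an index instead.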

WebPizza AI - Private PDF Chat

POC: Private PDF AI using only your browser with WebGPU
