I built this POC to test whether a complete RAG pipeline can run entirely client-side using WebGPU.
Key difference: zero server dependency. PDF parsing, embeddings, vector search, and LLM inference all happen in your browser.
Select a model (Llama, Phi-3, Mistral), upload a PDF, ask questions. Documents stay local in IndexedDB. Works offline once models are cached.
It integrates the WeInfer optimizations, achieving roughly a 3.76x speedup over standard WebLLM through GPU buffer reuse and asynchronous pipeline processing.
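The vector-search step can be done fully in memory once chunk embeddings exist. Here's a minimal sketch of that stage; `cosineSim`, `topK`, and the `Chunk` shape are illustrative names, not the POC's actual API:

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding,
// to be stuffed into the LLM prompt as retrieved context.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map(c => ({ c, score: cosineSim(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(s => s.c);
}
```

A brute-force scan like this is fine at single-PDF scale; an approximate index only pays off at much larger corpus sizes.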
I built mcl to simplify my daily terminal workflow.
It lets you define custom shortcuts for your most-used commands, scoped locally or globally, via a simple JSON config.
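A config might look something like this (the field names are illustrative, not mcl's actual schema):

```json
{
  "shortcuts": {
    "gs": "git status",
    "serve": "python -m http.server 8080"
  }
}
```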
It’s still early, but I plan to add auto-completion, plugin support, and command chaining soon.