LocalMind - On-device AI assistant for Android. No cloud. No signup.

LocalMind runs an LLM (Qwen 2.5 / Gemma 2 / Phi-4 family) entirely on your phone via llama.cpp. No API calls, no telemetry, no signup. The 1.5B Q8 model fits in ~2GB RAM on a 2022-class phone. Conversations are kept in encrypted local storage, and one tap wipes them. Works in airplane mode. Built for people who want AI help without their conversations sitting in someone else's cloud.

Hi 👋 Maker here. LocalMind is the on-device LLM for Android I built because every AI app I tried wanted to send my prompts to someone else's cloud. On LocalMind the model runs entirely on the phone via llama.cpp: no API calls, no telemetry.

Stack: GGUF-quantized models (Qwen 2.5 / Gemma 2 / Phi-4), CPU inference. The 1.5B Q8 fits in ~2GB RAM during inference on a 2022-class phone. Conversations stay in encrypted local storage; one tap to wipe.

Things I'd love feedback on:
• Default model: 1.5B sweet spot, or should the smaller 0.5B be default?
• Any preset prompt template that would be killer built-in?
• Speed vs. quality tradeoff: is on-device latency acceptable for you?
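For a rough sense of where a figure like ~2GB can come from, here is a back-of-envelope sketch in Python. It is not LocalMind's code. Every number in it is an assumption for illustration: roughly 1.54 billion parameters, GGUF Q8_0 at 8.5 bits per weight (32 int8 weights plus one fp16 scale per block), an fp16 KV cache, and a model shape of 28 layers, 2 KV heads, head dimension 128, with a 4096-token context. The function names are hypothetical.

```python
# Back-of-envelope RAM estimate for a Q8-quantized model (illustrative only).
# All defaults below are assumptions, not values taken from LocalMind.

def weights_bytes(n_params: float, bits_per_weight: float = 8.5) -> float:
    """Size of the quantized weights. Q8_0 stores 32 int8 weights plus
    one fp16 scale per block, i.e. 34 bytes / 32 weights = 8.5 bits each."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: keys and values (hence the factor 2) for every
    layer, KV head, and context position, stored as fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def estimate_total_bytes(n_params: float = 1.54e9, n_layers: int = 28,
                         n_kv_heads: int = 2, head_dim: int = 128,
                         n_ctx: int = 4096) -> float:
    """Weights plus KV cache. Excludes compute buffers and app overhead."""
    return weights_bytes(n_params) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, n_ctx)


if __name__ == "__main__":
    total = estimate_total_bytes()
    print(f"weights + KV cache: {total / 1e9:.2f} GB")
```

Under these assumptions the weights dominate and the KV cache adds on the order of a hundred megabytes. Scratch buffers and the app itself come on top, so actual usage depends on context length and device.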

