Glia

Local-first AI memory bridge between browser chats and IDEs

143 followers

Productivity • Engineering & Development • LLM Memory

Glia is a 100% offline, open-source memory bridge. A Chrome extension auto-saves your web-based Claude/ChatGPT chats, while a native MCP server lets Cursor/Claude Code query those decisions locally from your shared SQLite database.

Free

Launch tags: Productivity • Developer Tools • Artificial Intelligence

Glia

Maker

Hey Product Hunt! 👋 I'm Eshaan, the creator of Glia.

Like most developers working with AI, I found my daily workflow becoming deeply fragmented. I'd solve a complex architectural challenge or debug a tricky error using Claude.ai, ChatGPT, or Gemini in my browser. But the second I switched back to my local editor (Cursor, Windsurf, or Claude Code), my IDE agent had no idea that conversation had happened. I got tired of constantly copy-pasting code blocks, raw logs, and decision summaries back and forth. Existing memory solutions are almost all cloud-based SaaS, which felt like a major privacy compromise for client codebases.

So I built Glia: a 100% local-first, zero-Docker memory bridge that connects your web chats directly to your local IDE.

The Architecture:

The Shared Database: Everything is stored offline on your local disk in a single SQLite database (~/.glia/graph.db). No telemetry, no cloud tracking, and your data never leaves your device.
The Web Sync: A lightweight Chrome extension securely intercepts prompts on 7 AI platforms (Claude, ChatGPT, Gemini, DeepSeek, Grok, Copilot, and Mistral) and automatically saves and indexes your technical decisions.
The IDE Integration: A native Model Context Protocol (MCP) server hooks into your IDE (Cursor, VS Code, Windsurf, Claude Code) so your coding agents can query or save memory.
Data Portability: Download any project session as a clean JSON file to sync manually to another machine or share with teammates offline.

Preemptive FAQs / Objection Handling:

Why not just use native codebase indexing in Cursor?
Cursor indexes your files, and Claude Projects stores context for individual web sessions, but they are completely siloed. Claude Projects doesn't know what you coded in Cursor, and Cursor has no idea what you just solved in Claude.ai. Glia is the missing bridge that syncs context between them.

Isn't a local LLM in the background a major resource drain?
No heavy 8B models run in the background. Glia uses a lightweight embedding model (nomic-embed-text, ~270MB via Ollama) and sqlite-vec for local vector search, so it's extremely light on CPU and RAM.

How do you handle database write locks during embeds?
We configured SQLite with Write-Ahead Logging (PRAGMA journal_mode=WAL;) and moved embedding generation into an async background job queue. This ensures browser writes never block IDE reads.

Quick Start: Setup takes less than a minute. You can spin up the local worker and register it with your IDE configs using:

npx glia-ai-setup

It's completely open-source (MIT). I'd love to hear your feedback, bug reports, or feature suggestions. If Glia makes your development life easier, dropping a star on GitHub would mean the world to us! ⭐

What IDE or web platform would you like to see supported next?
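
A minimal sketch of what the WAL answer above buys you, written in Python with the standard-library sqlite3 module. Glia's own backend is Node.js, so this is an illustration rather than Glia's code, and the `memories` table and file name are hypothetical. One connection holds an open read transaction (like an IDE query in flight) while a second connection commits a new row (like a browser ingest):

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(path, isolation_level=None, timeout=2.0)
    conn.execute("PRAGMA journal_mode=WAL;")  # persisted in the database file
    return conn


def wal_demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "graph.db")  # stand-in for ~/.glia/graph.db
        writer = open_db(path)  # e.g. the browser-extension ingest path
        reader = open_db(path)  # e.g. the MCP server answering an IDE query
        writer.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")
        writer.execute("INSERT INTO memories (body) VALUES ('chose WAL mode')")

        reader.execute("BEGIN")  # read transaction; its snapshot starts at the first SELECT
        before = reader.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
        # This commit succeeds while the read transaction is still open.
        writer.execute("INSERT INTO memories (body) VALUES ('new decision from the browser')")
        during = reader.execute("SELECT COUNT(*) FROM memories").fetchone()[0]  # same snapshot
        reader.execute("COMMIT")
        after = reader.execute("SELECT COUNT(*) FROM memories").fetchone()[0]

        mode = reader.execute("PRAGMA journal_mode;").fetchone()[0]
        writer.close()
        reader.close()
        return mode, before, during, after


print(wal_demo())  # ('wal', 1, 1, 2)
```

The reader keeps seeing its snapshot until its transaction ends. With the default rollback journal, the writer's commit would instead typically fail with "database is locked" once the busy timeout expires.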

2mo ago

@eshaannair Congrats on the launch, Eshaan. Very cool! How do you extract the real convo gems instead of just producing a context dump that might not be fully read?

2mo ago

Glia

Maker

@zolani_matebese Thanks, Zolani! To avoid dumping a massive wall of text, Glia uses two main strategies:

Surgical Sentence Trimming: We don't just inject full chunks of text. Our retrieval engine operates at both the paragraph and the sentence level. When you query, it finds the relevant paragraphs, then trims away the surrounding noise and injects only the sentences that directly match your prompt. This can reduce the context payload by up to 95%.
Knowledge Graph Extraction: In the background, a local LLM (like Llama 3) processes your saved conversations to extract structured facts (subject-relation-object triples). When you ask a question later, Glia fuses the precise vector matches with these structured graph facts, giving the AI dense, highly readable context instead of raw chat logs.
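
To make the sentence-level trimming concrete, here is a toy Python sketch. It substitutes a bag-of-words cosine similarity for the real embedding model, and the 0.2 threshold and function names are assumptions, not Glia's values:

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding model such as nomic-embed-text.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def trim_paragraph(paragraph: str, query: str, threshold: float = 0.2) -> list[str]:
    """Keep only the sentences of a retrieved paragraph that match the query."""
    q = bag_of_words(query)
    return [s for s in split_sentences(paragraph) if cosine(bag_of_words(s), q) >= threshold]


paragraph = ("We discussed lunch plans first. The API uses cursor pagination with a limit of 50. "
             "Later we argued about tabs versus spaces.")
kept = trim_paragraph(paragraph, "How does the API pagination work?")
print(kept)  # ['The API uses cursor pagination with a limit of 50.']
```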

2mo ago

Congrats on the launch! This is something I feel daily: I solve something in the Claude web app, switch to Claude Code in the terminal, and lose all that context. The MCP server + local SQLite combo is a great architectural bet for this.

Quick q for you: most of what happens in a Claude/ChatGPT chat is exploration, dead ends, half-formed ideas. How does Glia decide what to index as a meaningful technical decision vs noise? Is it post-hoc LLM extraction at save time, user-marked, or scored on whether the result actually got applied in code?

2mo ago

Glia

Maker

@ferdi_sigona Great question, and you've identified exactly the hardest unsolved problem in this space.

Right now Glia uses post-hoc LLM extraction at ingest time. When you save a chat, it runs an extraction pass that pulls out structured knowledge triples (subject → relation → object) and chunks the raw text for semantic search. The LLM is prompted to focus on decisions, facts, and technical conclusions rather than exploratory back-and-forth, but you're right that it has no signal on whether something was actually applied or just considered.
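
The triples produced by that extraction pass could be persisted and looked up along these lines. This is a Python sketch with a hypothetical `triples` table; the extraction step itself (the LLM call) is out of scope, so the triples are passed in directly:

```python
import sqlite3


def init_triples(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS triples (
               id INTEGER PRIMARY KEY,
               subject TEXT NOT NULL,
               relation TEXT NOT NULL,
               object TEXT NOT NULL,
               chunk_id INTEGER  -- raw-text chunk the fact was extracted from
           )"""
    )


def save_triples(conn: sqlite3.Connection, chunk_id: int, triples: list[tuple[str, str, str]]) -> None:
    # triples: (subject, relation, object) tuples produced by the extraction pass
    with conn:
        conn.executemany(
            "INSERT INTO triples (subject, relation, object, chunk_id) VALUES (?, ?, ?, ?)",
            [(s, r, o, chunk_id) for s, r, o in triples],
        )


def facts_about(conn: sqlite3.Connection, entity: str) -> list[str]:
    # Case-insensitive match on either end of the triple, in insertion order.
    rows = conn.execute(
        "SELECT subject, relation, object FROM triples "
        "WHERE subject = ? COLLATE NOCASE OR object = ? COLLATE NOCASE ORDER BY id",
        (entity, entity),
    ).fetchall()
    return [f"{s} -> {r} -> {o}" for s, r, o in rows]


conn = sqlite3.connect(":memory:")
init_triples(conn)
save_triples(conn, chunk_id=1, triples=[
    ("auth service", "uses", "JWT"),
    ("JWT", "expires_after", "15 minutes"),
    ("frontend", "built_with", "React"),
])
print(facts_about(conn, "jwt"))
# ['auth service -> uses -> JWT', 'JWT -> expires_after -> 15 minutes']
```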

The honest answer is that "was this used in code?" is a signal I don't have yet. That's a genuinely hard problem: closing the loop would require IDE-level telemetry. User-marked memory is on the roadmap as a lighter-weight version of that signal.

The current bet is that extraction quality plus RAG retrieval is good enough that noise gets naturally de-ranked by relevance at retrieval time, even if it gets indexed. But I'd love to hear how you'd approach the signal problem; this feels like exactly the right design question to get right in v2.

2mo ago

@Glia The Hybrid RAG setup caught my eye immediately. Fusing sentence vectors, chunk vectors, and FTS5 keyword search together feels way more solid than what most memory systems are doing right now. I've been working on a Corrective RAG setup myself, and one of the biggest headaches was retrieval completely falling apart once the query got rephrased or drifted semantically from the original context. HyDE honestly feels like a really smart workaround for that: generating a hypothetical answer first and then searching with that embedding instead of the raw query makes a lot of sense in practice.
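
The thread doesn't say how Glia combines its three result lists. One common technique for fusing ranked lists like these is reciprocal rank fusion (RRF); the Python sketch below uses the conventional constant k = 60 and hypothetical chunk IDs, as an illustration rather than Glia's actual scoring:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings: each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sentence_hits = ["c3", "c1", "c7"]  # sentence-vector search
chunk_hits = ["c1", "c3", "c9"]     # chunk-vector search
keyword_hits = ["c1", "c4"]         # FTS5 keyword search
fused = reciprocal_rank_fusion([sentence_hits, chunk_hits, keyword_hits])
print(fused)  # ['c1', 'c3', 'c4', 'c7', 'c9']
```

Because RRF looks only at ranks, it avoids having to calibrate cosine similarities against FTS5's bm25 scores.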

What I’m curious about though is whether the synthetic embedding step adds noticeable latency during recall. In my experience even tiny delays start compounding very quickly once everything is happening inside an agent loop, especially when multiple retrieval passes are involved.

The shared SQLite bridge between the browser extension and MCP server is also a really elegant design choice: one database, two interfaces, no extra sync layer headaches. But I'd genuinely love to know how you're handling write concurrency there. SQLite's single-writer lock can get annoying fast. If Cursor and the browser extension both try to write context at the same time, does Glia queue the writes internally, or can one request fail silently? It feels like the kind of issue that would be super subtle to debug once an agent session is already running and actively mutating state.

2mo ago

Glia

Maker

@akshaypal_bishnoi Really appreciate the detailed breakdown; you clearly know this space well. Two great questions:

On HyDE latency: yes, it adds a step. Glia generates a hypothetical answer via Ollama before embedding the query. In practice the latency hit is ~200-500ms on a mid-range local machine, which is acceptable for a single retrieval call but would absolutely compound in an agent loop with multiple passes. The tradeoff is worth it for the semantic drift problem you described; querying with the raw rephrased input was noticeably worse in my tests. That said, I'm considering making HyDE opt-in for latency-sensitive setups.
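
Sketched in Python, the HyDE step with the opt-in switch mentioned above looks roughly like this. `generate` and `embed` are hypothetical callables standing in for the Ollama calls, and the prompt wording and `use_hyde` flag are assumptions:

```python
import math
from typing import Callable, Mapping, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hyde_search(query: str, index: Mapping[str, Vector],
                generate: Callable[[str], str], embed: Callable[[str], Vector],
                use_hyde: bool = True, top_k: int = 3) -> list[str]:
    # HyDE: embed a hypothetical answer instead of the raw query. The extra
    # generate() call is where the added latency comes from.
    text = generate(f"Write a short passage that answers: {query}") if use_hyde else query
    q = embed(text)
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:top_k]


# Toy stubs so the example runs without a model.
def fake_generate(prompt: str) -> str:
    return "Enable WAL so the database allows concurrent reads."


def fake_embed(text: str) -> Vector:
    return [1.0, 0.0] if "database" in text else [0.0, 1.0]


index = {"wal-notes": [1.0, 0.0], "ui-notes": [0.0, 1.0]}
print(hyde_search("why is my app freezing?", index, fake_generate, fake_embed, top_k=1))
# ['wal-notes']
print(hyde_search("why is my app freezing?", index, fake_generate, fake_embed,
                  use_hyde=False, top_k=1))
# ['ui-notes']
```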

On SQLite write concurrency: writes from both the extension and the MCP server go through the same Node.js HTTP backend, so they're serialized through the async job queue before touching SQLite and never write directly in parallel. The bigger edge case is a PROCESSING job mutating state while a new ingest comes in, which I handle by resetting ghost jobs on startup. It's not bulletproof, but it's been stable in practice. WAL mode is enabled, so reads never block. Would love to hear how you're handling this in your Corrective RAG setup.
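
A single-worker job queue with the startup reset described above can be sketched in Python over SQLite. The `jobs` table, its status values, and the function names are assumptions modelled on the reply, not Glia's schema (its real queue lives in the Node.js backend):

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING'  -- PENDING | PROCESSING | DONE
)"""


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def reset_ghost_jobs(conn: sqlite3.Connection) -> int:
    # A crash mid-job leaves rows stuck in PROCESSING; requeue them at startup.
    with conn:
        cur = conn.execute("UPDATE jobs SET status = 'PENDING' WHERE status = 'PROCESSING'")
    return cur.rowcount


def run_worker_once(conn: sqlite3.Connection, handle) -> bool:
    # One worker claims one job at a time, so memory writes never run in parallel.
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        conn.execute("UPDATE jobs SET status = 'PROCESSING' WHERE id = ?", (row[0],))
    handle(row[1])  # e.g. chunk, embed and store the saved chat
    with conn:
        conn.execute("UPDATE jobs SET status = 'DONE' WHERE id = ?", (row[0],))
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
enqueue(conn, "chat-1")
enqueue(conn, "chat-2")
with conn:  # simulate a crash that left job 1 half-processed
    conn.execute("UPDATE jobs SET status = 'PROCESSING' WHERE id = 1")

print(reset_ghost_jobs(conn))  # 1
processed = []
while run_worker_once(conn, processed.append):
    pass
print(processed)  # ['chat-1', 'chat-2']
```

If `handle` raises, the job stays in PROCESSING, which is exactly the "ghost job" case the startup reset recovers from.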

2mo ago

Nice! Local-first plus the browser-to-IDE bridge covers exactly the gap most AI workflows leak into. Congrats on shipping.

When the browser chat and the actual codebase contradict each other (the chat assumes library X, but the codebase migrated off it last week), which wins? File-system recency is the safe answer, but it throws away the conversational nuance the chat captured. Tie-breaker, or human-prompted? Best of luck with your launch!

2mo ago

Glia

Maker

@hiyamojo Thanks Keith, that's a brilliant question.

Right now, Glia acts as an AI memory layer rather than a direct codebase indexer (we leave direct file-system indexing to tools like Cursor or GitHub Copilot). Glia specifically captures conversational memory: the architectural debates, the 'why' behind a decision, and the messy terminal errors.

If the codebase migrates off library X, Glia tracks that migration historically, based on when it was discussed in the chat. When injecting context, Glia provides timestamped RAG chunks to the LLM via MCP. So instead of a hard 'file-system vs. chat' tie-breaker, the LLM is fed the chronological progression of your ideas and can usually deduce the latest state from the recency of the conversational memory. It's definitely a tricky balance, but treating memory chronologically has been the safest bet so far.
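
The exact format Glia sends over MCP isn't shown in the thread, but the idea of handing the LLM timestamped chunks in chronological order can be sketched in Python like this (the `Chunk` fields, line format, and example data are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Chunk:
    text: str
    saved_at: datetime  # when the chat was saved
    source: str         # e.g. the web platform it came from


def format_chronologically(chunks: list[Chunk]) -> str:
    """Render retrieved chunks oldest-first so the latest decision is read last."""
    ordered = sorted(chunks, key=lambda c: c.saved_at)
    return "\n".join(f"[{c.saved_at:%Y-%m-%d %H:%M}] ({c.source}) {c.text}" for c in ordered)


# Retrieval returns chunks in relevance order; the formatter restores time order.
retrieved = [
    Chunk("Migrated from library X to library Y.", datetime(2024, 3, 8, 9, 30), "claude.ai"),
    Chunk("Use library X for parsing.", datetime(2024, 3, 1, 10, 0), "chatgpt.com"),
]
context = format_chronologically(retrieved)
print(context)
# [2024-03-01 10:00] (chatgpt.com) Use library X for parsing.
# [2024-03-08 09:30] (claude.ai) Migrated from library X to library Y.
```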

2mo ago

@eshaannair and a brilliant answer! thank you

2mo ago

Really interesting approach to local memory. The SQLite + MCP combo is clean. Curious how you handle context relevance: when Cursor queries past decisions, how does Glia decide which memories are actually useful versus noise from older conversations?

2mo ago

Glia

Maker

@harshalvc_ai Good question! Relevance filtering happens at two layers:

Retrieval scoring: chunks are ranked by cosine similarity against the HyDE-augmented query embedding, with a keyword boost applied if the query entities match chunk content. Lower-scoring chunks get dropped naturally.
Character budget: the top-ranked chunks fill a fixed character budget (6000 chars per session), so noisy or older low-relevance context gets crowded out before it ever reaches the LLM.
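
The two layers above can be sketched together in Python. The 6000-character budget comes from the reply; the 0.1 keyword boost, the substring entity match, and the choice to stop at the first chunk that doesn't fit are assumptions:

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_context(query_vec, query_entities, chunks, budget=6000, boost=0.1):
    """Rank (text, vector) chunks, then fill a fixed character budget in rank order."""
    def score(chunk):
        text, vec = chunk
        s = cosine(query_vec, vec)
        if any(e.lower() in text.lower() for e in query_entities):
            s += boost  # keyword boost when a query entity appears in the chunk
        return s

    picked, used = [], 0
    for text, _ in sorted(chunks, key=score, reverse=True):
        if used + len(text) > budget:
            break  # everything ranked below this point is crowded out
        picked.append(text)
        used += len(text)
    return picked


chunks = [
    ("Postgres tuning notes", [0.95, 0.05]),
    ("We chose SQLite with WAL", [0.9, 0.1]),
    ("Lunch plans", [0.0, 1.0]),
]
print(select_context([1.0, 0.0], ["sqlite"], chunks, budget=50))
# ['We chose SQLite with WAL', 'Postgres tuning notes']
```

Here the keyword boost lifts the SQLite chunk above a slightly more similar one, and the low-relevance chunk is cut by the budget.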

There's no explicit time-decay penalty on older memories yet; retrieval is purely relevance-driven. That's something I want to add in v2, since a decision made 6 months ago probably deserves less weight than one made last week, even if it's semantically similar.
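
One possible shape for that v2 time-decay idea (not something Glia does today) is an exponential half-life applied to the relevance score. The 90-day half-life is an arbitrary assumption for illustration:

```python
def decayed_score(similarity: float, age_days: float, half_life_days: float = 90.0) -> float:
    """Hypothetical time decay: the weight halves every `half_life_days`."""
    return similarity * 0.5 ** (age_days / half_life_days)


old = decayed_score(0.90, age_days=180)   # two half-lives: 0.90 * 0.25
recent = decayed_score(0.80, age_days=7)  # about 0.758
print(recent > old)  # True
```

With this weighting, a week-old decision at similarity 0.80 outranks a six-month-old one at 0.90.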

2mo ago

The browser-to-IDE bridge angle solves a real pain — design handoff context that lives in a browser (Figma comments, Linear tickets, Claude chat) never makes it cleanly into the IDE context window. Does Glia do any filtering on what browser content it passes to the IDE, or does it forward everything from the active tab regardless of content type? Wondering how it handles tabs with mixed content like docs pages that combine code snippets, images, and long prose.

2mo ago

Glia

Maker

@sunnyallan Great question! The filtering happens at retrieval, not at ingest. The extension scrapes the conversation DOM (not the full page), so it's already scoped to just the chat thread. Everything in that thread gets chunked, embedded, and stored. The magic is on the recall side: chunks are split into individual sentences at index time, and only the sentences that actually match your query come back, not the surrounding noise. So a mixed docs page with code snippets, prose, and images gets stored in full, but a query about a specific API will surface just the relevant sentences. Benchmark compression is ~95%.

Images are the one honest gap: alt text and surrounding context are captured, but not the visual content itself. That's on the roadmap.
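
Index-time sentence splitting with a pointer back to the parent chunk could look like this in Python with SQLite. The schema, the regex sentence splitter, and the substring match standing in for per-sentence vector search are all assumptions; the compression ratio is simply 1 minus returned characters over stored characters:

```python
import re
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE sentences (
               id INTEGER PRIMARY KEY,
               chunk_id INTEGER NOT NULL REFERENCES chunks(id),
               position INTEGER NOT NULL,
               text TEXT NOT NULL
           )"""
    )


def index_chunk(conn: sqlite3.Connection, text: str) -> int:
    # Store the full chunk, then each sentence with a pointer back to its parent.
    with conn:
        chunk_id = conn.execute("INSERT INTO chunks (text) VALUES (?)", (text,)).lastrowid
        parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        conn.executemany(
            "INSERT INTO sentences (chunk_id, position, text) VALUES (?, ?, ?)",
            [(chunk_id, i, s) for i, s in enumerate(parts)],
        )
    return chunk_id


def recall(conn: sqlite3.Connection, term: str) -> list[str]:
    # Substring match as a stand-in for per-sentence vector similarity.
    rows = conn.execute(
        "SELECT text FROM sentences WHERE instr(lower(text), lower(?)) > 0 "
        "ORDER BY chunk_id, position",
        (term,),
    ).fetchall()
    return [r[0] for r in rows]


def compression_ratio(stored: list[str], returned: list[str]) -> float:
    return 1 - sum(map(len, returned)) / sum(map(len, stored))


conn = sqlite3.connect(":memory:")
init(conn)
page = ("Here is the docs page intro. The fetchUser API returns a Promise. "
        "Images show the dashboard layout.")
index_chunk(conn, page)
hits = recall(conn, "fetchUser")
print(hits)  # ['The fetchUser API returns a Promise.']
print(compression_ratio([page], hits))
```

On this toy page the returned sentence is roughly a third of the stored text; the ~95% figure in the reply comes from Glia's own benchmark, not from this sketch.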

2mo ago

This solves a problem I’ve started noticing myself while switching between ChatGPT, Claude and coding tools. A lot of useful context gets lost across chats, tabs and IDEs, so having a local memory bridge feels genuinely useful instead of just “AI for AI”.

Also like that it’s offline/open-source. Feels much safer and more practical for developer workflows where context and previous decisions actually matter.

2mo ago

Glia

Maker

@vikasnaik Hey Vikas, thanks so much for the feedback. "AI for AI" is exactly what I was trying to avoid with Glia. The goal wasn't to build another complex agentic framework, but a simple, practical utility that solves the exact problem you described: context loss between tabs and editors.

Glad the offline, local-first approach resonates with you as well.

2mo ago

Reviews