Launching today

Glia
Local-first AI memory bridge between browser chats and IDEs
111 followers
Glia is a 100% offline, open-source memory bridge. A Chrome extension auto-saves your web-based Claude/ChatGPT chats, while a native MCP server lets Cursor/Claude Code query those decisions locally from your shared SQLite database.



Glia
@eshaannair Congrats on the launch, Eshaan. Very cool! How do you extract the real convo gems instead of just having a context dump that might not be fully read?
Glia
@zolani_matebese Thanks Zolani! To avoid dumping a massive wall of text, GLIA uses two main strategies:
Surgical Sentence Trimming: We don't just inject full chunks of text. Our retrieval engine operates at both the paragraph and the sentence level. When you query, it finds the relevant paragraphs but then surgically trims away the surrounding noise, only injecting the exact sentences that directly match your prompt. This typically reduces the context payload by up to 95%.
Knowledge Graph Extraction: In the background, a local LLM (like Llama 3) processes your saved conversations to extract structured facts (subject-relation-object triples). When you ask a question later, it fuses the precise vector matches with these structured graph facts, giving the AI dense, highly readable context instead of raw chat logs.
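To make the sentence-trimming idea concrete, here is a minimal sketch. It is not Glia's retrieval engine: token overlap stands in for sentence-embedding similarity, and the names `trim_paragraph` and `min_overlap` are hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def trim_paragraph(paragraph: str, query: str, min_overlap: float = 0.3) -> list[str]:
    """Keep only the sentences that share enough tokens with the query.

    Assumption: token overlap is a stand-in for the sentence-level
    vector similarity a real retrieval engine would use.
    """
    q = tokens(query)
    kept = []
    for sentence in split_sentences(paragraph):
        overlap = len(q & tokens(sentence)) / max(len(q), 1)
        if overlap >= min_overlap:
            kept.append(sentence)
    return kept

paragraph = (
    "We talked about deployment for a while. "
    "We decided to use SQLite in WAL mode for the memory store. "
    "Then the chat drifted to lunch plans."
)
trimmed = trim_paragraph(paragraph, "which mode does the SQLite store use")
print(trimmed)
```

Only the middle sentence survives here, so the injected context is one sentence instead of the whole paragraph.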
Congrats on the launch! This is something I do feel daily: solve something in the Claude web app, switch to Claude Code in terminal, lose all that context. The MCP server + local SQLite combo is a great architectural bet for this.
Quick q for you: most of what happens in a Claude/ChatGPT chat is exploration, dead ends, half-formed ideas. How does Glia decide what to index as a meaningful technical decision vs noise? Is it post-hoc LLM extraction at save time, user-marked, or scored on whether the result actually got applied in code?
Glia
@ferdi_sigona Great question, and you've identified exactly the hardest unsolved problem in this space.
Right now Glia uses post-hoc LLM extraction at ingest time. When you save a chat, it runs an extraction pass that pulls out structured knowledge triples (subject → relation → object) and chunks the raw text for semantic search. The LLM is prompted to focus on decisions, facts, and technical conclusions rather than exploratory back-and-forth, but you're right that it has no signal on whether something was actually applied or just considered.
The honest answer is that "was this used in code?" is a signal I don't have yet. That's a genuinely hard problem; it would require IDE-level telemetry to close the loop. User-marked memory is on the roadmap as a lighter-weight version of that signal.
The current bet is that the extraction quality + RAG retrieval is good enough that noise gets naturally de-ranked by relevance at retrieval time, even if it gets indexed. But I'd love to hear how you'd approach the signal problem; this feels like exactly the right design question to get right in v2.
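A rough sketch of what an ingest pass of this shape could look like. The table names, the `chunk_text` helper, and the canned `extract_triples` output are all assumptions for illustration; in a real setup the extraction step would be a call to a local LLM.

```python
import sqlite3

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedy chunker: pack whole paragraphs until max_chars would be exceeded."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def extract_triples(chat: str) -> list[tuple[str, str, str]]:
    # Placeholder for the LLM extraction pass; returns canned triples here.
    return [("memory store", "uses", "SQLite"), ("SQLite", "runs in", "WAL mode")]

def ingest(conn: sqlite3.Connection, chat: str) -> None:
    # Hypothetical schema: one table for raw chunks, one for extracted facts.
    conn.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS triples (subject TEXT, relation TEXT, object TEXT)")
    with conn:  # commit chunks and triples together
        conn.executemany("INSERT INTO chunks (body) VALUES (?)",
                         [(c,) for c in chunk_text(chat)])
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", extract_triples(chat))

conn = sqlite3.connect(":memory:")
ingest(conn, "We compared Postgres and SQLite.\n\nWe picked SQLite in WAL mode.")
facts = conn.execute(
    "SELECT object FROM triples WHERE subject = ?", ("memory store",)
).fetchall()
chunk_count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
print(facts, chunk_count)
```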
@Glia The Hybrid RAG setup caught my eye immediately.. fusing sentence vectors, chunk vectors, and FTS5 keyword search together feels way more solid than what most memory systems are doing right now. I’ve been working on a Corrective RAG setup myself, and one of the biggest headaches was retrieval completely falling apart once the query got rephrased or drifted semantically from the original context. HyDE honestly feels like a really smart workaround for that.. generating a hypothetical answer first and then searching using that embedding instead of the raw query makes a lot of sense in practice.
What I’m curious about though is whether the synthetic embedding step adds noticeable latency during recall. In my experience even tiny delays start compounding very quickly once everything is happening inside an agent loop, especially when multiple retrieval passes are involved.
The shared SQLite bridge between the browser extension and MCP server is also honestly a really elegant design choice.. one database, two interfaces, no extra sync layer headaches. But I’d genuinely love to know how you’re handling write concurrency there. SQLite’s single-writer lock can get annoying fast, and if Cursor plus the browser extension both try writing context at the same time, does GLIA queue the writes internally or can one request fail silently? Feels like the kind of issue that would be super subtle to debug once an agent session is already running and actively mutating state.
Glia
@akshaypal_bishnoi Really appreciate the detailed breakdown; you clearly know this space well. Two great questions:
On HyDE latency: yes, it adds a step. Glia generates a hypothetical answer via Ollama before embedding the query. In practice the latency hit is ~200-500ms on a mid-range local machine, which is acceptable for a single retrieval call but would absolutely compound in an agent loop with multiple passes. The tradeoff is worth it for the semantic drift problem you described: querying with the raw rephrased input was noticeably worse in my tests. That said, I'm considering making HyDE opt-in for latency-sensitive setups.
On SQLite write concurrency: writes from both the extension and MCP server go through the same Node.js HTTP backend, so they're already serialized through the async job queue before touching SQLite; they never write directly in parallel. The bigger edge case is a PROCESSING job mutating state while a new ingest comes in, which I handle by resetting ghost jobs on startup. It's not bulletproof, but it's been stable in practice. WAL mode is enabled, so reads are never blocked by a write. Would love to hear how you're handling this in your Corrective RAG setup.
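For readers new to HyDE, a toy sketch of the flow with an opt-in flag. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, `toy_generate` returns a canned answer where a real setup would call a local LLM, and `use_hyde` is a hypothetical flag name.

```python
import math
import re
from typing import Callable

Vector = list[float]
VOCAB = ["sqlite", "wal", "mode", "database", "concurrency", "reads"]

def toy_embed(text: str) -> Vector:
    # Stand-in embedding: counts of a few vocabulary words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]

def toy_generate(query: str) -> str:
    # Stand-in for the LLM call that drafts a hypothetical answer.
    return "The database uses SQLite in WAL mode so reads do not block"

def cosine(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def query_vector(query: str, embed: Callable[[str], Vector],
                 generate: Callable[[str], str], use_hyde: bool = True) -> Vector:
    """Embed a hypothetical answer instead of the raw query when HyDE is on."""
    return embed(generate(query) if use_hyde else query)

stored = toy_embed("SQLite runs in WAL mode so reads never block")
query = "how do we avoid database lock issues"
raw_score = cosine(query_vector(query, toy_embed, toy_generate, use_hyde=False), stored)
hyde_score = cosine(query_vector(query, toy_embed, toy_generate, use_hyde=True), stored)
print(raw_score, round(hyde_score, 3))
```

The raw query shares no vocabulary with the stored chunk, while the hypothetical answer does, which is the drift problem HyDE targets. The extra `generate` call is where the added latency comes from.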
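The pattern described above (one writer draining a queue, WAL mode, ghost jobs reset on startup) can be sketched in a few lines. This is a Python stand-in, not the Node.js backend itself; the `jobs` table, its columns, and the status values other than PROCESSING are assumptions.

```python
import os
import queue
import sqlite3
import tempfile
import threading

def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers are not blocked by the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)"
    )
    # Startup recovery: jobs left in PROCESSING by a crash go back to PENDING.
    conn.execute("UPDATE jobs SET status = 'PENDING' WHERE status = 'PROCESSING'")
    conn.commit()
    return conn

def writer_loop(path: str, jobs: "queue.Queue") -> None:
    """Single writer: every write from every client passes through this loop."""
    conn = open_db(path)
    while True:
        payload = jobs.get()
        if payload is None:  # sentinel: shut down
            break
        with conn:
            conn.execute("INSERT INTO jobs (payload, status) VALUES (?, 'DONE')", (payload,))
    conn.close()

# Simulate a previous run that crashed mid-job.
path = os.path.join(tempfile.mkdtemp(), "memory.db")
seed = sqlite3.connect(path)
seed.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
seed.execute("INSERT INTO jobs (payload, status) VALUES ('crashed ingest', 'PROCESSING')")
seed.commit()
seed.close()

jobs: "queue.Queue" = queue.Queue()
writer = threading.Thread(target=writer_loop, args=(path, jobs))
writer.start()
for source in ("extension", "mcp"):  # two clients, one serialized write path
    jobs.put(f"chat from {source}")
jobs.put(None)
writer.join()

reader = sqlite3.connect(path)
statuses = [row[0] for row in reader.execute("SELECT status FROM jobs ORDER BY id")]
reader.close()
print(statuses)
```

Because only the writer thread ever touches the database for writes, two clients submitting at once queue up instead of contending for SQLite's single-writer lock.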
nice! local-first plus the browser-to-IDE bridge is the gap most ai workflows leak into. congrats on shipping.
when the browser chat and the actual codebase contradict (chat assumes library x, codebase migrated off it last week), which wins? file-system recency is the safe answer, but it throws away conversational nuance the chat captured. tie-breaker, or human prompted? best of luck with your launch!
Glia
@hiyamojo Thanks Keith, that's a brilliant question.
Right now, GLIA acts as an AI Memory Layer rather than a direct codebase indexer (we leave the direct file-system indexing to tools like Cursor or GitHub Copilot). GLIA specifically captures the conversational memory: the architectural debates, the 'why' behind a decision, and the messy terminal errors.
If the codebase migrates off library X, GLIA tracks that migration historically based on when it was discussed in the chat. When injecting context, GLIA provides timestamped RAG chunks to the LLM via MCP. So instead of a hard 'file-system vs chat' tie-breaker, the LLM is fed the chronological progression of your ideas and is usually able to deduce the latest state based on the recency of the conversational memory. It’s definitely a tricky balance, but treating memory chronologically has been the safest bet so far.
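As an illustration of the chronological approach, a small sketch that sorts memory chunks by when they were discussed and prefixes each with its date. The `MemoryChunk` type, the field names, and the example dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryChunk:
    text: str
    discussed_at: datetime  # when the chat that produced this chunk took place

def render_context(chunks: list[MemoryChunk]) -> str:
    """Order chunks oldest-first and prefix each with its date."""
    ordered = sorted(chunks, key=lambda c: c.discussed_at)
    return "\n".join(f"[{c.discussed_at:%Y-%m-%d}] {c.text}" for c in ordered)

chunks = [
    MemoryChunk("Migrated off library X to library Y.", datetime(2024, 5, 20)),
    MemoryChunk("Decided to build the parser on library X.", datetime(2024, 3, 2)),
]
context = render_context(chunks)
print(context)
```

With the dates inline, the model reading the context can see that the migration came after the original decision, without a hard-coded tie-breaker.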
@eshaannair and a brilliant answer! thank you
Really interesting approach to local memory. The SQLite + MCP combo is clean. Curious how you handle context relevance: when Cursor queries past decisions, how does it decide which memories are actually useful vs noise from older conversations?
Glia
@harshalvc_ai Good question! Relevance filtering happens at two layers:
Retrieval scoring - chunks are ranked by cosine similarity against the HyDE-augmented query embedding, with keyword boosting applied if the query entities match chunk content. Lower-scoring chunks get dropped naturally.
Character budget - the top-ranked chunks fill a fixed character budget (6000 chars per session), so noisy or older low-relevance context gets crowded out before it ever reaches the LLM.
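The two layers can be sketched together as follows. The 6000-character default comes from the reply above; the boost size, the stop-at-first-overflow rule, and the function name `rank_and_pack` are assumptions, and the query vector is taken as already computed.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def rank_and_pack(query_vec: list[float], query_terms: list[str],
                  chunks: list[tuple[str, list[float]]],
                  budget: int = 6000, boost: float = 0.1) -> list[str]:
    """Rank chunks by cosine similarity plus a keyword boost, then fill the budget."""
    scored = []
    for text, vec in chunks:
        score = cosine(query_vec, vec)
        if any(term.lower() in text.lower() for term in query_terms):
            score += boost  # assumed flat bonus when a query entity appears in the chunk
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    packed, used = [], 0
    for _, text in scored:
        if used + len(text) > budget:
            break  # lower-ranked chunks are crowded out
        packed.append(text)
        used += len(text)
    return packed

chunks = [
    ("Lunch plans for Friday.", [0.0, 1.0]),
    ("SQLite runs in WAL mode.", [1.0, 0.0]),
    ("We chose Ollama for local HyDE generation.", [0.6, 0.8]),
]
# A small budget of 70 characters makes the crowding-out visible.
packed = rank_and_pack([1.0, 0.0], ["SQLite"], chunks, budget=70)
print(packed)
```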
There's no explicit time-decay penalty on older memories yet; it's purely relevance-driven. That's something I want to add in v2, since a decision made 6 months ago probably deserves less weight than one made last week even if it's semantically similar.
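One possible shape for such a penalty, purely as an illustration and not something Glia does today, is an exponential half-life applied to the relevance score. The function name and the 90-day half-life are hypothetical.

```python
def decayed_score(similarity: float, age_days: float, half_life_days: float = 90.0) -> float:
    """Halve the relevance score for every half_life_days of age."""
    return similarity * 0.5 ** (age_days / half_life_days)

old = decayed_score(0.9, age_days=180)   # strong match from about six months ago
recent = decayed_score(0.8, age_days=7)  # slightly weaker match from last week
print(round(old, 3), round(recent, 3))
```

With these numbers the week-old memory outranks the six-month-old one even though its raw similarity is lower.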
Great work! The 7-platform Chrome extension surface is probably the trickiest bet here? Claude, ChatGPT, Gemini all rework their DOM constantly and each one breaks differently.
Glia
@artstavenka1 On point! Maintaining the selectors and constantly extracting from the DOM is the trickiest part, so I made a selector staleness checker that runs every week and creates an issue if the DOM changes on any of the supported platforms.
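A staleness check of this kind could, in its simplest form, look like the sketch below: parse a saved page snapshot and report which expected class selectors no longer match anything. The class names and the `stale_selectors` helper are made up for illustration, and the sketch only handles plain `.class` selectors.

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

def stale_selectors(html: str, selectors: list[str]) -> list[str]:
    """Return the '.class' selectors that match nothing in the snapshot."""
    collector = ClassCollector()
    collector.feed(html)
    return [s for s in selectors if s.lstrip(".") not in collector.classes]

# Hypothetical snapshot and selectors; real platforms use different markup.
snapshot = '<div class="chat-turn"><p class="msg-body">hi</p></div>'
stale = stale_selectors(snapshot, [".chat-turn", ".message-content"])
print(stale)
```

A scheduled job could run this per platform and open an issue whenever the returned list is non-empty.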
This is awesome! I built something similar but took the opposite direction: no local setup, pure clipboard injection within certain LLMs.
Glia
@riveradev Thanks Nico. Clipboard injection is honestly such an elegant and lightweight way to solve this without forcing users to run background services or local databases.
We ended up going the heavy local-first route (SQLite vector db + Knowledge Graph) because we wanted cross-platform memory tracking (so a conversation in ChatGPT could automatically provide context to Claude tomorrow) and persistent semantic search. But both approaches definitely tackle different sides of the same friction point! Would love to check out your tool if you have a link.
@eshaannair Of course! Love the tech around your tool.
Appreciate that Eshaan! You can check it out at pryme-site.vercel.app — would love your thoughts on the approach.
Glia
@riveradev Solving the 'blank slate' problem in AI chats is a huge need, and the UI looks slick.
A few quick product suggestions you might consider:
Auto-Injection: Instead of relying on clipboard copy/paste, having the extension automatically drop the text into the chat's input box would make the UX feel like magic.
Preventing Stale Context: Since projects evolve, static cards can get outdated quickly. A 'right-click to append' shortcut from within the chat could help users easily keep their cards fresh.
Clarify Privacy: The site says 'data never leaves your device,' but the Pro tier has device sync. Clarifying if that sync is End-to-End Encrypted will build a lot of trust for devs putting sensitive code in their context.
Even without these features the product still works beautifully.
Love the clean approach you're taking with this. Best of luck with the growth!
@eshaannair This is really helpful, Eshaan, thank you! On auto-injection — I actually tried DOM injection first, but switched to clipboard because Claude and ChatGPT update their UI constantly, and it kept breaking. Clipboard is ugly, but bulletproof. The right-click append idea is going straight into my v2; it really makes sense with that! And you're 100% right on the privacy copy — that's a contradiction I need to fix immediately. Really appreciate you taking the time.
Wish you the best in your future endeavors!