smoosh - Make anyone on your team an expert in your tech. Open Source

Turn git repos into RAG-optimised context for AI. Pure bash, zero dependencies. We've been pairing smoosh with NotebookLM: no hallucinations. Technical and non-technical team members alike can get the info they need without holding anyone else up. We've even moved our non-technical work into repos so the same flow works in reverse: everyone has on-demand access to any info they need, with accurate answers and source citations. Private by default. Agent native. Interactive CLI. Get smooshin'.

Why smoosh?

Getting codebase context into AI takes time. Other tools do this but require bloated node_modules or Python environments just to concatenate text. We built smoosh internally for a zero-dependency, native approach. It turns a 20-minute chore into one fast, reliable command.

- Understand codebases: Upload to tools like NotebookLM to talk through your architecture without reading source.
- Give AI context: Drop into Claude/ChatGPT for an assistant that knows your code, eliminating hallucinated APIs.
- Onboard instantly: Give new hires a searchable snapshot to learn from, regardless of their technical background.
- Ground your agents: Output is RAG-optimised, chunked within limits, and retains metadata.
- Private by default: Runs locally. No API keys, zero telemetry.

How it Works

smoosh isn't just a wrapper around cat. It is a strict, structured pipeline:

1. Discovery: Uses git ls-files to perfectly respect your .gitignore.
2. Filtering: Applies extension rules (--docs, --code) or MIME-type checks (--all) to drop binaries and noise.
3. Chunking: Streams content through a fast word-count heuristic, splitting files sequentially without breaking mid-file.
4. Verification: The final output is strictly cross-referenced against the expected file list. Any mismatch yields an immediate exit 4.

Features

- File presets: --docs (md, txt, etc.), --code (docs + code), --all (excludes binaries via MIME checks).
- Smart chunking: Stays within token limits (project_part1.md).
- 100% verification: Exits 4 if output mismatches the git index.
- Interactive mode: Guided setup on first run.
- Remote repos: smoosh https://github.com/user/repo (clones & processes instantly).
- Secrets detection: Warns on AWS keys, PATs, and PEM blocks.
- Output formats: Markdown, text, or CDATA XML.
- Table of contents: --toc generates a per-chunk file index.
- Line numbers: --line-numbers for code reviews.
- Agent-native: Designed for CI/Agents (--json, --no-interactive, deterministic exits).
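To make the chunking and verification steps of the pipeline concrete, here is a minimal sketch of the idea in shell. It is not smoosh's actual source: the function names chunk_files and verify_chunks, the manifest file, and the "## path" header are all hypothetical choices made for illustration. It assumes a word budget per chunk, keeps every file whole, and reuses exit status 4 for a mismatch as described above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and file layout are assumptions, not smoosh's code.

# chunk_files PREFIX MAX_WORDS
# Reads file paths on stdin (one per line) and appends each whole file to
# PREFIX_partN.md. A new part starts when adding the next file would push a
# non-empty part over MAX_WORDS, so no file is ever split across parts.
# Prints the index of the last part.
chunk_files() {
  local prefix="$1" max_words="$2"
  local part=1 used=0 words path
  rm -f "${prefix}"_part*.md          # start clean so reruns do not append
  : > "${prefix}_manifest.txt"        # hypothetical record of files written
  while IFS= read -r path; do
    [ -f "$path" ] || continue        # skipped files will fail verification
    words=$(wc -w < "$path")
    if [ "$used" -gt 0 ] && [ $((used + words)) -gt "$max_words" ]; then
      part=$((part + 1))
      used=0
    fi
    {
      printf '## %s\n\n' "$path"
      cat -- "$path"
      printf '\n'
    } >> "${prefix}_part${part}.md"
    printf '%s\n' "$path" >> "${prefix}_manifest.txt"
    used=$((used + words))
  done
  echo "$part"
}

# verify_chunks PREFIX EXPECTED_LIST
# Compares the files actually written against the expected list.
# Returns 4 on any mismatch, mirroring the exit code named in the post.
verify_chunks() {
  local expected actual
  expected=$(sort "$2")
  actual=$(sort "${1}_manifest.txt")
  [ "$expected" = "$actual" ] || return 4
}
```

Under these assumptions, a run inside a repo could look like: git ls-files > expected.txt, then chunk_files project 50000 < expected.txt, then verify_chunks project expected.txt. Feeding the list from git ls-files is what keeps ignored files out; a single file larger than the budget simply gets a part to itself.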

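For the agent and CI use case, a wrapper can branch on the exit status. The sketch below is hypothetical: run_smoosh_ci is an illustrative name, and it assumes that --code, --json and --no-interactive can be combined in one call. Only exit status 4 (verification mismatch) is taken from the description above; treating 0 as success is the usual shell convention, and every other status is reported generically.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper; not part of smoosh itself.
# Assumes the flags named in the post can be combined in a single call.
run_smoosh_ci() {
  local status=0
  # Extra arguments (for example a repo URL) are passed straight through.
  smoosh --code --json --no-interactive "$@" || status=$?
  case "$status" in
    0) echo "smoosh: ok" >&2 ;;
    4) echo "smoosh: output does not match the expected file list" >&2 ;;
    *) echo "smoosh: failed with exit code $status" >&2 ;;
  esac
  return "$status"
}
```

In a pipeline step this could be used as run_smoosh_ci || exit $?, so that a verification mismatch fails the job with the same status smoosh returned. Diagnostics go to stderr so that whatever smoosh prints on stdout stays clean for the calling agent.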