
LocalForge
The last line of defence before your code hits git history
16 followers
You're working fast or vibe-coding, the LLM is shipping faster than you can review, and three days later you're rotating AWS keys at 2am. LocalForge intercepts every git commit before it finalises, using three layers: Rust regex blocks secrets in <1ms, CoreML on the Neural Engine catches unsafe patterns statistically, and a local Qwen LLM reviews your diff like a human. It all runs fully offline on Apple Silicon. Nothing leaves your Mac.






Hey PH!
I built this because I kept seeing the same thing in AI-assisted codebases. It's not malice, it's just speed. When you're vibe coding or racing a deadline and the LLM is generating 200 lines at a time, secrets and unsafe patterns slip through because the review loop can't keep up with the generation loop.
The thing I wanted most was something that ran before git, not after. By the time a secret is in your history, the damage is done even if you rotate immediately since the commit hash is permanent and the exposure window already existed.
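To make the "before git, not after" idea concrete, here is a minimal sketch of a Layer 1 style check: it looks only at lines added in a unified diff and matches them against a couple of secret patterns. The pattern set, the function names (`added_lines`, `find_secrets`) and the sample diff are assumptions for illustration, not LocalForge's actual rules, and it is written in Python for brevity rather than Rust.

```python
import re

# Illustrative patterns only; the real LocalForge rule set is not shown in this thread.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def added_lines(diff_text):
    """Yield the text of added lines in a unified diff.

    Lines starting with '+++' are treated as file headers and skipped,
    which is a simplification.
    """
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_secrets(diff_text):
    """Return (pattern_name, line) pairs for every added line that matches a pattern."""
    hits = []
    for line in added_lines(diff_text):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.strip()))
    return hits


# Sample diff using the placeholder key from AWS documentation.
SAMPLE_DIFF = """\
diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1,2 +1,3 @@
 REGION = "eu-west-1"
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
-DEBUG = True
"""

hits = find_secrets(SAMPLE_DIFF)
```

In a real pre-commit hook, the diff text would typically come from `git diff --cached`, and the hook would exit with a non-zero status when `hits` is non-empty so that git aborts the commit.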
Three things I'd love feedback on:
1. The Layer 2 training set is 297 samples across 11 languages, so it's still small. If anyone has labeled risky/clean code snippets they'd share, I'd love to grow it.
2. The VS Code extension is next, and it'll reuse the same pipeline so you get inline squiggles without running a commit.
3. Apple Silicon only right now. A Linux port would need different runtimes for both CoreML and MLX, so I'm interested in whether there's appetite for it.
Repo is MIT and fully open. Would love issues, PRs, or just to hear what secret patterns you've seen slip through that I'm not covering yet.
Pre-commit is the right place for this. Catching secrets after they hit git history always feels too late. I'd be curious to see how often the local LLM flags useful issues vs noisy false positives in real projects.
@anton_tomilov1
Hi Anton, thanks for the support!
Here are the results of the preliminary evaluation. We built a corpus of 145 diffs and initially used Qwen 2.5 Coder 1.5B. At first we had a lot of false positives on languages such as Java, JavaScript, and Swift, with a Precision of 0.649 and a False Positive Rate (FPR) of 0.727, which, I admit, is not even usable.
For the update, we switched the model to Qwen 2.5 Coder 7B and added a post-inference false-positive filter plus a pre-inference clean-diff fast path that skips the model entirely when no risky keywords appear in the added lines. This pushed our evaluation results to a Precision of 0.982 and dropped the FPR to just 0.018. It did come at a cost, though: recall fell from 0.822 to 0.600 as missed detections went from 16 to 36. So that's another thing I have to work on.
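For anyone curious what a clean-diff fast path can look like, here is a rough sketch of the gating idea: a cheap keyword check over the added lines decides whether the model is called at all. The keyword list, the function names (`needs_model_review`, `review`) and the stub model are hypothetical and only illustrate the shape of the logic, not the actual LocalForge implementation.

```python
# Hypothetical keyword list; the real list used by LocalForge is not given here.
RISKY_KEYWORDS = ("password", "secret", "token", "api_key", "eval(", "exec(")


def added_lines(diff_text):
    """Yield added lines from a unified diff, skipping '+++' file headers."""
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def needs_model_review(diff_text):
    """Return True if any added line contains a risky keyword (case-insensitive)."""
    for line in added_lines(diff_text):
        lowered = line.lower()
        if any(keyword in lowered for keyword in RISKY_KEYWORDS):
            return True
    return False


def review(diff_text, run_model):
    """Call the expensive model only when the cheap keyword check asks for it."""
    if not needs_model_review(diff_text):
        return "clean (fast path)"
    return run_model(diff_text)


calls = []


def fake_model(diff_text):
    # Stand-in for the real LLM call, which is not reproduced here.
    calls.append(diff_text)
    return "flagged by model"


CLEAN = "+++ b/app.py\n+print('hello')\n"
RISKY = "+++ b/app.py\n+password = 'hunter2'\n"

clean_result = review(CLEAN, fake_model)
risky_result = review(RISKY, fake_model)
```

One trade-off of a gate like this is that anything outside the keyword list never reaches the model, so it can plausibly contribute to missed detections while it cuts false positives.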
[Graphs: evaluation results before the update and after the update]
Comparison table:

Metric      Preliminary (1.5B, 04:13)   Current (7B, 05:15)   Delta
Precision   0.649                       0.982                 +0.333
Recall      0.822                       0.600                 -0.222
F1          0.726                       0.745                 +0.019
FPR         0.727                       0.018                 -0.709
TP          74                          54                    -20
FP          40                          1                     -39
FN          16                          36                    +20
TN          15                          54                    +39
Mailwarm
Congrats on today's launch.
@thamibenjelloun Thank you!