All activity
Denis left a comment
Hey Product Hunt! I built SecuriLayer after seeing too many community members get scammed in Telegram groups and Discord servers. Fake airdrops, phishing links, impersonators: moderators can't keep up. The key difference: we don't just block, we explain WHY something is dangerous using AI. Plus, our Vigil analyst knows your community's specific risk profile. Built for crypto projects, trading...

SecuriLayer Multi-Platform - ONNX & AI-powered scam detection for Telegram, Discord & Slack
SecuriLayer protects online communities from scams, phishing, and social engineering attacks in real time. Unlike traditional moderation tools that react after the damage is done, SecuriLayer uses triple-layer AI: an ONNX ML engine for instant verdicts, a Llama 3.2 LLM for plain-English explanations, and the Vigil AI analyst that knows your community's risk profile. Features include threat intel feeds (URLhaus, OpenPhish, ScamSniffer), SIEM export for enterprise teams, scam DNA fingerprinting, and an extension.
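The layered flow described above (fast blocklist lookup, then an ML verdict, then an explanation) can be sketched roughly as below. The feed contents, the keyword heuristic standing in for the ONNX classifier, and the threshold are all illustrative assumptions, not SecuriLayer's actual code.

```python
# Minimal sketch of a layered scan pipeline: blocklist -> ML score -> reason.
import re

BLOCKLIST = {"evil-airdrop.example", "free-mint.example"}  # e.g. from URLhaus/OpenPhish dumps

def extract_domains(message: str) -> list[str]:
    return re.findall(r"https?://([\w.-]+)", message)

def ml_score(message: str) -> float:
    # Stand-in for an ONNX classifier; here a toy keyword heuristic.
    hits = sum(w in message.lower() for w in ("airdrop", "seed phrase", "giveaway"))
    return min(1.0, hits / 2)

def scan(message: str) -> dict:
    domains = extract_domains(message)
    blocked = [d for d in domains if d in BLOCKLIST]
    score = 1.0 if blocked else ml_score(message)
    verdict = "block" if score >= 0.5 else "allow"
    # A real deployment would hand this off to the LLM layer for the explanation.
    reason = f"known-bad domains: {blocked}" if blocked else f"ml score {score:.2f}"
    return {"verdict": verdict, "reason": reason}

print(scan("Claim your airdrop at https://evil-airdrop.example/claim"))
```

The ordering matters: the set lookup is O(1) per domain, so known-bad links never pay the cost of model inference.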

Denis started a discussion
SecuriLayer Enterprise - AI-Powered Community Security Platform
SecuriLayer Enterprise is the comprehensive AI-powered security platform designed for organizations that need compliance-grade audit trails, advanced threat intelligence, and enterprise-scale protection across Telegram, Discord, Slack, and X/Twitter communities. Enterprise features include Advanced Threat Detection (ONNX + LightGBM ML engine): real-time analysis with 99.9% accuracy, sub-100ms response...
Denis left a comment
TurboQuant-MoE v0.3.0 released! • Up to 15.4× KV-cache compression • Cross-layer delta + 3-bit PolarQuant. A serious VRAM killer for MoE models (Mixtral, DeepSeek, Qwen, etc.)
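The "cross-layer delta" idea mentioned above can be sketched as follows: store one layer's KV tensor as a low-bit quantized difference from the previous layer's, since adjacent layers' caches tend to be similar. The shapes, the symmetric 3-bit scheme, and the delta magnitude are assumptions for illustration, not TurboQuant-MoE's actual implementation.

```python
# Illustrative cross-layer delta + 3-bit quantization of a KV cache.
import numpy as np

def quantize_3bit(x: np.ndarray):
    # Symmetric 3-bit quantization: integer levels in [-4, 3] after scaling.
    scale = np.abs(x).max() / 4 + 1e-8
    q = np.clip(np.round(x / scale), -4, 3).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv_prev = rng.normal(size=(16, 64)).astype(np.float32)   # layer N-1 cache
# Assume layer N's cache differs from layer N-1 by a small perturbation.
kv_curr = kv_prev + 0.1 * rng.normal(size=(16, 64)).astype(np.float32)

q, s = quantize_3bit(kv_curr - kv_prev)   # the small delta quantizes well
recon = kv_prev + dequantize(q, s)
err = np.abs(recon - kv_curr).max()
print(f"max reconstruction error: {err:.4f}")
```

Because only the delta is quantized, the scale is set by the delta's small range rather than the full activation range, which is what makes 3 bits viable in this sketch.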
TurboQuant-MoE: 8.5x KV-Cache Compression - 8.5x KV-cache compression for LLM inference
Denis left a comment
Hey, this is Denis. PROBLEM: LLM inference costs $10k/month because the KV cache eats up memory. SOLUTION: I squeezed it 8.5x with no quality loss. PROOF: 256MB → 30MB, 8.48x faster, $10k → $1.2k per month. HOW: an orthogonal transform from Google DeepMind. RESULT: works with Mixtral, DeepSeek, Qwen. MIT license. Free. github.com/RemizovDenis/turboquant Questions? I'm here.
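The "orthogonal transform" mentioned above alludes to rotation-based quantization: multiplying activations by a random orthogonal matrix spreads outlier channels across all dimensions, shrinking the max/RMS ratio that sets the quantization step. A hedged sketch of that general effect, not the project's actual transform:

```python
# Rotating by a random orthogonal matrix flattens outlier channels, which
# lowers the dynamic range a fixed-point quantizer must cover.
import numpy as np

rng = np.random.default_rng(1)
d = 64
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix

x = rng.normal(size=(128, d))
x[:, 0] *= 50.0                                # one outlier channel dominates

def dynamic_range(v: np.ndarray) -> float:
    # max/RMS ratio: the smaller it is, the finer a shared quantization grid.
    return float(np.abs(v).max() / np.sqrt(np.mean(v ** 2)))

r_plain = dynamic_range(x)
r_rotated = dynamic_range(x @ Q)               # invertible: x == (x @ Q) @ Q.T
print(f"plain: {r_plain:.1f}, rotated: {r_rotated:.1f}")
```

Since the rotation is orthogonal it preserves norms and is exactly invertible, so nothing is lost by storing the rotated, quantized cache and rotating back on read.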
Denis left a comment
✨ SUPPORTED MODELS ✨
Mixtral 8x7B, 8x22B (Production Ready)
DeepSeek, Qwen 1.5-MoE (Experimental)

🎯 WHAT'S INCLUDED
• KV quantization engine
• Dynamic expert cache
• Speculative prefetch
• Transformers/vLLM integration
• Full benchmark suite
• MIT open-source license

Built in 3 hours using AI to architect, then properly engineered.
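The "dynamic expert cache" listed above can be sketched as a small LRU cache that keeps only recently routed experts resident and tracks its hit rate. The capacity, the stand-in weight loading, and the skewed routing trace are illustrative assumptions, not the project's code.

```python
# LRU cache of MoE expert weights with hit-rate accounting.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()        # expert_id -> weights (stub)
        self.hits = self.misses = 0

    def get(self, expert_id: int):
        if expert_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(expert_id)        # refresh LRU position
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)       # evict least-recent expert
            self.cache[expert_id] = f"weights-{expert_id}"  # stand-in for a GPU load
        return self.cache[expert_id]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = ExpertCache(capacity=4)
# Skewed routing trace: a few experts dominate, as MoE routers often do.
for expert_id in [0, 1, 0, 1, 2, 0, 1, 3, 0, 1, 0, 1]:
    cache.get(expert_id)
print(f"hit rate: {cache.hit_rate:.2%}")
```

High hit rates like the 96.75% quoted in the benchmarks depend on exactly this kind of routing skew: the hotter the few top experts, the less VRAM the cold ones need to occupy.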
Production KV-cache compression for Mixture-of-Experts language models.
LLM inference costs explode because:
• KV-cache grows with sequence length (16k tokens ≈ 256MB of KV cache)
• MoE models waste GPU memory storing inactive experts
• Memory becomes the bottleneck, not compute
📊 REAL BENCHMARKS (Mixtral 8x7B)
• KV Memory: 256MB → 30MB (8.53x smaller)
• Quality: 100% preserved (zero degradation)
• Speed: 8.48x faster in production
• Expert Cache Hit: 96.75%
• GPU Memory Saved: 6.42GB per layer
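The headline ratios in the list above are internally consistent, which a quick arithmetic check confirms:

```python
# Sanity-check the quoted compression and cost figures.
before_mb, after_mb = 256, 30
ratio = before_mb / after_mb
print(f"memory: {ratio:.2f}x smaller")       # matches the quoted 8.53x

cost_before, cost_after = 10_000, 1_200      # $/month figures from the pitch
print(f"cost: {cost_before / cost_after:.2f}x cheaper")
```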
