rowen

GitHub - BERT, Tokenizer, Python, WordPiece, pybind11, C++, Flash, Trie

🚀 **FlashTokenizer: World's Fastest CPU Tokenizer!** ⚡ 8-15x faster than `BertTokenizerFast` 🛠️ High-performance C++ 🔄 Parallel with OpenMP 📦 Easy pip install 💻 Cross-platform (Win/Mac/Linux) ▶️ Demo: https://youtu.be/a_sTiAXeSE0

rowen
Maker
👋 Hi Product Hunters! We're excited to launch **FlashTokenizer**, the world's fastest CPU tokenizer optimized specifically for large language models like BERT. We built this to significantly speed up NLP inference, achieving **8-15x faster performance** compared to traditional tokenizers such as `BertTokenizerFast`.

Key features include:

- ⚡ Ultra-fast tokenization
- 🛠️ Optimized C++ performance
- 📦 Simple pip installation
- 💻 Cross-platform compatibility (Windows, macOS, Ubuntu)

We'd love your feedback, thoughts, and questions. Let's discuss! 🚀