Document-to-Markdown conversion and chunking for any RAG pipeline
Hi All,
I built a tool, aimed primarily at legal, government, and technical documents, to help people working with:
- RAG pipelines
- AI applications requiring contextual transcription, description, access, search, and discovery
- Vector databases
- AI applications requiring similar content retrieval
The tool currently offers the following functionalities:
- Convert documents to Markdown with accompanying metadata (short title, markdown, pageNumber, summary, keywords, base image reference, etc.)
- Chunk documents into smaller fragments using one of:
  - a pretrained reinforcement-learning-based model
  - a pretrained reinforcement-learning-based model with proposition indexing
  - standard word chunking
  - recursive character-based chunking
  - character-based chunking
- Upsert fragments into a vector database
Install it using:
pip install prevectorchunks-core
Interested in contributing? Please PM me.
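For anyone unfamiliar with the three simpler strategies in the list above (word, character, and recursive character-based chunking), here is a minimal plain-Python sketch of the general techniques. It does not use the prevectorchunks-core API: the function names, parameters, and separator list are hypothetical and only illustrate the ideas, and the reinforcement-learning-based chunkers are not shown.

```python
from typing import List, Sequence


def chunk_by_words(text: str, size: int = 50) -> List[str]:
    """Word chunking: groups of at most `size` whitespace-separated words."""
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_by_chars(text: str, size: int = 200) -> List[str]:
    """Character chunking: fixed-width slices of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_recursive(
    text: str,
    size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Recursive character chunking (illustrative only).

    Split on the coarsest separator first, merge neighbouring pieces while
    they still fit in `size`, and recurse with finer separators on any piece
    that is still too long. Falls back to fixed-width slices when no
    separators are left.
    """
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return chunk_by_chars(text, size)

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate  # still fits: keep merging
            continue
        if current:
            chunks.append(current)  # flush what has been merged so far
        if len(piece) <= size:
            current = piece
        else:
            # The piece alone is too long: try the next, finer separator.
            chunks.extend(chunk_recursive(piece, size, finer))
            current = ""
    if current:
        chunks.append(current)
    # Drop chunks that contain only whitespace.
    return [c for c in chunks if c.strip()]
```

The recursive variant tends to keep paragraphs and sentences intact where they fit, which is why it is often preferred over plain fixed-width slicing for long structured documents.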
I'm looking for feedback and discussion.

