Document-to-Markdown conversion and chunking for RAG

Hi All,

I built a RAG tool, aimed primarily at legal, government, and technical documents, to assist anyone working with:

- RAG pipelines

- AI applications requiring contextual transcription, description, access, search, and discovery

- Vector databases

- AI applications requiring similar content retrieval

The tool currently offers the following functionalities:

- Convert documents comprehensively to Markdown, adding relevant metadata (short title, markdown, pageNumber, summary, keywords, base image ref, etc.)

- Chunk documents into smaller fragments using:

  - a pretrained reinforcement-learning-based model, or

  - a pretrained reinforcement-learning-based model with proposition indexing, or

  - standard word chunking, or

  - recursive character-based chunking, or

  - character-based chunking

- Upsert fragments into a vector database
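To make the metadata bullet concrete, here is a minimal Python sketch of what one per-page record could look like. The class name `PageRecord`, the snake_case field names, and the `to_dict` helper are hypothetical illustrations, not the prevectorchunks-core schema; only the list of fields comes from the post.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PageRecord:
    """Hypothetical per-page record; fields mirror the metadata listed in the post."""
    short_title: str
    markdown: str                 # the page content converted to Markdown
    page_number: int
    summary: str
    keywords: List[str] = field(default_factory=list)
    base_image_ref: Optional[str] = None  # e.g. a path or URI to the rendered page image

    def to_dict(self) -> Dict[str, Any]:
        # Rename page_number to the camelCase "pageNumber" key mentioned in the post
        data = asdict(self)
        data["pageNumber"] = data.pop("page_number")
        return data
```

A record like this would typically travel with every chunk cut from that page, so a retrieved fragment can be traced back to its page and image.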
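The non-model chunking options can be sketched in plain Python. These are generic versions of word, character, and recursive character chunking written for illustration; the function names, default sizes, overlap handling, and separator list are assumptions, not the library's implementation. The RL-based chunkers aren't shown because the post doesn't describe how they work.

```python
from typing import List, Sequence


def _windows(items: Sequence, size: int, overlap: int) -> List[Sequence]:
    """Fixed-size sliding windows over a sequence; consecutive windows share `overlap` items."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    out: List[Sequence] = []
    start, step = 0, size - overlap
    while start < len(items):
        end = start + size
        out.append(items[start:end])
        if end >= len(items):  # last window reached the end; stop to avoid a redundant tail
            break
        start += step
    return out


def word_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Standard word chunking: windows of `chunk_size` whitespace-separated words."""
    return [" ".join(w) for w in _windows(text.split(), chunk_size, overlap)]


def char_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Character chunking: windows of `chunk_size` characters, ignoring word boundaries."""
    return [str(w) for w in _windows(text, chunk_size, overlap)]


def recursive_chunks(text: str, chunk_size: int = 1000,
                     separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Recursive character chunking: split on the coarsest separator first, recurse into
    pieces that are still too long, and greedily merge small neighbours up to chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        return char_chunks(text, chunk_size, overlap=0)  # no separator left: hard cut
    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

The recursive variant tries to keep paragraphs, then lines, then words intact, and only falls back to a hard character cut when a single token is longer than `chunk_size`.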
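Upserting means insert-or-overwrite keyed by an id. The sketch below uses a hypothetical in-memory store and a toy hashed bag-of-words embedding in place of a real vector database and embedding model; the class, method names, and id format are assumptions for illustration only.

```python
import hashlib
import math
from typing import Dict, List, Optional, Tuple

DIM = 256


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Toy stand-in for a real embedding model: a hashed bag of words, L2-normalised.
    sha256 is used instead of hash() because str hashing is randomised per process."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical minimal store: id -> (vector, text, metadata)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str, dict]] = {}

    def upsert(self, chunk_id: str, text: str, metadata: Optional[dict] = None) -> None:
        # Insert if the id is new, overwrite if it exists, so re-ingesting a document
        # replaces its old fragments instead of duplicating them.
        self._rows[chunk_id] = (toy_embed(text), text, dict(metadata or {}))

    def query(self, text: str, k: int = 3) -> List[Tuple[str, float]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        q = toy_embed(text)
        scored = [(cid, sum(a * b for a, b in zip(q, vec)))
                  for cid, (vec, _, _) in self._rows.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def __len__(self) -> int:
        return len(self._rows)
```

In a real setup, `toy_embed` would be replaced by an actual embedding model and `InMemoryVectorStore` by the vector database's own upsert call; deterministic chunk ids (for example document, page, and chunk index) are what make re-runs idempotent.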

Install it with:

pip install prevectorchunks-core

Interested in contributing? Send me a PM.

Looking for feedback and discussions.
