Everything you need to know about production-ready RAG systems: Part 1

By Pradyumna Chippigiri

November 18, 2025


What is RAG?

RAG stands for Retrieval Augmented Generation. It was introduced in the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.


Each step can be roughly broken down to:


You can think of RAG like an open-book exam: instead of answering from memory, you first look up the pages relevant to the question.

So the pipeline is:

Index search → retrieve relevant pages → read them → think → answer.
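The pipeline above can be sketched in a few lines of Python. This is a toy illustration, not a real implementation: the function names (`retrieve`, `rag_answer`) are made up, and keyword overlap stands in for the vector search a real system would use.

```python
import re

def retrieve(query, documents, k=2):
    """Rank documents by naive keyword overlap with the query.

    A stand-in for real embedding-based vector search."""
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(re.findall(r"\w+", d.lower()))),
        reverse=True,
    )
    return scored[:k]

def rag_answer(query, documents, llm):
    """Retrieve the top chunks, then hand them plus the question to the model."""
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```

Swap `retrieve` for a vector store and `llm` for an API call, and the shape of the loop stays the same.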

Exactly how RAG works:

Placeholder image


You might now ask: our minds can't read an entire book quickly enough to answer a question, but LLMs can, right? So what is the use of RAG?


I asked ChatGPT and Google Gemini to answer questions from a 1000-page PDF, and here’s how they answered.


Placeholder image


Placeholder image


As you can see, ChatGPT was not able to answer because its context window is smaller than Gemini’s, whereas Google Gemini was able to read the document and answer the question.


But here comes the catch: even though LLMs today have massive context windows and can technically “read” huge amounts of text, there are two major limitations that make RAG still essential.


So RAG is valuable today and will remain so in the future.

The Four Main Building Blocks of a RAG System

Placeholder image


In this article we will cover document preprocessing: data ingestion, data chunking, and the different chunking strategies. In the next article we will cover embedding generation and retrieval.

Data Ingestion

Everyone thinks data ingestion is easy: “just upload a PDF and extract text.” But there are many complexities involved here too. Given a PDF, where would you store it? How would you parse it? How would you deal with images and tables?


There are some powerful libraries that you should know:


PyMuPDF (fitz)


Best for mixed PDFs (text + images). Extracts:

Docs: https://pymupdf.readthedocs.io/en/latest/


But if a page is just a scanned image, PyMuPDF cannot read text → you need OCR.


OCR using Tesseract


For scanned PDFs or images:

Repo: https://github.com/tesseract-ocr/tesseract


Docling


Extracts:

Docs: https://docling-project.github.io/docling/


These libraries help you open and read PDFs, but in industry-level projects you often don’t have PDFs at all; you have to scrape the data from the web. Here are some libraries worth knowing for that.


Firecrawl


Fully crawls:

And converts them into LLM-ready Markdown.


Docs: https://www.firecrawl.dev/


Puppeteer


Think of it as an automated browser that you can script the same way you would manually click, search, scroll, and interact with web pages. It also lets you scrape only what you want:

Docs: https://pptr.dev/

Data Chunking

Once your documents are cleaned and extracted, the next question is: “How should I split the text so the LLM can retrieve the right information?”


Chunking matters because:

There are five chunking strategies, each with specific use cases and its own pros and cons. Let’s discuss them one by one:

1. Fixed-Size Chunking

Fixed-size chunking divides text into chunks based purely on length, for example, every 200 words or every 500 tokens. It simply chops the text based on size rules, which makes it extremely fast but contextually weak.
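The rule above takes only a few lines of Python. This sketch counts words for simplicity (the function name is illustrative); a production version would count tokens using the model’s tokenizer.

```python
def fixed_size_chunks(text, chunk_size=200):
    """Split text into chunks of at most `chunk_size` words, ignoring meaning."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Note how the split point falls wherever the counter runs out, even mid-sentence — exactly the “contextually weak” behavior described above.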

Pros:

Cons:

Best For:

2. Semantic Chunking

Placeholder image


Semantic chunking uses embedding similarity to group sentences that are meaningfully related. Instead of breaking text by size, it breaks based on content. Each sentence is compared to the chunk being built: if it stays semantically similar, it joins the chunk; if it drifts below the cosine-similarity threshold we set, a new chunk is created.
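A minimal sketch of this loop, with the embedding function passed in as a parameter. In the test below a toy bag-of-words embedding stands in for a real embedding model; the function names and the running-average centroid are illustrative choices, not a standard algorithm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, embed, threshold=0.3):
    """Group consecutive sentences whose embedding stays similar
    to the running centroid of the current chunk."""
    chunks, current, centroid = [], [], None
    for sent in sentences:
        vec = embed(sent)
        # Drifted below the threshold → close the current chunk.
        if current and cosine(vec, centroid) < threshold:
            chunks.append(" ".join(current))
            current, centroid = [], None
        current.append(sent)
        centroid = list(vec) if centroid is None else [
            (c + v) / 2 for c, v in zip(centroid, vec)
        ]
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In production you would replace `embed` with calls to an embedding model and tune `threshold` on your own data.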

Pros:

Cons:

Best For:

3. Structural Chunking

Structural chunking uses the document’s natural format (headings, subheadings, sections, table of contents) to define chunk boundaries. It respects the author’s intended organization, making chunks more meaningful and human-readable. Example: splitting by Introduction, Overview, Results, etc.
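For a markdown document, this can be as simple as splitting at heading lines, keeping each heading with its body. This is a sketch for markdown specifically; other formats (HTML, DOCX) would need their own boundary detectors.

```python
import re

def structural_chunks(markdown_text):
    """Split a markdown document at heading lines (#, ##, ...),
    keeping each heading together with the text under it."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks
```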


Pros:

Cons:

Best For:

4. Sliding-Window Chunking

Sliding-window chunking creates overlapping chunks to preserve continuity. Instead of splitting text once, you slide a window across the document. For example, a 500-token window moving forward 250 tokens at a time. This ensures that important context that appears near chunk boundaries is not lost.
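The window/stride mechanics can be sketched directly on a token list (here plain word tokens; the function name is illustrative). With a 500-token window and a 250-token stride, consecutive chunks share 250 tokens.

```python
def sliding_window_chunks(tokens, window=500, stride=250):
    """Overlapping chunks: each one shares `window - stride` tokens
    with the previous chunk, so boundary context is never lost."""
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks
```

The trade-off is visible in the output: overlap preserves continuity but stores some tokens twice, inflating index size.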


Example:

Pros:

Cons:

Best For:

5. Recursive Chunking

Placeholder image


Recursive chunking is a hybrid strategy. It first splits the document using high-level structure (e.g., headings). Then, if any section exceeds a maximum size we set, it splits that section again using another method (often fixed-size or semantic). This maintains both structure and consistency.
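A sketch of the two-pass idea for markdown, combining the structural and fixed-size approaches above (names and the word-count limit are illustrative; a production version would recurse on finer separators like paragraphs and sentences):

```python
import re

def recursive_chunks(markdown_text, max_words=50):
    """Pass 1: split at markdown headings (structural).
    Pass 2: re-split any oversized section by word count (fixed-size)."""
    sections, current = [], []
    for line in markdown_text.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        if len(words) <= max_words:
            chunks.append(section.strip())
        else:
            # Fallback: chop the oversized section into fixed-size pieces.
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
    return chunks
```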


Pros:

Cons:

Best For:

Which chunking strategy to use and when?

You can also refer to this excellent blog by Weaviate on the different chunking strategies and when to choose one over another; it has some nice visuals: Weaviate Article


The most important question to ask is: “Does my data need chunking at all?”


Chunking is designed to break down long, unstructured documents. If your data source already has small, complete pieces of information like FAQs, product descriptions, or social media posts, you usually do not need to chunk them. Chunking can even cause problems. The goal is to create meaningful semantic units, and if your data is already in that format, you’re ready for the embedding stage.


Once you’ve confirmed that your documents are long enough to benefit from chunking, you can use the following questions to guide your choice of strategy:


How to Optimize the Chunk Size for RAG in Production

Optimizing chunk size in a production setting takes many tests and reviews. Here are some steps you can take:
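One way to frame those tests is a small grid search: build a set of question–answer pairs, chunk the corpus at each candidate size, and measure how often the answer lands in a top-k retrieved chunk. Everything below is a toy sketch under that assumption — the function names are mine, and keyword overlap stands in for a real embedding retriever.

```python
def fixed_chunks(words, size):
    """Chop a word list into fixed-size chunks."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def hit_rate(chunks, qa_pairs, k=1):
    """Fraction of questions whose answer string appears in a
    top-k retrieved chunk (naive keyword-overlap retriever)."""
    hits = 0
    for question, answer in qa_pairs:
        q = set(question.lower().split())
        top = sorted(
            chunks,
            key=lambda c: len(q & set(c.lower().split())),
            reverse=True,
        )[:k]
        hits += any(answer.lower() in c.lower() for c in top)
    return hits / len(qa_pairs)

def best_chunk_size(text, qa_pairs, candidates=(50, 100, 200)):
    """Pick the candidate size with the highest retrieval hit rate."""
    words = text.split()
    return max(candidates, key=lambda s: hit_rate(fixed_chunks(words, s), qa_pairs))
```

With a real evaluation set and retriever, the same loop lets you compare chunk sizes (and chunking strategies) on your own data instead of guessing.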



This article has already run long, sorry about that. In the next article we will cover embeddings and the generation part of the pipeline.


Hope you liked this article. If you did, please share it on social media with your friends and followers, and subscribe to my weekly newsletter!


See you in the next one.