Great blog, Hank!
I want to add the “mirror image” of context length: chunk size and overlap when scraping or preparing content for AI.
If context length is the AI’s working memory, chunk size is like the size of each block of data we hand it. When we scrape documents, logs, or configs for later retrieval-augmented generation (RAG), we have to slice them into pieces that:
A) Fit comfortably in the model’s context window when retrieved later.
B) Preserve enough surrounding information to maintain meaning.
If chunks are too large, they won’t fit alongside the prompt and other retrieved chunks at inference time. Too small, and you risk losing important context, like breaking a sentence mid-thought or splitting related log lines. That’s where overlap comes in: by repeating a bit of the end of one chunk at the start of the next, the AI always has the full picture at chunk boundaries.
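To make the slicing concrete, here’s a minimal sketch of a sliding-window chunker. I’m using whitespace-split words as a stand-in for tokens (a simplifying assumption; a real pipeline would count model tokens with the model’s tokenizer):

```python
# Rough sketch of sliding-window chunking with overlap.
# "Tokens" here are whitespace-split words for simplicity;
# swap in a real tokenizer to count model tokens.

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 75) -> list[str]:
    """Split text into chunks of ~chunk_size tokens, where each chunk
    starts `overlap` tokens before the previous chunk ended."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail of the text
    return chunks
```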
In Short:
– Context length → How much text/tokens the AI can process at once during inference (prompt + retrieved chunks). It’s like the size of the “workspace” the model has open while it’s thinking. Bigger context = more information considered at once, but also higher memory/compute cost.
– Chunk size → How large each block or segment of source material is when we store it in the retrieval index. Too big, and a single retrieved chunk might overflow the context window or waste space. Too small, and the AI may not have enough context to interpret it correctly—unless compensated with sufficient overlap.
– Overlap → A deliberate amount of repeated content between chunks (e.g., 50–100 tokens) so that information at chunk boundaries isn’t lost. Think of it like overlapping tiles or sliding windows — ensuring continuity when the AI stitches ideas together during retrieval.
– When scraping logs for a troubleshooting RAG workflow, I’ve found that ~500–1000 token chunks with 50–100 token overlap often balance retrieval accuracy with efficiency—but like any “nerd knob,” tuning via trial and error is everything.
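For the retrieval side, here’s an equally rough sketch of budgeting retrieved chunks against the context window during inference. The window size, the allowance reserved for the answer, and the word-based token counting are all assumptions for illustration:

```python
# Rough sketch: pack retrieved chunks into the prompt without
# overflowing the context window. Token counts are word counts here
# (an assumption); use your model's tokenizer for real budgets.

CONTEXT_WINDOW = 8192       # assumed model context length, in tokens
RESERVED_FOR_ANSWER = 1024  # leave room for the model's response

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWER - len(question.split())
    selected = []
    for chunk in retrieved_chunks:   # assumed to be ranked best-first
        cost = len(chunk.split())
        if cost > budget:
            break                    # the next chunk would overflow the window
        selected.append(chunk)
        budget -= cost
    context = "\n\n---\n\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {question}"
```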