Guidance on Optimal Chunking Configuration for LLM-Based Processing of Financial PDFs #1371
Unanswered · igelfenbeyn asked this question in Q&A
Hello,
I’m working on processing a large number of loosely related PDF files—primarily financial statements such as balance sheets, income statements, and similar documents. In this project, I’m not defining a fixed ontology upfront; instead, I’m relying on the LLM to determine how to interpret and extract information from each document.
Given this use case, I'd like to know: what chunking configurations work best for this kind of unstructured, heterogeneous input?
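For concreteness, this is roughly the kind of token-based chunking with overlap I have in mind; it's only an illustrative sketch, with tiktoken used as an example tokenizer and the chunk_size/overlap values as placeholders rather than recommendations:

```python
# Illustrative sketch of token-based chunking with overlap.
# tiktoken is used here only as an example tokenizer;
# chunk_size and overlap are placeholder values, not recommendations.
import tiktoken

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```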
Additionally, is there any documentation or best-practice guide that explains the trade-offs between using larger vs. smaller chunk sizes? I’m particularly interested in how chunk size impacts context retention, accuracy of entity/relation extraction, and overall performance when using LLMs for knowledge graph construction.
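To frame what I mean by trade-offs, this is the sort of comparison I'm considering running myself: the same extraction step at several chunk sizes, counting how many distinct entities come back. Here `extract_entities` is a hypothetical stand-in for whatever LLM-based extraction the pipeline actually performs, and the sizes are arbitrary examples:

```python
# Hypothetical sweep over chunk sizes to compare extraction yield.
# extract_entities is a placeholder for the real LLM extraction call;
# chunk_text is the sketch defined above. Sizes are arbitrary examples.
from typing import Callable

def compare_chunk_sizes(
    text: str,
    extract_entities: Callable[[str], list[str]],
    sizes: tuple[int, ...] = (300, 600, 1200, 2400),
) -> dict[int, int]:
    results = {}
    for size in sizes:
        chunks = chunk_text(text, chunk_size=size, overlap=size // 10)
        entities = set()
        for chunk in chunks:
            entities.update(extract_entities(chunk))
        results[size] = len(entities)  # crude proxy for extraction recall at each size
    return results
```

Even a rough comparison like this would help, but I'd prefer to start from documented guidance if it exists.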
Any advice or references would be greatly appreciated!
Thanks in advance.