Guidance on Optimal Chunking Configuration for LLM-Based Processing of Financial PDFs #1371
Unanswered · igelfenbeyn asked this question in Q&A
Hello,
I’m working on processing a large number of loosely related PDF files—primarily financial statements such as balance sheets, income statements, and similar documents. In this project, I’m not defining a fixed ontology upfront; instead, I’m relying on the LLM to determine how to interpret and extract information from each document.
Given this use case, I'd like to know: what chunking configurations work best for this kind of unstructured, heterogeneous input?
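For concreteness, this is roughly the kind of token-based chunking with overlap I have in mind; it's only an illustrative sketch, with tiktoken used as an example tokenizer and the chunk_size/overlap values as placeholders rather than recommendations:

```python
# Illustrative sketch of token-based chunking with overlap.
# tiktoken is used here only as an example tokenizer;
# chunk_size and overlap are placeholder values, not recommendations.
import tiktoken

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```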
Additionally, is there any documentation or best-practice guide that explains the trade-offs between using larger vs. smaller chunk sizes? I’m particularly interested in how chunk size impacts context retention, accuracy of entity/relation extraction, and overall performance when using LLMs for knowledge graph construction.
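To frame what I mean by trade-offs, this is the sort of comparison I'm considering running myself: the same extraction step at several chunk sizes, counting how many distinct entities come back. Here `extract_entities` is a hypothetical stand-in for whatever LLM-based extraction the pipeline actually performs, and the sizes are arbitrary examples:

```python
# Hypothetical sweep over chunk sizes to compare extraction yield.
# extract_entities is a placeholder for the real LLM extraction call;
# chunk_text is the sketch defined above. Sizes are arbitrary examples.
from typing import Callable

def compare_chunk_sizes(
    text: str,
    extract_entities: Callable[[str], list[str]],
    sizes: tuple[int, ...] = (300, 600, 1200, 2400),
) -> dict[int, int]:
    results = {}
    for size in sizes:
        chunks = chunk_text(text, chunk_size=size, overlap=size // 10)
        entities = set()
        for chunk in chunks:
            entities.update(extract_entities(chunk))
        results[size] = len(entities)  # crude proxy for extraction recall at each size
    return results
```

Even a rough comparison like this would help, but I'd prefer to start from documented guidance if it exists.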
Any advice or references would be greatly appreciated!
Thanks in advance.