odelliab

Putting the document title first in the body ensures that the title:

  1. will appear at the beginning of the document when exported to markdown or text
  2. will be included in the headings list of document chunks

Signed-off-by: odelliab <[email protected]>
Contributor

DCO Check Passed

Thanks @odelliab, all your commits are properly signed off. 🎉


mergify bot commented Aug 27, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
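
As a rough illustration of what this rule checks, the pattern can be tried against the two PR titles in Python. This is a minimal sketch: the helper name `is_conventional` is made up for the example, and it assumes the `~=` operator behaves like an ordinary regular-expression search.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Hypothetical helper: True if the title starts with a conventional-commit prefix."""
    # Assumption: the rule is applied as a regex search; the ^ anchor pins it to the start.
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("title item first in items_list"))
print(is_conventional("fix: title item first in items_list"))
```

The first title has no type prefix, so it does not match; adding `fix:` satisfies the rule.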

@odelliab odelliab changed the title from "title item first in items_list" to "fix: title item first in items_list" Aug 27, 2025
Collaborator

vagenas commented Aug 28, 2025

Hi @odelliab

Thank you for proposing this change.

I'd like to share some background that explains why we can't adopt it in the current API:

  • The children field represents the reading order of a document, as built by a pre‑order DFS traversal. (Chunk headings are derived from that same traversal.)
  • The reading order—including the ordering within children—is intentionally determined by the backends so that it reflects the structure of the original input file.

Because titles can appear anywhere in an input document and there may be multiple titles (e.g., DOCX, Markdown, etc.), forcing a title to always come first among its parent's children would interfere with the backends' ability to control the reading order.
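
To make the traversal argument concrete, here is a minimal sketch of a pre-order DFS over a toy tree. The `Node` class and its labels are hypothetical stand-ins for illustration only; they are not docling classes and do not reproduce the DoclingDocument implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a document item (not a docling class)."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def reading_order(node: Node) -> Iterator[Node]:
    """Pre-order DFS: yield the node itself, then each child subtree in list order."""
    yield node
    for child in node.children:
        yield from reading_order(child)


# The title is deliberately NOT the first child, as can happen in real inputs.
body = Node("body", children=[
    Node("text", "Preface paragraph"),
    Node("title", "Main Title"),
    Node("list", children=[
        Node("list_item", "First item"),
        Node("list_item", "Second item"),
    ]),
])

order = [n.text for n in reading_order(body) if n.text]
print(order)
```

In this toy model the output order follows the order of `children` exactly, so moving the title to the front of the list would change the reading order that the backend chose.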

In short, this change would not align with the core DoclingDocument API design.

Could you please close the PR when you have a moment?
If you have a concrete example document where you expected a different reading order, feel free to open an issue.

Thanks again for your effort and understanding.

@odelliab
Author

Thank you @vagenas for this clarification.
I am still unsure what counts as a title rather than a section header.
Also, this means that when a document is manipulated after its creation (for instance, because a title was not identified correctly), the TitleItem is automatically placed at the end of the document and is not reflected in the document export or in the chunk headings.
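
The effect described here can be sketched with a toy model in which each text item is paired with the headings that precede it in reading order. The function `chunk_headings` and the `(label, text)` tuples are assumptions made for this example; the sketch ignores heading levels and does not reproduce docling's chunker.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (label, text); labels are illustrative only


def chunk_headings(items: List[Item]) -> List[Tuple[str, List[str]]]:
    """Pair each text item with the title/header items that precede it."""
    headings: List[str] = []
    chunks: List[Tuple[str, List[str]]] = []
    for label, text in items:
        if label in ("title", "section_header"):
            headings.append(text)
        elif label == "text":
            # Copy the list so later headings do not leak into earlier chunks.
            chunks.append((text, list(headings)))
    return chunks


items: List[Item] = [
    ("section_header", "Introduction"),
    ("text", "First paragraph"),
]
# Title added after the fact: it lands last in reading order.
items.append(("title", "Main Title"))
late = chunk_headings(items)

# Same content with the title placed first instead.
title_first = chunk_headings([("title", "Main Title")] + items[:-1])

print(late)
print(title_first)
```

Under these assumptions, a title appended at the end never reaches the headings of the text that comes before it, whereas a title placed first appears in every chunk's headings.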

@odelliab odelliab closed this Aug 28, 2025
@odelliab
Author

It seems that the add_title method is meant to be used only while creating a document, in which case the end of the items list so far represents the reading order.
The use case of manipulating the document structure after creation is not meant to be handled by this code.
