Added len method that returns the number of tokens in the vocabulary #7

Abdullahkhan5 · 2025-04-29T07:11:54Z

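Added this method because, without it, the tokenizer check in docling_core's hybrid chunker raises a `NotImplementedError`. `_patch` tests the result of `(tokenizer := data.get("tokenizer"))` for truthiness. The tokenizer defines no `__bool__`, so Python falls back to `__len__`, which the transformers base tokenizer class leaves unimplemented: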

```
/usr/local/lib/python3.11/dist-packages/docling_core/transforms/chunker/hybrid_chunker.py in _patch(cls, data)
     71     @classmethod
     72     def _patch(cls, data: Any) -> Any:
---> 73         if isinstance(data, dict) and (tokenizer := data.get("tokenizer")):
     74             max_tokens = data.get("max_tokens")
     75             if isinstance(tokenizer, BaseTokenizer):

/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py in __len__(self)
   1511
   1512     def __len__(self) -> int:
-> 1513         raise NotImplementedError()
   1514
   1515     def get_vocab(self) -> Dict[str, int]:

NotImplementedError:
```
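The failure comes from Python's truth-value rules, not from anything tokenizer-specific. The dependency-free sketch below reproduces it; `UnsizedTokenizer` and `has_tokenizer` are hypothetical stand-ins that only mimic the shape of the traceback, not docling or transformers code.

```python
from typing import Any


class UnsizedTokenizer:
    """Hypothetical stand-in for a tokenizer whose base class leaves __len__ unimplemented."""

    def __len__(self) -> int:
        raise NotImplementedError()


def has_tokenizer(data: Any) -> bool:
    """Hypothetical helper using the same condition shape as _patch."""
    # The walrus result is tested for truthiness. UnsizedTokenizer defines no
    # __bool__, so Python falls back to __len__, which raises.
    if isinstance(data, dict) and (tokenizer := data.get("tokenizer")):
        return True
    return False


if __name__ == "__main__":
    print(has_tokenizer({}))  # False: no "tokenizer" key, so __len__ is never called
    try:
        has_tokenizer({"tokenizer": UnsizedTokenizer()})
    except NotImplementedError:
        print("NotImplementedError raised by the truthiness check")
```

Because `and` short-circuits, the error only appears once a tokenizer object is actually present in `data`.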

Added len method that returns the number of tokens in the vocabulary
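The diff itself is not reproduced here, so the following is only a sketch of what such a method can look like. It assumes the tokenizer exposes a `get_vocab()` returning a token-to-id `Dict[str, int]`, matching the signature in the traceback; `VocabTokenizer` is a hypothetical class name.

```python
from typing import Dict


class VocabTokenizer:
    """Hypothetical tokenizer holding a token -> id mapping."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self._vocab = dict(vocab)

    def get_vocab(self) -> Dict[str, int]:
        # Return a copy so callers cannot mutate the internal mapping.
        return dict(self._vocab)

    def __len__(self) -> int:
        # Number of tokens in the vocabulary. Defining __len__ also makes
        # bool(tokenizer) work, since Python uses __len__ when __bool__ is absent.
        return len(self.get_vocab())


if __name__ == "__main__":
    tok = VocabTokenizer({"[PAD]": 0, "hello": 1, "world": 2})
    print(len(tok))   # 3
    print(bool(tok))  # True
```

One side effect of relying on `__len__` for truthiness: a tokenizer with an empty vocabulary is falsy, so a check like `(tokenizer := data.get("tokenizer"))` would treat it as absent.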

2f30609

Chri5At approved these changes Sep 24, 2025
