
Add a way to strip padding tokens during detokenization #645

@mattdangerw

Description


Token sequences across our library are often dense and padded with a padding token. When these sequences are passed to the detokenize() method of our tokenizers, the output looks like "the quick brown fox <pad> <pad> <pad>".

Users can work around this today by stripping padding tokens before calling detokenize(), but the tf.ragged incantations needed to do so are complex. It would be useful if detokenization could strip that padding out of the box.
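A minimal sketch of that workaround, written in plain Python so it runs without TensorFlow. The helper strip_padding and the pad_id value of 0 are hypothetical choices for illustration, not an existing library API. It drops every occurrence of pad_id from each sequence in a dense batch. In TensorFlow, the same masking step can be expressed as tf.ragged.boolean_mask(token_ids, token_ids != pad_id), which returns a RaggedTensor with the padding removed (passing that to detokenize() assumes the tokenizer accepts ragged input).

```python
def strip_padding(batch, pad_id):
    """Hypothetical helper: remove every pad_id from each sequence in a dense batch."""
    # Each row keeps only its non-padding ids, so rows end up with different lengths.
    return [[token for token in seq if token != pad_id] for seq in batch]


# A dense batch padded to length 7 with an assumed pad id of 0.
batch = [
    [5, 12, 7, 9, 0, 0, 0],
    [3, 8, 0, 0, 0, 0, 0],
]
print(strip_padding(batch, pad_id=0))  # [[5, 12, 7, 9], [3, 8]]
```

Note that this removes the pad id wherever it appears, not only at the end of a sequence, which matches the goal of stripping any padding tokens.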

This could be additional functionality on the tokenizer base class (or a utility that every detokenize() implementation calls).
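A rough sketch of the base-class option, under stated assumptions: BaseTokenizer, WordTokenizer, pad_token_id, the _detokenize hook, and the strip_padding flag are all hypothetical names invented for illustration, not the library's actual tokenizer API. The idea is that the base class strips padding once, so each subclass only implements the id-to-text mapping.

```python
class BaseTokenizer:
    """Hypothetical base class that strips padding before subclass detokenization."""

    def __init__(self, pad_token_id=None):
        self.pad_token_id = pad_token_id

    def _detokenize(self, ids):
        # Subclasses map a single unpadded list of ids to text.
        raise NotImplementedError

    def detokenize(self, batch, strip_padding=True):
        # Strip padding in one place so every subclass gets it for free.
        if strip_padding and self.pad_token_id is not None:
            batch = [[t for t in seq if t != self.pad_token_id] for seq in batch]
        return [self._detokenize(seq) for seq in batch]


class WordTokenizer(BaseTokenizer):
    """Hypothetical toy word-level tokenizer used only to exercise the base class."""

    def __init__(self, vocab, pad_token="<pad>"):
        self.vocab = list(vocab)
        super().__init__(pad_token_id=self.vocab.index(pad_token))

    def _detokenize(self, ids):
        return " ".join(self.vocab[i] for i in ids)


vocab = ["<pad>", "the", "quick", "brown", "fox"]
tokenizer = WordTokenizer(vocab)
batch = [[1, 2, 3, 4, 0, 0, 0]]
print(tokenizer.detokenize(batch, strip_padding=False))  # ['the quick brown fox <pad> <pad> <pad>']
print(tokenizer.detokenize(batch))  # ['the quick brown fox']
```

Keeping an opt-out flag preserves the current behavior for callers who want to see padding, while making the stripped output the easy default.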

Labels

scoping required (Features that need significant design and planning before being actionable)
team-created (Issues created by Keras Hub team as part of development roadmap.)
type:feature (New feature or request)
