Add a way to strip padding tokens during detokenization

The token sequences handled across our library are often dense and padded with a padding token. When these are passed to a tokenizer's `detokenize()` method, the output looks like `"the quick brown fox <pad> <pad> <pad>"`.

Users can work around this today by stripping padding tokens themselves before calling `detokenize()`, but the `tf.ragged` incantations needed to do so are complex. It would be useful if we could strip that padding during detokenization out of the box.

This could be additional functionality on the tokenizer base class, or a utility that every `detokenize()` implementation calls.
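To make the proposal concrete, here is a rough sketch of the utility-plus-flag shape in plain Python. Everything in it is an assumption for illustration: `strip_padding`, `ToyTokenizer`, and the `strip_padding_tokens` argument are hypothetical names, not existing library API, and nested lists stand in for the dense tensors a real implementation would handle. Note that this version drops every occurrence of the padding id, not only trailing ones.

```python
def strip_padding(batch, pad_id):
    """Drop every occurrence of pad_id from each sequence in a dense batch.

    Hypothetical helper; the result is "ragged" (rows may differ in length).
    """
    return [[t for t in seq if t != pad_id] for seq in batch]


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer base class, not the library's API."""

    def __init__(self, vocabulary, pad_token="<pad>"):
        self.id_to_token = dict(enumerate(vocabulary))
        self.pad_id = vocabulary.index(pad_token)

    def detokenize(self, batch, strip_padding_tokens=False):
        # Opt-in flag so existing callers keep the current behavior.
        if strip_padding_tokens:
            batch = strip_padding(batch, self.pad_id)
        return [" ".join(self.id_to_token[t] for t in seq) for seq in batch]


vocab = ["<pad>", "the", "quick", "brown", "fox"]
tokenizer = ToyTokenizer(vocab)
padded = [[1, 2, 3, 4, 0, 0, 0], [1, 4, 0, 0, 0, 0, 0]]

print(tokenizer.detokenize(padded))
print(tokenizer.detokenize(padded, strip_padding_tokens=True))
```

Whether stripping should be opt-in (as sketched) or the default, and whether it should only remove trailing padding, are open design questions for this issue.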

Add a way to strip padding tokens during detokenization #645