
Convert between text and tokens easily #6

@deontologician


Currently, a couple of the APIs take or return tokens rather than text, which is inconvenient. It would be nice to be able to translate text into tokens and vice versa easily.
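A minimal sketch of the kind of round-trip interface this issue asks for. The trait name TextTokens and the ToyTokenizer type are hypothetical. ToyTokenizer splits on whitespace instead of performing GPT-2 byte-pair encoding; it only exists to make the example runnable.

```rust
use std::collections::HashMap;

/// Hypothetical interface for converting between text and token IDs.
/// A real implementation would wrap a GPT-2 BPE tokenizer.
pub trait TextTokens {
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, ids: &[u32]) -> String;
}

/// Toy whitespace tokenizer standing in for the real GPT-2 tokenizer.
/// This is NOT byte-pair encoding.
pub struct ToyTokenizer {
    word_to_id: HashMap<String, u32>,
    id_to_word: Vec<String>,
}

impl ToyTokenizer {
    /// Assigns each vocabulary word the ID equal to its position in `vocab`.
    pub fn new(vocab: &[&str]) -> Self {
        let id_to_word: Vec<String> = vocab.iter().map(|w| w.to_string()).collect();
        let word_to_id: HashMap<String, u32> = id_to_word
            .iter()
            .enumerate()
            .map(|(i, w)| (w.clone(), i as u32))
            .collect();
        ToyTokenizer { word_to_id, id_to_word }
    }
}

impl TextTokens for ToyTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        // Words missing from the vocabulary are skipped in this toy version.
        text.split_whitespace()
            .filter_map(|w| self.word_to_id.get(w).copied())
            .collect()
    }

    fn decode(&self, ids: &[u32]) -> String {
        // Unknown IDs are skipped; known words are joined with single spaces.
        ids.iter()
            .filter_map(|&id| self.id_to_word.get(id as usize).map(String::as_str))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

fn main() {
    let tok = ToyTokenizer::new(&["hello", "world"]);
    let ids = tok.encode("hello world");
    println!("{:?} -> {}", ids, tok.decode(&ids));
}
```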

The rust_tokenizer crate has a function called from_file that instantiates the GPT-2 tokenizer from a couple of pretrained tokenizer files (the vocabulary and merges files). These files are available from Hugging Face's website here:

There is also an example in rust_bert of constructing a GPT-2 tokenizer. Ideally, the tokenizer would be built lazily, so users of the library only pay the cost of loading it if they actually use these features.
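One way to get the lazy construction described above, assuming Rust 1.70 or newer for std::sync::OnceLock: the tokenizer is built the first time tokenizer() is called and shared afterwards. The Tokenizer type, its load function and the LOAD_COUNT counter are hypothetical placeholders; a real version would read the pretrained files inside load (for example through the from_file function mentioned above).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the tokenizer has been built (demonstration only).
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive-to-build tokenizer.
pub struct Tokenizer {
    pub vocab_size: usize,
}

impl Tokenizer {
    fn load() -> Tokenizer {
        LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
        // Placeholder: a real implementation would read the pretrained
        // vocab/merges files here. 50257 is GPT-2's vocabulary size.
        Tokenizer { vocab_size: 50257 }
    }
}

/// Returns the shared tokenizer, building it only on first use.
pub fn tokenizer() -> &'static Tokenizer {
    static TOKENIZER: OnceLock<Tokenizer> = OnceLock::new();
    TOKENIZER.get_or_init(Tokenizer::load)
}

fn main() {
    // Nothing has been loaded until tokenizer() is first called.
    println!("loads before use: {}", LOAD_COUNT.load(Ordering::SeqCst));
    let a = tokenizer();
    let b = tokenizer();
    println!(
        "vocab size {}, same instance: {}, loads: {}",
        a.vocab_size,
        std::ptr::eq(a, b),
        LOAD_COUNT.load(Ordering::SeqCst)
    );
}
```

OnceLock keeps initialization thread-safe without an extra dependency; the once_cell crate's Lazy type is a common alternative for older compilers.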

Where to use it

It looks like this will be most useful with the logit_bias feature, since the API requires you to send token IDs rather than the actual strings. Since the example code is in Python, this is a bit of a barrier for Rust users.
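A sketch of how a text-to-token conversion could make logit_bias easier to use from Rust. The function logit_bias_for and the toy encoder with made-up token IDs are hypothetical, and the clamp assumes the OpenAI-style bias range of -100 to 100. Every token a word encodes to receives the same bias, since one word can span several tokens.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds a logit_bias map from words instead of raw
/// token IDs. `encode` is any text-to-token-ID function; every token that a
/// word encodes to gets the same bias, clamped to [-100, 100].
pub fn logit_bias_for<F>(words: &[&str], bias: i32, encode: F) -> HashMap<u32, i32>
where
    F: Fn(&str) -> Vec<u32>,
{
    let bias = bias.clamp(-100, 100);
    let mut map = HashMap::new();
    for &word in words {
        for id in encode(word) {
            map.insert(id, bias);
        }
    }
    map
}

fn main() {
    // Toy encoder standing in for the GPT-2 tokenizer; the IDs are made up.
    let encode = |w: &str| -> Vec<u32> {
        match w {
            "cat" => vec![101],
            "dog" => vec![202, 203],
            _ => vec![],
        }
    };
    let bias = logit_bias_for(&["cat", "dog"], -150, encode);
    println!("{:?}", bias);
}
```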
