Support working with arbitrary bytes on the llama-server /tokenize and /detokenize endpoints #17857
transkatgirl started this conversation in Ideas
The UTF-8 encoding of a single character can span multiple tokens. If you are building a tool that uses llama-server to generate text one token at a time (see logitloom for what this might look like), you will frequently run into cases where the context you would like to feed into the model is not valid UTF-8 (the most common case is generating emoji token-by-token).
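As a minimal illustration (plain Python, not tied to any particular tokenizer), here is why a partially generated emoji cannot be represented as a valid UTF-8 string:

```python
# The UTF-8 encoding of a single emoji is four bytes. If a tokenizer splits
# those bytes across two tokens, the bytes available after decoding only the
# first token form an invalid UTF-8 sequence and cannot survive a round trip
# through a JSON string.
emoji_bytes = "🤖".encode("utf-8")   # b'\xf0\x9f\xa4\x96'
partial = emoji_bytes[:2]            # bytes produced after only the first token
try:
    partial.decode("utf-8")
except UnicodeDecodeError as err:
    print("partial emoji is not valid UTF-8:", err)
```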
If your tool only works with one model, you could mostly work around this by storing the token IDs alongside their byte representations. However, this becomes impractical if you work with multiple models at a time and allow the user to switch models at any point during the session (see Tapestry Loom for what this can look like).
The best way to handle this would be to allow the llama-server /tokenize and /detokenize endpoints to work with a set of arbitrary bytes (represented in JSON as a list of numbers, similar to how logprobs are handled in the OpenAI Chat Completions API), which may or may not be valid UTF-8 depending on the model's outputs.
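A rough sketch of what a client call could look like under this proposal. The `content_bytes` field name is hypothetical and only illustrates the idea of passing raw bytes as a JSON list of integers (0-255); the `tokens` field matches the existing /tokenize response:

```python
# Sketch of the proposed request/response shape, assuming a llama-server
# instance on the default port. "content_bytes" does not exist today and is
# used here purely to illustrate the proposal.
import requests

BASE_URL = "http://localhost:8080"

# Context truncated in the middle of an emoji: not representable as a UTF-8 string.
context = list("The robot replied: 🤖".encode("utf-8"))[:-2]

tok = requests.post(f"{BASE_URL}/tokenize",
                    json={"content_bytes": context}).json()
tokens = tok["tokens"]  # existing response field

# Detokenizing those tokens would return the same byte list, even though it
# cannot be losslessly represented as a JSON string.
detok = requests.post(f"{BASE_URL}/detokenize",
                      json={"tokens": tokens}).json()
raw_bytes = bytes(detok["content_bytes"])  # hypothetical response field
```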