Support working with arbitrary bytes on the llama-server /tokenize and /detokenize endpoints #17857
transkatgirl started this conversation in Ideas
The UTF-8 encoding of a single character can span multiple tokens. If you are building a tool that uses llama-server to generate text one token at a time (see logitloom for what this might look like), you will frequently run into cases where the context you would like to feed into the model is not valid UTF-8 (the most common case is generating emoji token-by-token).
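As a minimal illustration (plain Python, not tied to any particular tokenizer), here is why a partially generated emoji cannot be represented as a valid UTF-8 string:

```python
# The UTF-8 encoding of a single emoji is four bytes. If a tokenizer splits
# those bytes across two tokens, the bytes available after decoding only the
# first token form an invalid UTF-8 sequence and cannot survive a round trip
# through a JSON string.
emoji_bytes = "🤖".encode("utf-8")   # b'\xf0\x9f\xa4\x96'
partial = emoji_bytes[:2]            # bytes produced after only the first token
try:
    partial.decode("utf-8")
except UnicodeDecodeError as err:
    print("partial emoji is not valid UTF-8:", err)
```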
If your tool only works with one model, you could mostly work around this by storing the token IDs alongside their byte representations. However, this becomes impractical if you work with multiple models at a time and allow the user to switch models at any point during the session (see Tapestry Loom for what this can look like).
The best way to handle this would be to allow the llama-server /tokenize and /detokenize endpoints to work with a set of arbitrary bytes (represented in JSON as a list of numbers, similar to how logprobs are handled in the OpenAI Chat Completions API), which may or may not be valid UTF-8 depending on the model's outputs.
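A rough sketch of what a client call could look like under this proposal. The `content_bytes` field name is hypothetical and only illustrates the idea of passing raw bytes as a JSON list of integers (0-255); the `tokens` field matches the existing /tokenize response:

```python
# Sketch of the proposed request/response shape, assuming a llama-server
# instance on the default port. "content_bytes" does not exist today and is
# used here purely to illustrate the proposal.
import requests

BASE_URL = "http://localhost:8080"

# Context truncated in the middle of an emoji: not representable as a UTF-8 string.
context = list("The robot replied: 🤖".encode("utf-8"))[:-2]

tok = requests.post(f"{BASE_URL}/tokenize",
                    json={"content_bytes": context}).json()
tokens = tok["tokens"]  # existing response field

# Detokenizing those tokens would return the same byte list, even though it
# cannot be losslessly represented as a JSON string.
detok = requests.post(f"{BASE_URL}/detokenize",
                      json={"tokens": tokens}).json()
raw_bytes = bytes(detok["content_bytes"])  # hypothetical response field
```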