
Conversation

RobinPicard (Contributor) commented Aug 6, 2025

Expose two new keyword arguments for generation:

  • end_thinking_tag: a string giving the tag the reasoning model uses to signal that its thinking is finished (and thus that we should start constraining the generation)
  • thinking_max_tokens: an int giving the maximum number of tokens the model may spend thinking; once that limit is reached, we force the generation of the end-of-thinking token

Not supported:

  • Models for which the end of thinking does not correspond to a single token

If, in the future, we want to capture the content of the thinking once we return an object with various attributes instead of just the text output, we could add a start_thinking_tag argument for the models that use one.
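For illustration, here is a rough sketch of how the two keywords might be passed at generation time. This is a hedged example: the generator construction is elided, `generator` and the values are illustrative, and the exact call signature in this PR may differ.

```python
# Illustrative sketch only: assumes an outlines-style structured generator;
# the exact construction and signature in this PR may differ.
# `generator` is assumed to wrap a reasoning model (e.g. one trained to
# emit its chain of thought between <think> and </think> tags).
response = generator(
    "What is 6 x 7? Answer as JSON.",
    end_thinking_tag="</think>",  # tag that marks the end of thinking
    thinking_max_tokens=512,      # after 512 thinking tokens, force </think>
)
```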

    The tag the model uses to indicate the end of the thinking process.
    Only used when running a thinking model.
thinking_max_tokens: int | None
    The maximum number of tokens the model can think about. Only used when
Member

Suggested change
- The maximum number of tokens the model can think about. Only used when
+ The maximum number of tokens the model can think for. Only used when

Comment on lines +41 to +44
end_thinking_token_id: int | None
    The id of the end thinking token
thinking_max_tokens: int | None
    The maximum number of tokens the model can think about
Member

Isn't it possible to just build a specialized logits processor that the backends are unaware of? You should be able to skip the logits-biasing function as long as </think> has not been generated, and to limit the number of thinking tokens from within the logits processor.
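A minimal sketch of that idea for the single-sequence case, assuming a transformers-style (input_ids, scores) logits-processor interface; the class and attribute names are illustrative, not this PR's API:

```python
import torch

class ThinkingWrapperProcessor:
    """Illustrative sketch: delay an inner logits processor until
    thinking has ended. Single-sequence only; `inner` is any callable
    with the (input_ids, scores) -> scores signature."""

    def __init__(self, inner, end_thinking_token_id: int, thinking_max_tokens: int):
        self.inner = inner
        self.end_thinking_token_id = end_thinking_token_id
        self.thinking_max_tokens = thinking_max_tokens
        self.thinking_done = False
        self.num_thinking_tokens = 0

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if self.thinking_done:
            # Thinking is over: apply the structured-generation biasing.
            return self.inner(input_ids, scores)
        if input_ids.shape[-1] > 0 and input_ids[0, -1].item() == self.end_thinking_token_id:
            # The end-of-thinking token was just generated.
            self.thinking_done = True
            return self.inner(input_ids, scores)
        self.num_thinking_tokens += 1
        if self.num_thinking_tokens >= self.thinking_max_tokens:
            # Budget exhausted: force the end-of-thinking token.
            forced = torch.full_like(scores, float("-inf"))
            forced[..., self.end_thinking_token_id] = 0.0
            return forced
        # Still thinking: leave the logits untouched.
        return scores
```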

Contributor Author

Initially I wanted to wrap the logits processor in another one that would not bias anything until we encounter the end-of-thinking token and would then call the logits processor it wraps. The problem is that this does not work for batching, as the different sequences may not all stop thinking at the same time.
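To make the batching difficulty concrete, here is a hedged sketch of a per-row variant (all names illustrative, not this PR's implementation). It tracks a done flag and a thinking-token count per sequence, but note the assumption in the docstring: it only works if the wrapped processor is effectively stateless per call, which an FSM-based structured processor generally is not, since its per-sequence state would advance while some rows are still thinking.

```python
import torch

class BatchedThinkingWrapper:
    """Illustrative sketch, not the PR's implementation.

    Tracks thinking state per batch row. Assumes the wrapped `inner`
    processor is stateless across calls; an FSM-based structured
    processor is not, which is exactly the batching problem above."""

    def __init__(self, inner, end_thinking_token_id: int, thinking_max_tokens: int):
        self.inner = inner
        self.end_thinking_token_id = end_thinking_token_id
        self.thinking_max_tokens = thinking_max_tokens
        self.done = None    # per-row flag: has the end-of-thinking token appeared?
        self.counts = None  # per-row count of thinking tokens so far

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        batch_size = scores.shape[0]
        if self.done is None:
            self.done = torch.zeros(batch_size, dtype=torch.bool, device=scores.device)
            self.counts = torch.zeros(batch_size, dtype=torch.long, device=scores.device)
        # Mark rows whose most recent token ended the thinking section.
        if input_ids.shape[-1] > 0:
            self.done |= input_ids[:, -1] == self.end_thinking_token_id
        self.counts += (~self.done).long()

        plain = scores.clone()                   # unbiased logits for thinking rows
        biased = self.inner(input_ids, scores)   # structured biasing (may act in place)
        out = torch.where(self.done.unsqueeze(-1), biased, plain)

        # Rows over budget: force the end-of-thinking token.
        over = (~self.done) & (self.counts >= self.thinking_max_tokens)
        if over.any():
            out[over] = float("-inf")
            out[over, self.end_thinking_token_id] = 0.0
        return out
```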
