content/manuals/ai/compose/model-runner.md (27 additions, 6 deletions)
@@ -102,15 +102,36 @@ services:
       type: model
       options:
         model: ai/smollm2
+        context-size: 1024
+        runtime-flags: "--no-prefill-assistant"
 ```
 
-Notice the dedicated `provider` attribute in the `ai_runner` service.
-This attribute specifies that the service is a model provider and lets you define options such as the name of the model to be used.
+Notice the following:
 
-There is also a `depends_on` attribute in the `my-chat-app` service.
-This attribute specifies that the `my-chat-app` service depends on the `ai_runner` service.
-This means that the `ai_runner` service will be started before the `my-chat-app` service to allow injection of model information to the `my-chat-app` service.
+In the `ai_runner` service:
 
-## Reference
+- `provider.type`: Specifies that the service is a `model` provider.
+- `provider.options`: Specifies the options of the model:
+
+  - We want to use the `ai/smollm2` model.
+
+  - We set the context size to `1024` tokens.
+
+    > [!NOTE]
+    > Each model has its own maximum context size. When increasing the context length,
+    > consider your hardware constraints. In general, try to use the smallest context size
+    > possible for your use case.
+
+  - We pass the llama.cpp server the `--no-prefill-assistant` parameter;
+    see [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
+
+In the `chat` service:
+
+- `depends_on` specifies that the `chat` service depends on the `ai_runner` service. The
+  `ai_runner` service will be started before the `chat` service, to allow injection of model
+  information to the `chat` service.
+
+## Related pages
 
 - [Docker Model Runner documentation](/manuals/ai/model-runner.md)
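
Taken together, the added options and the surrounding prose describe a Compose file along the following lines. This is a minimal sketch assembled from the diff: the `chat` service's `build` key and the exact indentation are assumptions on my part, not part of the change.

```yaml
# Sketch of the full compose.yaml implied by the diff above (hypothetical).
services:
  chat:
    build: .                # assumption: any app that talks to the model endpoint
    depends_on:
      - ai_runner           # start the model provider before the app

  ai_runner:
    provider:
      type: model           # marks this service as a model provider
      options:
        model: ai/smollm2
        context-size: 1024                       # context window, in tokens
        runtime-flags: "--no-prefill-assistant"  # forwarded to the llama.cpp server
```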
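The "injection of model information" mentioned for the `chat` service is, to the best of my knowledge, done through environment variables that Compose derives from the provider service's name. The variable names below follow that convention for a service called `ai_runner`; they are an assumption to verify against the Compose documentation, not something stated in this diff.

```yaml
# Hypothetical view of what the chat service receives at runtime.
# Compose is expected to inject variables named after the provider service,
# e.g. for ai_runner:
#
#   AI_RUNNER_URL    - URL of the model runner's OpenAI-compatible endpoint
#   AI_RUNNER_MODEL  - the configured model name (ai/smollm2)
#
# No explicit `environment:` block should be needed; the app can simply read
# these variables to locate and call the model.
services:
  chat:
    depends_on:
      - ai_runner
```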