README.md
Lines changed: 35 additions & 15 deletions
@@ -11,13 +11,13 @@
 The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference on LLaMa models and deploy them in native environments or on the Web. It works on
-both Windows and Linux and does NOT require compiling llama.cpp yourself.
+both Windows and Linux and does NOT require compiling llama.cpp yourself. Its performance is close to that of llama.cpp.
-- Load and inference LLaMa models
-- Simple APIs for chat session
-- Quantize the model in C#/.NET
+- LLaMa model inference
+- APIs for chat sessions
+- Model quantization
+- Embedding generation, tokenization and detokenization
 - ASP.NET Core integration
-- Native UI integration
 ## Installation
@@ -35,18 +35,23 @@ LLamaSharp.Backend.Cuda11
 LLamaSharp.Backend.Cuda12
 ```
-The latest versions of `LLamaSharp` and `LLamaSharp.Backend` may not always be the same. `LLamaSharp.Backend` follows [llama.cpp](https://github.com/ggerganov/llama.cpp) because sometimes
-breaking changes in it make some model weights invalid. If you are not sure which backend version to install, just install the latest one.
+Here is the version mapping between `LLamaSharp.Backend` and `LLamaSharp`, together with the corresponding model samples provided by `LLamaSharp`. If you're not sure which model works with a version, please try our sample models.
-Note that version v0.2.1 has a package named `LLamaSharp.Cpu`. After v0.2.2 it will be dropped.
+| LLamaSharp.Backend | LLamaSharp | Verified Model Resources | llama.cpp commit id |
+| --- | --- | --- | --- |
+| - | v0.2.0 | This version is not recommended. | - |
 We publish the backend for CPU, CUDA 11 and CUDA 12 because they are the most popular configurations. If none of them matches your setup, please compile [llama.cpp](https://github.com/ggerganov/llama.cpp)
 from source and put `libllama` in your project's output path. When building from source, please add `-DBUILD_SHARED_LIBS=ON` so that it is built as a shared library.
 ## FAQ
-
1. GPU out of memory: v0.2.3 put all layers into GPU by default. If the momory use is out of the capacity of your GPU, please set `n_gpu_layers` to a smaller number.
49
-
2. Unsupported model: `llama.cpp` is under quick development and often has break changes. Please check the release date of the model and find a suitable version of LLamaSharp to install.
+1. GPU out of memory: Please try setting `n_gpu_layers` to a smaller number so that fewer model layers are offloaded to the GPU.
+2. Unsupported model: `llama.cpp` is under rapid development and often has breaking changes. Please check the release date of the model and find a suitable version of LLamaSharp to install, or use the models we provide [on Hugging Face](https://huggingface.co/AsakusaRinne/LLamaSharpSamples).
+
 ## Simple Benchmark
@@ -112,30 +117,35 @@ For more usages, please refer to [Examples](./LLama.Examples).
 We provide ASP.NET Core integration [here](./LLama.WebAPI). Since the API is not yet stable, please clone the repo to use it. We'll publish it on NuGet in the future.
+Since we are short-handed, we would appreciate help upgrading the Web API integration if you're familiar with ASP.NET Core.
+
 ## Demo
 ## Roadmap
-✅ LLaMa model inference.
+✅ LLaMa model inference
-✅ Embeddings generation.
+✅ Embedding generation, tokenization and detokenization
-✅ Chat session.
+✅ Chat session
 ✅ Quantization
+✅ State saving and loading
+
 ✅ ASP.NET Core integration
-🔳 UI Integration
+🔳 MAUI Integration
 🔳 Keep up with llama.cpp and improve performance
 ## Assets
-The model weights are too large to be included in the repository. However, some resources can be found below:
+Some extra model resources can be found below:
+- [Quantized models provided by LLamaSharp authors](https://huggingface.co/AsakusaRinne/LLamaSharpSamples)
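
The README's from-source path (compile llama.cpp with `-DBUILD_SHARED_LIBS=ON`, then put `libllama` in the project's output path) might look roughly like the following shell sketch. It assumes Linux (where the library is `libllama.so`), CMake 3.13 or newer with a C/C++ toolchain, and an existing llama.cpp checkout at a commit compatible with your `LLamaSharp` version. The function name `build_libllama` and the output directory are hypothetical, not part of LLamaSharp.

```shell
#!/bin/sh
# Sketch only: build llama.cpp as a shared library and copy it into a .NET
# project's output path. Assumptions: Linux (libllama.so), CMake >= 3.13,
# and SRC_DIR is a llama.cpp checkout matching your LLamaSharp version.

build_libllama() {
  src_dir="$1"   # path to the llama.cpp checkout
  out_dir="$2"   # hypothetical example: ./MyApp/bin/Release/net6.0

  # -DBUILD_SHARED_LIBS=ON makes CMake build libllama as a shared library
  # instead of a static archive.
  cmake -S "$src_dir" -B "$src_dir/build" -DBUILD_SHARED_LIBS=ON || return 1
  cmake --build "$src_dir/build" --config Release || return 1

  # The library's location under build/ may differ between llama.cpp
  # versions, so search for it instead of hard-coding a path.
  lib="$(find "$src_dir/build" -name 'libllama.so' -print -quit)"
  if [ -z "$lib" ]; then
    echo "libllama.so not found under $src_dir/build" >&2
    return 1
  fi

  # Copy the library next to the application's binaries.
  mkdir -p "$out_dir" && cp "$lib" "$out_dir/"
}
```

Usage, with hypothetical paths: `build_libllama ./llama.cpp ./MyApp/bin/Release/net6.0`. The function stops at the first failing step, so a missing toolchain or a bad checkout path is reported before anything is copied.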