[XPU]add enable_logprob #5279
Conversation
Codecov Report

@@           Coverage Diff            @@
##           develop    #5279   +/-  ##
==========================================
  Coverage         ?    59.60%
==========================================
  Files            ?       324
  Lines            ?     39711
  Branches         ?      5976
==========================================
  Hits             ?     23669
  Misses           ?     14158
  Partials         ?      1884
==========================================
gongshaotian
left a comment
LGTM
PD_BUILD_STATIC_OP(get_output_topk)
    .Inputs({"x", "scores", "ranks"})
    .Attrs({"k: int", "rank_id: int64_t", "wait_flag: bool"})
Is this capped at the K=5 defined above?
The previous internal XPU version of this operator defaulted to K=5 and BS=128. The operator was migrated here as-is, so its configuration is not fully aligned with the GPU. We can modify the custom operator to align these settings with the GPU and see whether any issues come up.
5628413
jeff41404
left a comment
This is a migration of existing XPU custom operators; grant an exemption for now in the short term.
hong19860320
left a comment
LGTM
qingqing01
left a comment
ZMQ support needs to be added as a follow-up.
Motivation
This PR primarily adds logprobs support for XPU (Kunlun chip) to the FastDeploy LLM inference engine.
Previously, logprobs functionality was restricted to CUDA platforms, which prevented users from leveraging advanced sampling features on XPU devices.
Modifications
This PR involves changes across the configuration, the worker logic, and the custom XPU operators.
Usage or Command
python -m fastdeploy.entrypoints.openai.api_server \
    --model /work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle \
    --port 8188 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --quantization "wint8" \
    --gpu-memory-utilization 0.9 \
    --enable-logprob
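A server started with the command above can then be queried through its OpenAI-compatible endpoint. The sketch below is a hypothetical client example, not code from this PR: the host/port follow the command above, and the `top_logprobs: 5` value reflects the K=5 operator default mentioned in the review thread.

```python
import json

# Hypothetical request payload (assumption: the server runs on
# localhost:8188 with --enable-logprob, as in the launch command above).
payload = {
    "model": "ERNIE-4.5-0.3B-Paddle",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16,
    "logprobs": True,    # requires the server-side --enable-logprob flag
    "top_logprobs": 5,   # the migrated XPU op currently defaults to K=5
}
body = json.dumps(payload)

# An actual request would look like (needs a running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8188/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```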
Accuracy Tests
This change affects the Logprobs output structure and platform support, not the core inference results.
Checklist
Available PR tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.