-
Notifications
You must be signed in to change notification settings - Fork 13k
Description
Name and Version
version: 5852 (6e2f8df3)
built with MSVC 19.44.35211.0 for Windows AMD64
Operating systems
Windows
GGML backends
SYCL
Hardware
Device: Intel(R) Arc(TM) Graphics
Processor: Intel(R) Core(TM) Ultra 5 125H, 4500 MHz, 14 cores, 18 threads
Models
OLMoE-1B-7B-0924.Q4_0.gguf
Qwen3-30B-A3B-128K-Q3_K_S.gguf
Problem description & steps to reproduce
Problem Description:
When running the MOE model on Intel(R) Arc(TM) Graphics (Intel Core Ultra 5 125H) using the following command:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99
I encountered the following runtime error:
GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed
This error occurs during kernel execution and causes the program to crash.
Steps to Reproduce:
Run the following command with the MOE model:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99 build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\Qwen3-30B-A3B-128K-Q3_K_S.gguf" -ngl 99
The program outputs the error:
llama_context: SYCL0 compute buffer size = 12.78 MiB
llama_context: SYCL_Host compute buffer size = 1.52 MiB
llama_context: graph nodes = 998
llama_context: graph splits = 2
Hello my name isD:\download\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:3381: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed
Proposed Solution:
After adding wait() to the kernel launch, the issue is resolved. However, I am unsure why this issue has not been encountered by others. This could potentially be a rare or hardware-specific issue, as I have not seen similar reports from the community. I suspect this might be an edge case, and the cause could be related to the asynchronous execution on iGPU, which may not be synchronized correctly in some specific scenarios.
static void argsort_f32_i32_sycl(const float *x, int *dst, const int ncols,
const int nrows, ggml_sort_order order,
queue_ptr stream) {
// bitonic sort requires ncols to be power of 2
const int ncols_pad = next_power_of_2(ncols);
const sycl::range<3> block_dims(1, 1, ncols_pad);
const sycl::range<3> block_nums(1, nrows, 1);
const size_t shared_mem = ncols_pad * sizeof(int);
if (order == GGML_SORT_ORDER_ASC) {
sycl_launch(stream, [&](sycl::handler & cgh) {
sycl::local_accessor<uint8_t, 1> dpct_local_acc_ct1(
sycl::range<1>(shared_mem), cgh);
sycl_parallel_for(
cgh, sycl::nd_range<3>(block_nums * block_dims, block_dims), [=](sycl::nd_item<3> item_ct1) {
k_argsort_f32_i32<GGML_SORT_ORDER_ASC>(
x, dst, ncols, ncols_pad, item_ct1,
dpct_local_acc_ct1.get_multi_ptr<sycl::access::decorated::no>()
.get());
});
});
} else if (order == GGML_SORT_ORDER_DESC) {
sycl_launch(stream, [&](sycl::handler & cgh) {
sycl::local_accessor<uint8_t, 1> dpct_local_acc_ct1(
sycl::range<1>(shared_mem), cgh);
sycl_parallel_for(
cgh, sycl::nd_range<3>(block_nums * block_dims, block_dims), [=](sycl::nd_item<3> item_ct1) {
k_argsort_f32_i32<GGML_SORT_ORDER_DESC>(
x, dst, ncols, ncols_pad, item_ct1,
dpct_local_acc_ct1.get_multi_ptr<sycl::access::decorated::no>()
.get());
});
});
} else {
GGML_ABORT("fatal error");
}
// Ensure all kernels finish execution before proceeding further
stream->wait(); // This line was added to synchronize and wait for kernel completion
}
First Bad Commit
No response
Relevant log output
Command:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99
Log Output:
llama_context: SYCL0 compute buffer size = 12.78 MiB
llama_context: SYCL_Host compute buffer size = 1.52 MiB
llama_context: graph nodes = 998
llama_context: graph splits = 2
Hello my name is
D:\download\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:3381: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed