Skip to content

Eval bug: Asynchronous Kernel Execution on iGPU Causes Runtime Errors with MOE Model #15580

@jiujiuwei

Description

@jiujiuwei

Name and Version

version: 5852 (6e2f8df3)
built with MSVC 19.44.35211.0 for Windows AMD64

Operating systems

Windows

GGML backends

SYCL

Hardware

Device: Intel(R) Arc(TM) Graphics
Processor: Intel(R) Core(TM) Ultra 5 125H, 4500 MHz, 14 cores, 18 threads

Models

OLMoE-1B-7B-0924.Q4_0.gguf
Qwen3-30B-A3B-128K-Q3_K_S.gguf

Problem description & steps to reproduce

Problem Description:
When running the MOE model on Intel(R) Arc(TM) Graphics (Intel Core Ultra 5 125H) using the following command:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99
I encountered the following runtime error:
GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed
This error occurs during kernel execution and causes the program to crash.
Steps to Reproduce:
Run the following command with the MOE model:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99 build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\Qwen3-30B-A3B-128K-Q3_K_S.gguf" -ngl 99
The program outputs the error:

llama_context:      SYCL0 compute buffer size =    12.78 MiB
llama_context:  SYCL_Host compute buffer size =     1.52 MiB
llama_context: graph nodes  = 998
llama_context: graph splits = 2
Hello my name isD:\download\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:3381: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed

Proposed Solution:
After adding wait() to the kernel launch, the issue is resolved. However, I am unsure why this issue has not been encountered by others. This could potentially be a rare or hardware-specific issue, as I have not seen similar reports from the community. I suspect this might be an edge case, and the cause could be related to the asynchronous execution on iGPU, which may not be synchronized correctly in some specific scenarios.

static void argsort_f32_i32_sycl(const float *x, int *dst, const int ncols,
                                 const int nrows, ggml_sort_order order,
                                 queue_ptr stream) {
    // bitonic sort requires ncols to be power of 2
    const int ncols_pad = next_power_of_2(ncols);

    const sycl::range<3> block_dims(1, 1, ncols_pad);
    const sycl::range<3> block_nums(1, nrows, 1);
    const size_t shared_mem = ncols_pad * sizeof(int);

    if (order == GGML_SORT_ORDER_ASC) {
        sycl_launch(stream, [&](sycl::handler & cgh) {
            sycl::local_accessor<uint8_t, 1> dpct_local_acc_ct1(
                sycl::range<1>(shared_mem), cgh);

            sycl_parallel_for(
                cgh, sycl::nd_range<3>(block_nums * block_dims, block_dims), [=](sycl::nd_item<3> item_ct1) {
                    k_argsort_f32_i32<GGML_SORT_ORDER_ASC>(
                        x, dst, ncols, ncols_pad, item_ct1,
                        dpct_local_acc_ct1.get_multi_ptr<sycl::access::decorated::no>()
                            .get());
                });
        });
    } else if (order == GGML_SORT_ORDER_DESC) {
        sycl_launch(stream, [&](sycl::handler & cgh) {
            sycl::local_accessor<uint8_t, 1> dpct_local_acc_ct1(
                sycl::range<1>(shared_mem), cgh);

            sycl_parallel_for(
                cgh, sycl::nd_range<3>(block_nums * block_dims, block_dims), [=](sycl::nd_item<3> item_ct1) {
                    k_argsort_f32_i32<GGML_SORT_ORDER_DESC>(
                        x, dst, ncols, ncols_pad, item_ct1,
                        dpct_local_acc_ct1.get_multi_ptr<sycl::access::decorated::no>()
                            .get());
                });
        });
    } else {
        GGML_ABORT("fatal error");
    }

    // Ensure all kernels finish execution before proceeding further
    stream->wait();  // This line was added to synchronize and wait for kernel completion
}

First Bad Commit

No response

Relevant log output

Command:
build-x64-windows-sycl-release-f16\bin\llama-simple.exe -m "D:\Download\OLMoE-1B-7B-0924.Q4_0.gguf" -ngl 99

Log Output:
llama_context: SYCL0 compute buffer size = 12.78 MiB
llama_context: SYCL_Host compute buffer size = 1.52 MiB
llama_context: graph nodes = 998
llama_context: graph splits = 2
Hello my name is
D:\download\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:3381: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed

Metadata

Metadata

Assignees

No one assigned

    Labels

    SYCLhttps://en.wikipedia.org/wiki/SYCL - GPU programming languagebug-unconfirmed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions