
Conversation

@0Marble 0Marble commented Dec 3, 2025

Description

We implement the SSM_CONV operator as a depthwise 1D convolution, using the high-level builtin aclnnConvolution function.

The goal is to compute the following:

$$ y[i,j,k] = \sum_{l=0}^{d_{conv}-1} w[l,i]\, x[l+j, i, k] $$

where the shape of $y$ is $[d_{inner}, n_t, n_s]$, the shape of $x$ is $[d_{conv} - 1 + n_t, d_{inner}, n_s]$, and the shape of $w$ is $[d_{conv}, d_{inner}]$. (The sum runs up to $d_{conv}-1$, since $w$ has $d_{conv}$ rows.)

In order to implement this formula with aclnnConvolution, we reshape the tensors and set the groups parameter to d_inner, so that the convolution is computed for each channel independently.

Testing

We ran the test-backend-ops test suite for SSM_CONV on two different cards: 310P3 and 910B3.


For the 310P3 card, the cubeMathType parameter must be set to ALLOW_FP32_DOWN_PRECISION, which appears to lower the computation precision below f32. As a result, the tests fail by a small margin (NMSE 0.000000114, greater than the allowed 1e-7). We had to override the max_nmse_err() method for test_ssm_conv to raise the maximum error to 1e-6, which allows the tests to pass.

On the 910B3 card, the operator runs natively in f32 and passes the tests at the original 1e-7 precision.

Co-authored-by: Aleksei Lobanov, <[email protected]>
Co-authored-by: Sujin Kang, <[email protected]>
@0Marble 0Marble requested a review from ggerganov as a code owner December 3, 2025 12:22
@github-actions github-actions bot added testing Everything test related ggml changes relating to the ggml tensor library for machine learning Ascend NPU issues specific to Ascend NPUs labels Dec 3, 2025
// so the inputs are converted from f32
// and tests fail with NMSE = 0.000000114 > 0.000000100
double max_nmse_err() override {
    return 1e-6;
}
Collaborator

Do not modify test cases other than those for ggml-cann, because the precision issues come from the 310p device’s own computation. We just need to be aware that the 310p will have some degree of precision loss.

Author

Is there any official way for the test case to know which backend is running? It doesn't seem like any other tests do anything backend-specific. I could override the test_case.eval method on test_ssm_conv to save the backend, or make the max_err method take an optional backend parameter.

Collaborator

I think accuracy tests shouldn’t depend on which backend is used. If the required accuracy isn’t met, then it simply isn’t met — the standard should remain consistent across all backends.

// and tests fail with NMSE = 0.000000114 > 0.000000100
double max_nmse_err() override {
return 1e-6;
}
Author

I just removed the custom error limits; now the test fails on 310P3.

Collaborator

No problem.

Collaborator

@noemotiovon noemotiovon left a comment

LGTM. Thank you very much for your contribution; there is just one small issue.
@hipudding, could you please help review it?

int64_t w_ne[GGML_MAX_DIMS] = { 0 };
size_t w_nb[GGML_MAX_DIMS] = { 0 };

w_ne[0] = nc; // K
Collaborator

This part can be merged into one line, but please keep the comments.

int64_t w_ne[GGML_MAX_DIMS] = { nc, 1, nr, 1 };                // [K, 1, C, 1]
size_t  w_nb[GGML_MAX_DIMS] = { src1->nb[0], src1->nb[1], src1->nb[1], src1->nb[3] };  // reuse src1 strides

int64_t y_ne[GGML_MAX_DIMS] = { 0 };
size_t y_nb[GGML_MAX_DIMS] = { 0 };

y_ne[0] = n_t; // L_out
Collaborator

Same as above.
