Skip to content

intrinsic-test: Final code cleanup for the arm and common module #1884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

madhav-madhusoodanan
Copy link
Contributor

@madhav-madhusoodanan madhav-madhusoodanan commented Jul 24, 2025

Summary

  1. Changed from IntrinsicType::target (String) to IntrinsicType::metadata (HashMap<String, String>) for better support for differing architectures
  2. Added Constraint::Set(Vec<i64>) for support for distinct constant argument values (which may be of types similar to the IMM type)

Context

This PR is part 3 of the changes that were originally made in the x86 extension PR #1814 for intrinsic-test, and is intended to accomodate the changes made in PRs #1862 and #1863.

cc: @Amanieu @folkertdev

@madhav-madhusoodanan
Copy link
Contributor Author

I'm coming across a weird compilation issue on the CI:

clang++: warning: argument unused during compilation: '-fuse-ld=lld' [-Wunused-command-line-argument]
mod_3.cpp:138:43: error: use of undeclared identifier 'vld1q_f16'

Copy link
Contributor

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll need some work, and may benefit from splitting it up even further.

I'm not convinced about metadata being a hashmap. What do you actually need to store in there? (I'm really trying to avoid generalizing this codebase before there is a concrete need).


re

clang++: warning: argument unused during compilation: '-fuse-ld=lld' [-Wunused-command-line-argument]
mod_3.cpp:138:43: error: use of undeclared identifier 'vld1q_f16'

the first part you can ignore, the second part is due to an expect that this PR introduces, and it is emitted so often that CI gives up.

Comment on lines +1 to +12
use crate::arm::intrinsic::ArmIntrinsicType;
use crate::common::argument::Argument;

impl Argument<ArmIntrinsicType> {
pub fn type_and_name_from_c(arg: &str) -> (&str, &str) {
let split_index = arg
.rfind([' ', '*'])
.expect("Couldn't split type and argname");

(arg[..split_index + 1].trim_end(), &arg[split_index + 1..])
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this really need its own file?

.add_extra_flags(vec!["-ffp-contract=off", "-Wno-narrowing"]);
.add_extra_flags(vec!["-ffp-contract=off", "-Wno-narrowing", "-mfpu=neon"]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debugging the cause of the issue error: use of undeclared identifier 'vld1q_f16' on the CI tool.
This specific change was only experimental.

Comment on lines -71 to +72
let cpp_compiler = compile::build_cpp_compilation(&self.cli_options).unwrap();
let cpp_compiler_wrapped = compile::build_cpp_compilation(&self.cli_options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really should just panic if there is no compiler configured. What are we supposed to do in that scenario? We should not fail silently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be

let Some(cpp_compiler) = compile::build_cpp_compilation(&self.cli_options) else { 
    panic!("no C compiler configured");
}

I guess.

Copy link
Contributor Author

@madhav-madhusoodanan madhav-madhusoodanan Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a CLI option to only generate the C and Rust test files (without building or linking).
This is useful if I only need to get the test files without needing to setup the compilation or linking toolchains.

The relevant flag is --generate-only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, can you add a comment mentioning that flag?

(I think we should clean that up with a custom enum Mode { Compile(PathBuf), GenerateOnly } at some point)

Comment on lines -79 to +81
let mut file = File::create(&c_filename).unwrap();
let mut file = fs::File::create(&c_filename).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just keep this as File, you can use use std::fs::{self, File} to import file and the fs module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
I made this change, since I thought importing {self, File} would look awkward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit weird at first but we do this all the time e.g. in the compiler.

Comment on lines +91 to +92
let ty = ArmIntrinsicType::from_c(type_name)
.unwrap_or_else(|_| panic!("Failed to parse argument '{arg}'"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this line fails a bunch in CI, so the logic here is incorrect for this target.

Comment on lines -11 to +16
pub fn to_range(&self) -> Range<i64> {
pub fn to_vector(&self) -> Vec<i64> {
match self {
Constraint::Equal(eq) => *eq..*eq + 1,
Constraint::Range(range) => range.clone(),
Constraint::Equal(eq) => vec![*eq],
Constraint::Range(range) => range.clone().collect::<Vec<i64>>(),
Constraint::Set(values) => values.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use this instead to avoid clones/allocations:

impl Constraint {
    /// Iterate over the values of this constraint.
    fn iter<'a>(&'a self) -> impl Iterator<Item = i64> + 'a {
        match self {
            Constraint::Equal(i) => std::slice::Iter::default().copied().chain(*i..*i + 1),
            Constraint::Range(range) => std::slice::Iter::default().copied().chain(range.clone()),
            Constraint::Set(items) => items.iter().copied().chain(std::ops::Range::default()),
        }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not super happy with the name "constraint" (probably made sense at the time, but now?) but we can resolve that later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this actually is present to support IMM type values, since Rust uses const generics to pass the value.

Example:

#[stable(feature = "simd_x86", since = "1.27.0")]
pub fn _mm256_blend_ps<const IMM8: i32>(a: __m256, b: __m256) -> __m256

Comment on lines -124 to +125
pub target: String,
pub metadata: HashMap<String, String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a hashmap? Couldn't this be a struct instead? what values do you actually need to store in here for x86?

@@ -5,13 +5,15 @@ use std::ops::Range;
pub enum Constraint {
Equal(i64),
Range(Range<i64>),
Set(Vec<i64>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example here of a set that is not a range (but that will actually come up)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example X86 intrinsic: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_i32gather_pd&ig_expand=3720

The function definition is:

__m128d _mm_i32gather_pd (double const* base_addr, __m128i vindex, const int scale)

and scale can only take the values 1, 2, 4, and 8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an x86 related change though, would it be recommended to keep it in the PR that will add x86 to the behavioural testing test suite?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, can you change this type to something like:

/// Describes the values to test for a const generic parameter.
#[derive(Debug, PartialEq, Clone, Deserialize)]
pub enum Constraint {
    /// Test a single value.
    Equal(i64),
    /// Test a range of values, e.g. `0..16`.
    Range(Range<i64>),
    /// Test some specific values, e.g. `vec![1, 2, 4, 8]`.
    Set(Vec<i64>),
}

We might make Equal just a singleton set at some point too, maybe.

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-intrinsictype-metadata-migration branch 2 times, most recently from 348326c to e1a7243 Compare July 25, 2025 11:08
@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-intrinsictype-metadata-migration branch from e1a7243 to 58f200e Compare July 25, 2025 11:15
@madhav-madhusoodanan
Copy link
Contributor Author

madhav-madhusoodanan commented Jul 25, 2025

I was debugging this issue: mod_3.cpp:138:43: error: use of undeclared identifier 'vld1q_f16'.

The command that was being run was:
clang++ -march=armv8.6-a+crypto+crc+dotprod+fp16 -O2 -ffp-contract=off -Wno-narrowing -mfpu=neon --target=armv7-unknown-linux-gnueabihf mod_3.cpp -c -o mod_3.o.

I suspected that armv8.6-a and --target=armv7-unknown-linux-gnueabihf might be clashing (armv8-a vs armv7), so i removed the -march tag itself.

This led me to the following error (showing a sample, there were more intrinsics):

mod_3.cpp:861:43: error: use of undeclared identifier 'vld1_f16'
        float16x4_t a = cast<float16x4_t>(vld1_f16(&a_vals[i]));
                                          ^
mod_3.cpp:862:43: error: use of undeclared identifier 'vld1_f16'
        float16x4_t b = cast<float16x4_t>(vld1_f16(&b_vals[i]));

The intrinsic vld1_f16 is present on v7, A64 and A32 architectures, so not like we can add it (and the others) to the exclusion list:

  {
    "SIMD_ISA": "Neon",
    "name": "vld1_f16",
    "arguments": [
      "float16_t const * ptr"
    ],
    "return_type": {
      "value": "float16x4_t"
    },
    "Arguments_Preparation": {
      "ptr": {
        "register": "Xn"
      }
    },
    "Architectures": [
      "v7",
      "A32",
      "A64"
    ],
    "instructions": [
      [
        "LD1"
      ]
    ]
  },

@folkertdev
Copy link
Contributor

Can you isolate the issue into a godbolt.org snippet? I vaguely remember there is some full-fp16 flag though I can't find much about it now.

@madhav-madhusoodanan
Copy link
Contributor Author

madhav-madhusoodanan commented Jul 25, 2025

@folkertdev Here is a minimal reproduction: https://godbolt.org/z/e1r3x9xfo

@folkertdev
Copy link
Contributor

using -fpu=neon-fp16 helps a bit.

also I don't really see why these changes cause this issue? was this code not generated before?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants