Skip to content

Conversation

huanglangwen
Copy link

No description provided.

Copy link
Owner

@juntyr juntyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting the PR @huanglangwen! I've left some initial comments

@@ -0,0 +1,100 @@
[package]
name = "numcodecs-ebcc"
version = "1.0.0"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly concerned about the 1.0.0 version for the crate. numcodecs still undergoes breaking changes from time to time, which then require major version releases. While numcodecs still has a 0.x version, I'd prefer to also keep the codecs on 0.x versions.

The codecs already separate out the version of the compression algorithm, which is specified in the config, to allow the implementation of the wrapper to change while keeping the algorithm on the same version.

@@ -0,0 +1,217 @@
# EBCC Rust Bindings

This directory contains Rust bindings for EBCC (Error Bounded Climate Compressor), providing a safe and efficient interface to the EBCC compression library with integration support for the `numcodecs` ecosystem.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repository should likely only contain the numcodecs wrapper for the bindings since you have also developed alternative frontends. For now, I would likely suggest to remove all code that isn't needed for the numcodecs frontend. If/once you publish the Rust wrapper on crates.io, the numcodecs-ebcc crate could then depend on that instead of vendoring ebcc (but for now I'm fine with vendoring)

/// data.len() * 4, compressed.len());
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn encode_climate_variable(data: &[f32], config: &EBCCConfig) -> EBCCResult<Vec<u8>> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a broader function name?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it ever be possible to support data types other than f32?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer passing data in ndarray form so that data and shape are coupled, I'll mention this more in a later comment

/// ```
pub fn encode_climate_variable(data: &[f32], config: &EBCCConfig) -> EBCCResult<Vec<u8>> {
// Validate configuration
config.validate()?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to validate as much of the config before the data is known (where possible, e.g. if a number must be positive etc) since error reporting is usually more user friendly the earlier it happens. Since the config is not validated based on the data here, I would prefer guaranteeing that a EBCCConfig can only contain valid configs, i.e. to validate upon construction

let mut out_buffer: *mut f32 = ptr::null_mut();
let decompressed_size = unsafe {
ffi::decode_climate_variable(
compressed_data.as_ptr() as *mut u8, // C function shouldn't modify input
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as casts are generally discouraged since they (can) modify too many aspects of the type at once - here where we only modify the mutability but not the element type, cast_mut should be used instead

pub dims: [usize; NDIMS],

/// Base compression ratio for JPEG2000 layer
pub base_cr: f32,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cr must be >0, so you could copy a positive type from the other codecs

pub residual_compression_type: ResidualType,

/// Maximum allowed error (used with MaxError and RelativeError)
pub error: f32,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this value only makes sense with max error and relative error (and has different meanings for them), it should be included in the ResidualType enum

#[derive(JsonSchema, Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct EBCCConfig {
/// Data dimensions [frames, height, width] - must be exactly 3 dimensions
pub dims: [usize; NDIMS],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the dimensions in the config is a bit clunky (and unfortunately incompatible with the benchmark, where we configure the compressor once but then might apply it over differently-sizes chunks of the data). Since ebcc presumably does not store the data size itself, you could copy one of the headers that I created for other codecs to store the data size. This does mean that the output is not compatible with other tools (though the header could always be skipped by them) but makes the codec more flexible.

pub config: EBCCConfig,
/// The codec's encoding format version. Do not provide this parameter explicitly.
#[serde(default, rename = "_version")]
pub version: String,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version uses a different type in the other codecs to ensure that you can only construct a codec with the current version, and that reconstructing a codec from an existing config only succeeds if the version if semver compatible

/// Ok(())
/// }
/// ```
pub fn ebcc_codec_from_config(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can usually just deserialize without having to write this explicit method. If you'd like to keep the support for the deprecated parameters, you could create empty/uninhabited enum types for them, manually implement deserialize for those and error if the code path is reached. Since uninhabited types cannot be constructed in Rust, no Rust user could create a config with them, and deserialsing them would fail with a nice deprecation error. Though this is also come cleanup I could do.

@huanglangwen
Copy link
Author

Thanks a lot! I'll overhaul it over the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants