BlackrockRawIO is over-segmenting data #1770

@h-mayorquin

I have user data that, according to them, has no true segments at all. They did not stop and restart the recording session. Nevertheless, the current BlackrockRawIO implementation creates thousands of segments.

The reason is the following logic in the code:

_period = self._nsx_basic_header[nsx_nb]["period"] # 30_000 ^-1 s per sample
_nominal_rate = 30_000 / _period # samples per sec; maybe 30_000 should be ["sample_resolution"]
_clock_rate = self._nsx_basic_header[nsx_nb]["timestamp_resolution"] # clocks per sec
clk_per_samp = _clock_rate / _nominal_rate # clk/sec / smp/sec = clk/smp
seg_thresh_clk = int(2 * clk_per_samp)
seg_starts = np.hstack((0, 1 + np.argwhere(np.diff(struct_arr["timestamps"]) > seg_thresh_clk).flatten()))
for seg_ix, seg_start_idx in enumerate(seg_starts):

Here, a new segment is created whenever the difference between consecutive timestamps exceeds twice the expected difference for the stream's sampling rate. For the user's ns4 and ns6 files, these thresholds are as small as 0.2 ms and 0.067 ms, which is overly strict. As a result, recordings with tiny millisecond-scale gaps (which I think are buffer and/or jitter artifacts, not real breaks) are split into thousands of segments even though the user insists the recording was continuous.
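
The thresholds quoted above can be reproduced with a few lines of arithmetic. This is only a sketch that mirrors the quoted logic: the header values (period 3 for ns4, period 1 for ns6, and a timestamp resolution of 30,000 clocks per second) are assumptions chosen to match the numbers in this report, and `segment_threshold` is a hypothetical helper, not part of neo.

```python
def segment_threshold(period, timestamp_resolution=30_000):
    """Mirror the quoted logic: the threshold is two sample periods.

    Returns the threshold in clock ticks and in milliseconds.
    The default timestamp_resolution is an assumed value.
    """
    nominal_rate = 30_000 / period                       # samples per second
    clk_per_samp = timestamp_resolution / nominal_rate   # clock ticks per sample
    seg_thresh_clk = int(2 * clk_per_samp)                # threshold in clock ticks
    seg_thresh_ms = 1000 * seg_thresh_clk / timestamp_resolution
    return seg_thresh_clk, seg_thresh_ms


# Assumed header values: period=3 for ns4 (10 kHz), period=1 for ns6 (30 kHz).
ns4_clk, ns4_ms = segment_threshold(period=3)
ns6_clk, ns6_ms = segment_threshold(period=1)
print(f"ns4: {ns4_clk} ticks = {ns4_ms:.3f} ms")  # ns4: 6 ticks = 0.200 ms
print(f"ns6: {ns6_clk} ticks = {ns6_ms:.3f} ms")  # ns6: 2 ticks = 0.067 ms
```

Under these assumptions, any gap longer than two sample periods starts a new segment, so a single dropped or delayed packet is enough to trigger a split.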

The Matlab NPMK implementation lets users control this threshold with the max_tick_multiple parameter:

https://github.com/BlackrockNeurotech/NPMK/blob/a5b3e3b25b6e2f4594ecbb99d3e0e5e517530959/NPMK/openNSx.m#L182-L186

My suggestion is to improve this in three ways:

  • Provide warnings when gaps are found, including their size and type, similar to how Intan handling works. See "Blackrock add summary of automatic data segmentation" (#1769).
  • Add a parameter (e.g. segmentation_threshold or segment_threshold_s) to the constructor so users can control how large a gap must be before a new segment is created.
  • Give users direct access to the raw timestamps so they can analyze the gaps themselves, perform custom interpolation or drift correction, and align with other systems.
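
To make the first two suggestions concrete, here is a rough sketch of gap detection with a user-controlled threshold and a warning per gap. Everything in it is hypothetical: `find_segment_starts` is not an existing neo function, `segment_threshold_s` is just one of the candidate parameter names above, and the timestamps are toy values. It uses plain Python lists for clarity, whereas the existing code is vectorized with `np.diff`.

```python
import warnings


def find_segment_starts(timestamps, timestamp_resolution, segment_threshold_s):
    """Return the indices where a new segment starts (hypothetical helper).

    A new segment starts wherever the gap between consecutive timestamps
    exceeds segment_threshold_s seconds. Each such gap emits a warning
    reporting its size.
    """
    threshold_clk = segment_threshold_s * timestamp_resolution
    seg_starts = [0]
    for i in range(1, len(timestamps)):
        gap_clk = timestamps[i] - timestamps[i - 1]
        if gap_clk > threshold_clk:
            gap_ms = 1000 * gap_clk / timestamp_resolution
            warnings.warn(
                f"Gap of {gap_ms:.3f} ms before sample {i}; starting a new segment."
            )
            seg_starts.append(i)
    return seg_starts


# Toy 30 kHz timestamps: one 5-tick jitter gap and one 60_000-tick (2 s) pause.
ts = [0, 1, 2, 7, 8, 9, 60_009, 60_010]

# Threshold of two sample periods, as in the current behaviour.
strict = find_segment_starts(ts, 30_000, segment_threshold_s=2 / 30_000)
# Relaxed threshold of 1 ms, as a user might choose.
relaxed = find_segment_starts(ts, 30_000, segment_threshold_s=0.001)

print(strict)   # [0, 3, 6]
print(relaxed)  # [0, 6]
```

With the strict threshold the 5-tick jitter gap produces an extra segment. With the relaxed one, only the 2 s pause does. The third suggestion would amount to exposing the `ts` array itself, so users can run this kind of analysis on their own data.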
