CNDB-15300: Add SSTableReader#getApproximatePositionsForRanges
#1993
Updated from `7429e16` to `33e456d`.
@blambov: I'd appreciate a look, at least at the first commit, as it involves trie stuff. This isn't very complex, but I would like to make sure I didn't make an incorrect assumption in there (and/or miss a better way to do this).
Two review threads on `src/java/org/apache/cassandra/io/sstable/format/trieindex/TrieIndexSSTableReader.java` (outdated, resolved).
Updated from `33e456d` to `272748a`.
The changes look good, but I am not familiar with the trie code. I will leave it to @blambov.
Thank you for the rework, this is much easier to follow.
Two review threads on `src/java/org/apache/cassandra/io/sstable/format/trieindex/TrieIndexSSTableReader.java` (outdated, resolved).
Updated from `272748a` to `647368c`.
Note: code coverage is not happy, and this seems to be mostly because I copied the …
In the context of CNDB-15300, adds a variant of the `SSTableReader#getPositionsForRanges` method that never reads the data file to return its results, but in exchange may return positions that slightly "overshoot" the requested range. Put another way, the added method `SSTableReader#getApproximatePositionsForRanges` is such that if you call it on some range `R` and read the data within the returned positions, the read data may start with at most one key (really, a partition) that sorts strictly before `R`, and may end with at most one key that sorts strictly after `R`.
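To make the "at most one partition of overshoot" bound concrete, here is a minimal sketch, not the actual `TrieIndexSSTableReader` code. It assumes an index that, like a trie index, stores only each partition's shortest distinguishing key prefix mapped to that partition's start offset in the data file (modeled with a `TreeMap<String, Long>`), treats the range as inclusive `[left, right]` string bounds instead of Cassandra's token-ordered keys, and uses hypothetical names (`ApproximatePositions`, `approximate`). When the prefix found for a bound is itself a prefix of that bound, the full key may sort on either side of it, and only reading the data file could tell; that is where the single extra partition at each end comes from.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: approximate data-file positions for an inclusive key range [left, right],
 * using only an index of each partition's shortest distinguishing key prefix (never the data file).
 */
final class ApproximatePositions {
    private final TreeMap<String, Long> prefixToStart; // distinguishing prefix -> partition start offset
    private final long dataLength;                     // data file length, i.e. end of the last partition

    ApproximatePositions(TreeMap<String, Long> prefixToStart, long dataLength) {
        this.prefixToStart = prefixToStart;
        this.dataLength = dataLength;
    }

    /** Returns {start, end} (end exclusive), or null if no partition can fall in [left, right]. */
    long[] approximate(String left, String right) {
        if (prefixToStart.isEmpty()) return null;

        Map.Entry<String, Long> first = prefixToStart.floorEntry(left);
        if (first == null) {
            first = prefixToStart.firstEntry();          // every key sorts after left
        } else if (!left.startsWith(first.getKey())) {
            // The prefix differs from left at some character, so its key surely sorts before left.
            first = prefixToStart.higherEntry(first.getKey());
            if (first == null) return null;              // every key sorts before left
        }
        // If first's prefix is a prefix of left, its full key may still sort before left and we
        // cannot tell without the data file: the possible overshoot at the start.

        Map.Entry<String, Long> last = prefixToStart.floorEntry(right);
        if (last == null) return null;                   // every key sorts after right
        // If last's prefix is a prefix of right, its key may sort after right: the possible
        // overshoot at the end. A partition ends where the next one starts, or at end of file.
        Map.Entry<String, Long> next = prefixToStart.higherEntry(last.getKey());
        long end = next != null ? next.getValue() : dataLength;

        long start = first.getValue();
        return start < end ? new long[] { start, end } : null;
    }
}
```

For example, with prefixes `app`, `apr`, `b`, `c` (standing for keys `apple`, `apricot`, `banana`, `cherry`) at offsets 0, 100, 250 and 400 of a 600-byte data file, `approximate("apples", "bam")` returns `{0, 400}`: `apple` and `banana` are each one partition of overshoot, because the prefixes `app` and `b` alone cannot tell whether those keys sort before `apples` or after `bam`.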
The STATS component deserialization method, `MetadataSerializer#deserialize`, was creating a `RandomAccessReader` (RAR) to read the underlying file. But the deserialization does not do "random" accesses; it deserializes strictly sequentially. In principle, using a RAR for sequential reads is fine (though slightly overkill), but it does mean that the method used on the underlying `FileChannel` will be an "absolute" read (one that takes the position as an argument, instead of reading at the channel's current position), and tiered-storage extensions with a custom file channel may be able to use simpler or more optimal implementations when they know the file is read sequentially (only through "relative" read calls). TL;DR: this replaces the use of RAR with `FileInputStreamPlus`, which is essentially equivalent in this use case but only does relative reads.
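The "absolute" versus "relative" distinction maps onto the two `java.nio.channels.FileChannel` read overloads: `read(ByteBuffer)` reads at, and advances, the channel's own position, while `read(ByteBuffer, long)` reads at the given offset and leaves that position unchanged. The minimal sketch below (hypothetical class `ChannelReadModes`; it does not reproduce the actual `RandomAccessReader` or `FileInputStreamPlus` code) shows both patterns; the first is the only one a sequential-only channel implementation would need to support.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of relative versus absolute FileChannel reads. */
final class ChannelReadModes {
    /** Reads the whole file using only relative reads: each call advances the channel position. */
    static byte[] readSequentially(Path file, int chunkSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer out = ByteBuffer.allocate((int) ch.size());
            ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
            while (ch.read(chunk) != -1) {   // relative read at ch.position()
                chunk.flip();
                out.put(chunk);
                chunk.clear();
            }
            return out.array();
        }
    }

    /** Performs one absolute read and returns the channel position afterwards (unchanged, 0). */
    static long positionAfterAbsoluteRead(Path file, long offset, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.read(ByteBuffer.allocate(len), offset); // absolute read: offset passed explicitly
            return ch.position();
        }
    }
}
```

A channel wrapper that only ever sees the first pattern can, for instance, stream the file front to back without supporting seeks, which is the kind of simplification the tiered-storage case above relies on.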
Code coverage complains that this isn't tested; it is somewhat involved to actually test, and we don't actually need this method in those cases (at least not yet). In other words, this was essentially dead code, so it is removed, with assertions added to prevent future misuse.
Updated from `6135469` to `1cee610`.
❌ Build ds-cassandra-pr-gate/PR-1993 rejected by Butler: 2 regressions found (2 new test failures, 2 known test failures).
…Ranges` (#1993) This PR is in the context of riptano/cndb#15380, and is used by its PR riptano/cndb#15380. It adds a variant of the `SSTableReader#getPositionsForRanges` method that never reads the data file to return its results, but in exchange may return positions that slightly "overshoot" the requested range. Put another way, the added method `SSTableReader#getApproximatePositionsForRanges` is such that if you call it on some range `R` and read the data within the returned positions, the read data may start with at most one key (really, a partition) that sorts strictly before `R`, and may end with at most one key that sorts strictly after `R`. Additionally, the PR switches the reading of the `Statistics.db` component from using `RandomAccessReader` to using `FileInputStreamPlus`. This is functionally equivalent (the component is deserialized sequentially anyway, so there are no random reads), but by making it clear that it doesn't do random reads, it allows us to "direct download" this component like other related components on the CNDB side. See the last point of riptano/cndb#15380 for more details.
This PR is in the context of https://github.com/riptano/cndb/pull/15380, and is used by its PR https://github.com/riptano/cndb/pull/15380.

It adds a variant of the `SSTableReader#getPositionsForRanges` method that never reads the data file to return its results, but in exchange may return positions that slightly "overshoot" the requested range. Put another way, the added method `SSTableReader#getApproximatePositionsForRanges` is such that if you call it on some range `R` and read the data within the returned positions, the read data may start with at most one key (really, a partition) that sorts strictly before `R`, and may end with at most one key that sorts strictly after `R`.

Additionally, the PR switches the reading of the `Statistics.db` component from using `RandomAccessReader` to using `FileInputStreamPlus`. This is functionally equivalent (the component is deserialized sequentially anyway, so there are no random reads), but by making it clear that it doesn't do random reads, it allows us to "direct download" this component like other related components on the CNDB side. See the last point of https://github.com/riptano/cndb/pull/15380 for more details.
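One way to state the "at most one partition on each side" guarantee precisely is as a predicate over the partitions of a data file. The sketch below is a hypothetical test helper, not part of this PR: it models each partition as a `long` key with a `[start, end)` byte span, treats `R` as the inclusive key range `[left, right]` (real Cassandra ranges have their own bound semantics), and checks that a returned `[from, to)` covers every partition of `R` while including at most one extra partition before it and at most one after it.

```java
import java.util.List;

/** Hypothetical checker for the approximate-positions contract described above. */
final class ApproxRangeCheck {
    /** Simplified model of one partition in the data file: its key and byte span [start, end). */
    static final class Partition {
        final long key, start, end;

        Partition(long key, long start, long end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * True if reading bytes [from, to) returns every partition whose key is in [left, right],
     * plus at most one partition sorting before left and at most one sorting after right.
     */
    static boolean overshootsByAtMostOne(List<Partition> partitions, long left, long right, long from, long to) {
        int before = 0, after = 0;
        for (Partition p : partitions) {
            boolean read = p.start >= from && p.end <= to;  // partition lies fully inside the read
            boolean inRange = p.key >= left && p.key <= right;
            if (inRange && !read) return false;             // a partition of R would be missed
            if (read && p.key < left) before++;
            if (read && p.key > right) after++;
        }
        return before <= 1 && after <= 1;
    }
}
```

For partitions with keys 10, 20, 30 and 40 occupying consecutive 100-byte spans and `R = [15, 35]`, the span `[0, 400)` passes (one extra partition on each side), while `[100, 200)` fails because it misses the partition with key 30.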