-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-15280: Fix AbstractReadQuery.toCQLString to produce correct CQL when possible #1985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
ef1ea2c to
3073576
Compare
3073576 to
8441449
Compare
a84996a to
f9d1f3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
The PR "CNDB-15280: Remove user data from AbstractReadQuery.toCQLString" modifies the CQL string generation methods to redact sensitive user data from query logs. The main purpose is to prevent sensitive column values from appearing in slow query logs and other monitoring outputs by replacing them with "?" placeholders.
Key changes:
- Modified
AbstractReadQuery.toCQLStringto accept aredactparameter for controlling value redaction - Added redaction capabilities throughout the CQL string generation chain
- Updated SAI query plan logging to use redacted strings
Reviewed Changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
test/unit/org/apache/cassandra/index/sai/plan/PlanTest.java |
Updated test mocks to support new redaction parameter and added tests for both redacted and non-redacted plan output |
test/unit/org/apache/cassandra/db/filter/DataLimitsTest.java |
Fixed test assertions to use "LIMIT" instead of "ROWS LIMIT" for CQL compatibility |
test/unit/org/apache/cassandra/db/filter/ColumnFilterTest.java |
Updated all test assertions to test both redacted and non-redacted CQL string generation |
test/unit/org/apache/cassandra/db/SinglePartitionReadCommandCQLTest.java |
Expanded test coverage to verify redaction works for various query types and added tests for quoted identifiers |
test/unit/org/apache/cassandra/db/ReadCommandCQLTester.java |
Modified helper method to test both redacted and non-redacted CQL output |
test/unit/org/apache/cassandra/db/PartitionRangeReadCommandCQLTest.java |
Added comprehensive tests for redaction across different query patterns including index-based queries |
test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java |
New test file for multi-range read command CQL string generation with redaction support |
test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java |
Fixed error message templates to remove placeholder syntax |
test/distributed/org/apache/cassandra/distributed/test/SlowQueryLoggerTest.java |
New integration test verifying that slow query logger redacts sensitive data |
src/java/org/apache/cassandra/schema/ColumnMetadata.java |
Fixed CQL string generation to use proper column name quoting |
src/java/org/apache/cassandra/index/sai/plan/QueryController.java |
Updated tracing to use redacted plan strings |
src/java/org/apache/cassandra/index/sai/plan/Plan.java |
Added redaction support to plan string generation methods |
src/java/org/apache/cassandra/db/rows/AbstractRow.java |
Updated to use redacted clustering string generation |
src/java/org/apache/cassandra/db/marshal/AbstractType.java |
Added new toCQLString method with redaction parameter |
src/java/org/apache/cassandra/db/filter/RowFilter.java |
Enhanced filter expressions to support redacted CQL string generation |
src/java/org/apache/cassandra/db/filter/DataLimits.java |
Fixed limit clause formatting for proper CQL syntax |
| Multiple filter and query classes | Updated CQL string generation methods to support redaction parameter throughout the call chain |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR modifies AbstractReadQuery.toCQLString so it doesn't include column values. There is a boolean flag to opt-out from redaction, since seeing the queried values can be useful while debugging.
I would like to request to have two exposed methods with different names. And give better name to the methods, which targets logs and excludes column values.
Also can you go through the PR checklist? Some items can be visited already now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR and improving maitnainability by reducing code duplication. It's great to see such improvements.
I am requesting to separate fixes to producing correct CQL string from the implementation of redaction into a separate PR. There are many practical reasons, and splitting will help to do review and also address review comments. I have plenty comments just on fixing correctness of produced CQL string. And I also have comments to redaction implementation. I think it will be useful to have the redaction PR be based on the correctness PR.
My comments to the first part is maintly about improving code maintainability. I probably noticed few uncrutial misses too.
For the redaction part I have an issue that the decision when to restrict and when not, e.g., in impelemntations of toString, is arbitrary. It will be great to summarize where and why redaction is applied and where not in the PR discription. I also suggest to bring this summary to Slack for visibility and potential discussion.
Also I would like to see improvements to the names of methods toCQLString() (without argument) to provide semantics of redaction if they do so.
My review is currently partial as the PR is big.
src/java/org/apache/cassandra/db/filter/AbstractClusteringIndexFilter.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/filter/AbstractClusteringIndexFilter.java
Outdated
Show resolved
Hide resolved
| return String.format("ORDER BY %s %s", column.name, operator); | ||
| case ANN: | ||
| return String.format("ORDER BY %s ANN OF %s", column.name, truncateValue(type.toCQLString(value, redact))); | ||
| case LIKE_PREFIX: | ||
| return likeToCQLString("'%s%%'", type, redact); | ||
| case LIKE_SUFFIX: | ||
| return likeToCQLString("'%%%s'", type, redact); | ||
| case LIKE_CONTAINS: | ||
| return likeToCQLString("'%%%s%%'", type, redact); | ||
| case LIKE_MATCHES: | ||
| return likeToCQLString("'%s'", type, redact); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing break and return smells to me. However, I haven't got a good idea to suggest here. I thought if likeToCQLString can return formatting strings to be used in the common return, however, serializing like operators to strings can cause issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything actionable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a suggestion, thus no action to do in this PR. I am wonder what do you think about this mix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have anything to say, that's why I did it this way.
@k-rus this PR is already split into separate commits, one for each thing, which took quite a bit of work to do because generally one realizes things about correctness when working on masking, since both things are tightly related. Should we spend more time in dividing this in two PRs? Also, there is review feedback for both things in this PR. |
d7a1449 to
dfdecf0
Compare
|
@k-rus thanks for reviewing. I think I have addressed all the feedback on this partial review round. Please let me know when the rest of the review is ready. |
I’m not sure I understand why having two PRs would be more work than keeping two clean commits. My suggestion is to create the first PR focused solely on fixing the CQL string issues (as in the current commit) and then base the redaction PR on top of that one. I specifically don’t suggest basing the second PR on main, since that would introduce unnecessary extra work. Splitting the work into two PRs helps reduce the review size and avoid mixing semantically different changes. It also makes it easier to review each part separately with the existing tooling. Since code quality is a current focus, having two clear, purpose-specific PRs will definitely help with review efficiency and maintainability. |
dfdecf0 to
ae827cd
Compare
I think we have separate commits as a requirement for this kind of thing. This PR already has abundant review discussion on both correctness and refactoring that will be missing in the separate PR. Maybe that feedback should have been hold until the PR was split to help with review efficiency and maintainability. Anyway, I have created that separate PR if that helps unblock this, leaving this one for correctness: #2038 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for splitting into a separate PR as it helps a lot, since the PR is still not small with many changes.
It might be helpful in future if there is a separate ticket for fixing issues with generated CQL strings. Having a separate issue will also allow to remove connection to redaction. What do you think?
The PR looks good to me. I think I found a couple of places where toCQLstring is missing. Also I am not sure if generated CQL doesn't need to be fixed for ANN queries. (if it's so and it requires work, I don't mind to postpone into a separate issue, and this PR will just introduce tests demonstrating the current issue)
Otherwise, my comments are nits to improve maintainability of the code.
test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java
Outdated
Show resolved
Hide resolved
| // test literals | ||
| createTable("CREATE TABLE %s (k text, c text, m map<text, text>, PRIMARY KEY (k, c))"); | ||
| assertToCQLString("SELECT m['key'] FROM %s WHERE m['key'] = 'value' ALLOW FILTERING", | ||
| "SELECT m FROM %s WHERE m['key'] = 'value' ALLOW FILTERING"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that projection is missing the collection access, i.e., m instead of m['key']? Or is it something to fix in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what is received internally, due to the differences between fetched/queries columns, etc.
| "SELECT * FROM %s WHERE k = 0 ALLOW FILTERING"); | ||
| assertToCQLString("SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC", | ||
| "SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC, c2 DESC ALLOW FILTERING"); | ||
| assertToCQLString("SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC, c2 DESC", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a curious question, which, I believe, is outside the scope of this PR. Is it allowed to have mixing ASC and DESC for c1 and c2? E.g., ORDER BY c1 DESC, c2 ASC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not supported, due to how clusterings are built.
Various fixes for toCQLString mainly coming from CASSANDRA-16510, and removal of code duplication.
ae827cd to
c2702c7
Compare
|
❌ Build ds-cassandra-pr-gate/PR-1985 rejected by Butler1 regressions found Found 1 new test failures
Found 3 known test failures |
|
Closing as superseded by #2083 (CNDB-15760) |



The method
AbstractReadQuery.toCQLStringprints commands as CQL queries including any column values. This includes the queried values in theWHEREpart of aSELECTstatement or the written values onINSERTandUPDATEstatement. This method is used at least by the slow query logger, printing user data into the logs.This PR includes a number of fixes to make
toCQLStringproduce correct CQL syntax, mostly coming from CASSANDRA-16510. These are fixes to prevent printing wrong or redundant CQL syntax such asSELECT * FROM t ORDER BY (v ANN OF [...])orSELECT * FROM t WHERE c1=0 AND c1=0. Those issues came up while writing tests for testing redaction with various types of queries, and are needed to properly test redaction. There is also some removal of code duplication. These changes are in a separate commit in an attempt to ease review.This PR originally included both those fixes and redaction in separate commit, but they have been separated in two PRs at reviewer request. The PR for redaction is #2038. This PR however contains reviewer feedback regarding changes that have been ported to that PR, so it might be harder to follow.