Skip to content

Commit 4f1ca21

Browse files
authored
Merge pull request #19875 from tamasvajk/quality/spec_chars
Java: Add query to detect special characters in string literals
2 parents f940cb2 + ccbf705 commit 4f1ca21

File tree

8 files changed

+357
-0
lines changed

8 files changed

+357
-0
lines changed

java/ql/integration-tests/java/query-suite/java-code-quality-extended.qls.expected

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,7 @@ ql/java/ql/src/Violations of Best Practice/Naming Conventions/ConfusingOverloadi
7878
ql/java/ql/src/Violations of Best Practice/Naming Conventions/LocalShadowsFieldConfusing.ql
7979
ql/java/ql/src/Violations of Best Practice/Naming Conventions/SameNameAsSuper.ql
8080
ql/java/ql/src/Violations of Best Practice/Records/IgnoredSerializationMembersOfRecordClass.ql
81+
ql/java/ql/src/Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql
8182
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/CallsToStringToString.ql
8283
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DefaultToString.ql
8384
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DoNotCallFinalize.ql

java/ql/integration-tests/java/query-suite/java-code-quality.qls.expected

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@ ql/java/ql/src/Violations of Best Practice/Naming Conventions/ConfusingOverloadi
7676
ql/java/ql/src/Violations of Best Practice/Naming Conventions/LocalShadowsFieldConfusing.ql
7777
ql/java/ql/src/Violations of Best Practice/Naming Conventions/SameNameAsSuper.ql
7878
ql/java/ql/src/Violations of Best Practice/Records/IgnoredSerializationMembersOfRecordClass.ql
79+
ql/java/ql/src/Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql
7980
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/CallsToStringToString.ql
8081
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DefaultToString.ql
8182
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DoNotCallFinalize.ql
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
## Overview
2+
3+
This query detects non-explicit control and whitespace characters in Java literals.
4+
Such characters are often introduced accidentally and can be invisible or hard to recognize, leading to bugs when the actual contents of the string contain control characters.
5+
6+
## Recommendation
7+
8+
To avoid issues, use the encoded versions of control characters (e.g. ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`).
9+
This makes the literals (e.g. string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable.
10+
11+
## Example
12+
13+
The following examples illustrate good and bad code:
14+
15+
Bad:
16+
17+
```java
18+
char tabulationChar = ' '; // Non compliant
19+
String tabulationCharInsideString = "A B"; // Non compliant
20+
String fooZeroWidthSpacebar = "foo​bar"; // Non compliant
21+
```
22+
23+
Good:
24+
25+
```java
26+
char escapedTabulationChar = '\t';
27+
String escapedTabulationCharInsideString = "A\tB"; // Compliant
28+
String fooUnicodeSpacebar = "foo\u0020bar"; // Compliant
29+
String foo2Spacebar = "foo bar"; // Compliant
30+
String foo3Spacebar = "foo bar"; // Compliant
31+
```
32+
33+
## Implementation notes
34+
35+
This query detects Java literals that contain reserved control characters and/or non-printable whitespace characters, such as:
36+
37+
- Decimal and hexidecimal representations of ASCII control characters (code points 0-8, 11, 14-31, and 127).
38+
- Invisible characters (e.g. zero-width space, zero-width joiner).
39+
- Unicode C0 control codes, plus the delete character (U+007F), such as:
40+
41+
| Escaped Unicode | ASCII Decimal | Description |
42+
| --------------- | ------------- | ------------------------- |
43+
| `\u0000` | 0 | null character |
44+
| `\u0001` | 1 | start of heading |
45+
| `\u0002` | 2 | start of text |
46+
| `\u0003` | 3 | end of text |
47+
| `\u0004` | 4 | end of transmission |
48+
| `\u0005` | 5 | enquiry |
49+
| `\u0006` | 6 | acknowledge |
50+
| `\u0007` | 7 | bell |
51+
| `\u0008` | 8 | backspace |
52+
| `\u000B` | 11 | vertical tab |
53+
| `\u000E` | 14 | shift out |
54+
| `\u000F` | 15 | shift in |
55+
| `\u0010` | 16 | data link escape |
56+
| `\u0011` | 17 | device control 1 |
57+
| `\u0012` | 18 | device control 2 |
58+
| `\u0013` | 19 | device control 3 |
59+
| `\u0014` | 20 | device control 4 |
60+
| `\u0015` | 21 | negative acknowledge |
61+
| `\u0016` | 22 | synchronous idle |
62+
| `\u0017` | 23 | end of transmission block |
63+
| `\u0018` | 24 | cancel |
64+
| `\u0019` | 25 | end of medium |
65+
| `\u001A` | 26 | substitute |
66+
| `\u001B` | 27 | escape |
67+
| `\u001C` | 28 | file separator |
68+
| `\u001D` | 29 | group separator |
69+
| `\u001E` | 30 | record separator |
70+
| `\u001F` | 31 | unit separator |
71+
| `\u007F` | 127 | delete |
72+
73+
- Zero-width Unicode characters (e.g. zero-width space, zero-width joiner), such as:
74+
75+
| Escaped Unicode | Description |
76+
| --------------- | ------------------------- |
77+
| `\u200B` | zero-width space |
78+
| `\u200C` | zero-width non-joiner |
79+
| `\u200D` | zero-width joiner |
80+
| `\u2028` | line separator |
81+
| `\u2029` | paragraph separator |
82+
| `\u2060` | word joiner |
83+
| `\uFEFF` | zero-width no-break space |
84+
85+
The following list outlines the _**explicit exclusions from query scope**_:
86+
87+
- any number of simple space characters (`U+0020`, ASCII 32).
88+
- an escape character sequence (e.g. `\t`), or the Unicode equivalent (e.g. `\u0009`), for printable whitespace characters:
89+
90+
| Character Sequence | Escaped Unicode | ASCII Decimal | Description |
91+
| ------------------ | --------------- | ------------- | --------------- |
92+
| `\t` | \u0009 | 9 | horizontal tab |
93+
| `\n` | \u000A | 10 | line feed |
94+
| `\f` | \u000C | 12 | form feed |
95+
| `\r` | \u000D | 13 | carriage return |
96+
| | \u0020 | 32 | space |
97+
98+
- character literals (i.e. single quotes) containing control characters.
99+
- literals defined within "likely" test methods, such as:
100+
- JUnit test methods
101+
- methods annotated with `@Test`
102+
- methods of a class annotated with `@Test`
103+
- methods with names containing "test"
104+
105+
## References
106+
107+
- Unicode: [Unicode Control Characters](https://www.unicode.org/charts/PDF/U0000.pdf).
108+
- Wikipedia: [Unicode C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes).
109+
- Wikipedia: [Unicode characters with property "WSpace=yes" or "White_Space=yes"](https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace).
110+
- Java API Specification: [Java String Literals](https://docs.oracle.com/javase/tutorial/java/data/characters.html).
111+
- Java API Specification: [Java Class Charset](https://docs.oracle.com/javase/8/docs/api///?java/nio/charset/Charset.html).
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
/**
2+
* @id java/non-explicit-control-and-whitespace-chars-in-literals
3+
* @name Non-explicit control and whitespace characters
4+
* @description Non-explicit control and whitespace characters in literals make code more difficult
5+
* to read and may lead to incorrect program behavior.
6+
* @kind problem
7+
* @precision very-high
8+
* @problem.severity warning
9+
* @tags quality
10+
* correctness
11+
* maintainability
12+
* readability
13+
*/
14+
15+
import java
16+
17+
/**
18+
* A `Literal` that has a Unicode control character within its
19+
* literal value (as returned by `getLiteral()` member predicate).
20+
*/
21+
class ReservedUnicodeInLiteral extends Literal {
22+
private int indexStart;
23+
24+
ReservedUnicodeInLiteral() {
25+
not this instanceof CharacterLiteral and
26+
exists(int codePoint |
27+
this.getLiteral().codePointAt(indexStart) = codePoint and
28+
(
29+
// Unicode C0 control characters
30+
codePoint < 32 and not codePoint in [9, 10, 12, 13]
31+
or
32+
codePoint = 127 // delete control character
33+
or
34+
codePoint = 8203 // zero-width space
35+
)
36+
)
37+
}
38+
39+
/** Gets the starting index of the Unicode control sequence. */
40+
int getIndexStart() { result = indexStart }
41+
}
42+
43+
from ReservedUnicodeInLiteral literal, int charIndex, int codePoint
44+
where
45+
literal.getIndexStart() = charIndex and
46+
literal.getLiteral().codePointAt(charIndex) = codePoint and
47+
not literal.getEnclosingCallable() instanceof LikelyTestMethod and
48+
not literal.getFile().isKotlinSourceFile()
49+
select literal,
50+
"Literal value contains control or non-printable whitespace character(s) starting with Unicode code point "
51+
+ codePoint + " at index " + charIndex + "."
Lines changed: 184 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,184 @@
1+
import java.util.List;
2+
import java.util.ArrayList;
3+
4+
public class CharTest {
5+
6+
public static void main(String[] args) {
7+
CharTest charTest = new CharTest();
8+
NonCompliantStringLiterals nonCompliant = charTest.new NonCompliantStringLiterals();
9+
CompliantStringLiterals compliant = charTest.new CompliantStringLiterals();
10+
CompliantCharLiterals compliantChar = charTest.new CompliantCharLiterals();
11+
12+
List<String> nonCompliantStrings = nonCompliant.getNonCompliantStrings();
13+
List<String> compliantStrings = compliant.getCompliantStrings();
14+
List<Character> compliantChars = compliantChar.getCompliantChars();
15+
16+
System.out.println("");
17+
System.out.println("Non-compliant strings:");
18+
for (String s : nonCompliantStrings) {
19+
System.out.println(s);
20+
System.out.println("");
21+
}
22+
23+
System.out.println("");
24+
System.out.println("Compliant strings:");
25+
for (String s : compliantStrings) {
26+
System.out.println(s);
27+
System.out.println("");
28+
}
29+
System.out.println("");
30+
31+
System.out.println("");
32+
System.out.println("Compliant character literals:");
33+
System.out.println("");
34+
for (Character c : compliantChars) {
35+
System.out.println("\\u" + String.format("%04X", (int) c));
36+
}
37+
System.out.println("");
38+
}
39+
40+
class CompliantCharLiterals {
41+
private List<Character> compliantChars;
42+
43+
public CompliantCharLiterals() {
44+
compliantChars = new ArrayList<>();
45+
compliantChars.add('A'); // COMPLIANT
46+
compliantChars.add('a'); // COMPLIANT
47+
compliantChars.add('\b'); // COMPLIANT
48+
compliantChars.add('\t'); // COMPLIANT
49+
compliantChars.add('\n'); // COMPLIANT
50+
compliantChars.add('\f'); // COMPLIANT
51+
compliantChars.add('\r'); // COMPLIANT
52+
compliantChars.add('\u0000'); // COMPLIANT
53+
compliantChars.add('\u0007'); // COMPLIANT
54+
compliantChars.add('\u001B'); // COMPLIANT
55+
compliantChars.add(' '); // COMPLIANT
56+
compliantChars.add('\u0020'); // COMPLIANT
57+
compliantChars.add('\u200B'); // COMPLIANT
58+
compliantChars.add('\u200C'); // COMPLIANT
59+
compliantChars.add('\u200D'); // COMPLIANT
60+
compliantChars.add('\u2028'); // COMPLIANT
61+
compliantChars.add('\u2029'); // COMPLIANT
62+
compliantChars.add('\u2060'); // COMPLIANT
63+
compliantChars.add('\uFEFF'); // COMPLIANT
64+
}
65+
66+
public List<Character> getCompliantChars() {
67+
return compliantChars;
68+
}
69+
}
70+
71+
class CompliantStringLiterals {
72+
private List<String> compliantStrings;
73+
74+
public CompliantStringLiterals() {
75+
compliantStrings = new ArrayList<>();
76+
compliantStrings.add(""); // COMPLIANT
77+
compliantStrings.add("X__Y"); // COMPLIANT
78+
compliantStrings.add("X_ _Y"); // COMPLIANT
79+
compliantStrings.add("X_\u0020_Y"); // COMPLIANT
80+
compliantStrings.add("X_ _Y"); // COMPLIANT
81+
compliantStrings.add("X_\u0020\u0020_Y"); // COMPLIANT
82+
compliantStrings.add("X_ _Y"); // COMPLIANT
83+
compliantStrings.add("X_ _Y"); // COMPLIANT
84+
compliantStrings.add("X_ _Y"); // COMPLIANT
85+
compliantStrings.add("X_ _Y"); // COMPLIANT
86+
compliantStrings.add("X_\b_Y"); // COMPLIANT
87+
compliantStrings.add("X_\u0000_Y"); // COMPLIANT
88+
compliantStrings.add("X_\u0001_Y"); // COMPLIANT
89+
compliantStrings.add("X_\u0002_Y"); // COMPLIANT
90+
compliantStrings.add("X_\u0003_Y"); // COMPLIANT
91+
compliantStrings.add("X_\u0004_Y"); // COMPLIANT
92+
compliantStrings.add("X_\u0005_Y"); // COMPLIANT
93+
compliantStrings.add("X_\u0006_Y"); // COMPLIANT
94+
compliantStrings.add("X_\u0007_Y"); // COMPLIANT
95+
compliantStrings.add("X_\u0008_Y"); // COMPLIANT
96+
compliantStrings.add("X_\u0009_Y"); // COMPLIANT
97+
compliantStrings.add("X_\u0010_Y"); // COMPLIANT
98+
compliantStrings.add("X_\u0011_Y"); // COMPLIANT
99+
compliantStrings.add("X_\u0012_Y"); // COMPLIANT
100+
compliantStrings.add("X_\u0013_Y"); // COMPLIANT
101+
compliantStrings.add("X_\u0014_Y"); // COMPLIANT
102+
compliantStrings.add("X_\u0015_Y"); // COMPLIANT
103+
compliantStrings.add("X_\u0016_Y"); // COMPLIANT
104+
compliantStrings.add("X_\u0017_Y"); // COMPLIANT
105+
compliantStrings.add("X_\u0018_Y"); // COMPLIANT
106+
compliantStrings.add("X_\u0019_Y"); // COMPLIANT
107+
compliantStrings.add("X_\u001A_Y"); // COMPLIANT
108+
compliantStrings.add("X_\u001B_Y"); // COMPLIANT
109+
compliantStrings.add("X_\u001C_Y"); // COMPLIANT
110+
compliantStrings.add("X_\u001D_Y"); // COMPLIANT
111+
compliantStrings.add("X_\u001E_Y"); // COMPLIANT
112+
compliantStrings.add("X_\u001F_Y"); // COMPLIANT
113+
compliantStrings.add("X_\u007F_Y"); // COMPLIANT
114+
compliantStrings.add("X_\u200B_Y"); // COMPLIANT
115+
compliantStrings.add("X_\u200C_Y"); // COMPLIANT
116+
compliantStrings.add("X_\u200D_Y"); // COMPLIANT
117+
compliantStrings.add("X_\u2028_Y"); // COMPLIANT
118+
compliantStrings.add("X_\u2029_Y"); // COMPLIANT
119+
compliantStrings.add("X_\u2060_Y"); // COMPLIANT
120+
compliantStrings.add("X_\uFEFF_Y"); // COMPLIANT
121+
compliantStrings.add("X_\uFEFF_Y_\u0020_Z"); // COMPLIANT
122+
compliantStrings.add("X_\uFEFF_Y_\uFEFF_Z"); // COMPLIANT
123+
compliantStrings.add("X_\u0020_Y_\uFEFF_Z"); // COMPLIANT
124+
compliantStrings.add("X_\t_Y"); // COMPLIANT
125+
compliantStrings.add("X_\t\t_Y"); // COMPLIANT
126+
compliantStrings.add("X_\\b_Y"); // COMPLIANT
127+
compliantStrings.add("X_\f_Y"); // COMPLIANT
128+
compliantStrings.add("X_\\f_Y"); // COMPLIANT
129+
compliantStrings.add("X_\n_Y"); // COMPLIANT
130+
compliantStrings.add("X_\n\t_Y"); // COMPLIANT
131+
compliantStrings.add("X_\\n_Y"); // COMPLIANT
132+
compliantStrings.add("X_\r_Y"); // COMPLIANT
133+
compliantStrings.add("X_\\r_Y"); // COMPLIANT
134+
compliantStrings.add("X_\t_Y"); // COMPLIANT
135+
compliantStrings.add("X_\\t_Y"); // COMPLIANT
136+
compliantStrings.add("X_\\u0000_Y"); // COMPLIANT
137+
compliantStrings.add("X_\\u0007_Y"); // COMPLIANT
138+
compliantStrings.add("X_\\u001B_Y"); // COMPLIANT
139+
compliantStrings.add("X_\\u200B_Y"); // COMPLIANT
140+
compliantStrings.add("X_\\u200C_Y"); // COMPLIANT
141+
compliantStrings.add("X_\\u200D_Y"); // COMPLIANT
142+
compliantStrings.add("X_\\u2028_Y"); // COMPLIANT
143+
compliantStrings.add("X_\\u2029_Y"); // COMPLIANT
144+
compliantStrings.add("X_\\u2060_Y"); // COMPLIANT
145+
compliantStrings.add("X_\\uFEFF_Y"); // COMPLIANT
146+
compliantStrings.add("lorem ipsum dolor "+"sit amet"); // COMPLIANT
147+
compliantStrings.add("lorem ipsum dolor " + "sit amet"); // COMPLIANT
148+
compliantStrings.add("lorem ipsum dolor sit amet, consectetur adipiscing elit, " + // COMPLIANT
149+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.");
150+
compliantStrings.add("lorem ipsum dolor sit amet, consectetur adipiscing elit, " + // COMPLIANT
151+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad "
152+
+ "minim veniam, quis nostrud exercitation ullamco "+"laboris nisi ut aliquip ex " +
153+
"ea commodo consequat.");
154+
compliantStrings.add("""
155+
lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
156+
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
157+
"""); // COMPLIANT
158+
}
159+
160+
public List<String> getCompliantStrings() {
161+
return compliantStrings;
162+
}
163+
}
164+
165+
class NonCompliantStringLiterals {
166+
private List<String> nonCompliantStrings;
167+
168+
public NonCompliantStringLiterals() {
169+
nonCompliantStrings = new ArrayList<>();
170+
nonCompliantStrings.add("X_​_Y"); // NON_COMPLIANT
171+
nonCompliantStrings.add("X_​_Y_​_Z"); // NON_COMPLIANT
172+
nonCompliantStrings.add("lorem​ipsum dolor sit amet,​consectetur adipiscing elit, " + // NON_COMPLIANT
173+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.");
174+
nonCompliantStrings.add("""
175+
lorem​ipsum dolor sit amet,​consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
176+
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
177+
"""); // NON_COMPLIANT
178+
}
179+
180+
public List<String> getNonCompliantStrings() {
181+
return nonCompliantStrings;
182+
}
183+
}
184+
}

0 commit comments

Comments
 (0)