Commit 393270f

Merge pull request #997 from s19110/pyDoc_CWE-595_is
pySCG: Adding explanation of the 'is' operator to CWE-595
2 parents c2c97dc + 42ed102 commit 393270f

File tree

4 files changed

+158
-17
lines changed

docs/Secure-Coding-Guide-for-Python/CWE-697/CWE-595/README.md

Lines changed: 109 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,71 @@
11
# CWE-595: Comparison of Object References Instead of Object Contents
22

3-
In Python, the `==` operator is implemented by the `__eq__` method on an object [[python.org data model 2023](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)]. For built-in types like `int` and `str`, the comparison is implemented in the interpreter. The main issue comes when implementing custom classes, where the default implementation compares object references using the `is` operator. The `is` operator compares the identities of the objects, equivalent to `id(obj1) == id(obj2)`. The `id` function is built into Python, and in the CPython interpreter, the standard implementation, it returns the object's memory address [[de Langen 2023](https://realpython.com/python-is-identity-vs-equality/)].
3+
Prevent unexpected results by knowing the differences between comparison operators such as `==` and `is`.
4+
5+
In Python, the `==` operator is implemented by the `__eq__` method on an object [[python.org data model 2023](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)]. For built-in types like `int` and `str`, the comparison is implemented in the interpreter. The main issue comes when implementing custom classes, where the default implementation compares object references using the `is` operator. The `is` operator compares the identities of the objects, equivalent to `id(obj1) == id(obj2)`.
6+
In CPython, an object's identity is its memory address. Everything in Python is an object, and each object is stored at a specific memory location [[de Langen 2023](https://realpython.com/python-is-identity-vs-equality/)].
47

58
Implement the `__eq__` method on a class whenever its instances may need to be compared to other objects or found in a list of objects. This is so common that the `dataclasses.dataclass` decorator implements it for you by default [[dataclasses — Data Classes — Python 3.11.4 documentation](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass)].
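
As a minimal sketch of that default behavior, the following contrasts a plain class with a dataclass. The `PlainPoint` and `DataPoint` names are hypothetical and are not part of this guide's example files:

```python
from dataclasses import dataclass


class PlainPoint:
    """Hypothetical class without __eq__: '==' falls back to identity."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


@dataclass
class DataPoint:
    """Hypothetical dataclass: __eq__ is generated from the fields."""

    x: int
    y: int


plain_equal = PlainPoint(1, 2) == PlainPoint(1, 2)
data_equal = DataPoint(1, 2) == DataPoint(1, 2)
data_identical = DataPoint(1, 2) is DataPoint(1, 2)

print(plain_equal)     # False: two distinct objects, no __eq__ defined
print(data_equal)      # True: the generated __eq__ compares field values
print(data_identical)  # False: 'is' never consults __eq__
```

The generated `__eq__` only changes the result of `==`; identity checks with `is` are unaffected.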
69

10+
Be aware of Python's memory optimizations for strings and numbers, as demonstrated in the `example01.py` code.
11+
Python tries to avoid allocating more memory for the same string. The process of reusing already existing strings is a Python optimization technique known as **string interning** [[sys — System-specific parameters and functions — Python 3.11.4 documentation](https://docs.python.org/3/library/sys.html#sys.intern)]. According to the documentation, "CPython keeps an array of integer objects for all integers between `-5` and `256`. When you create an `int` in that range you actually just get back a reference to the existing object." [[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)]
12+
13+
_[example01.py:](example01.py)_
14+
15+
```py
16+
""" Code Example """
17+
18+
print("-" * 10 + "Memory optimization with strings" + 10 * "-")
19+
a = "foobar"
20+
b = "foobar"
21+
c = ''.join(["foo", "bar"])
22+
print(f"a is b: {a} is {b}?", a is b)
23+
print(f"a is c: {a} is {c}?", a is c)
24+
print(f"a == c: {a} == {c}?", a == c)
25+
print(f"size? len(a)={len(a)} len(b)={len(b)} len(c)={len(c)}")
26+
27+
print("-" * 10 + "Memory optimization with numbers" + 10 * "-")
28+
a = b = 256
29+
print (f"{a} is {b}?", a is b)
30+
a = b = 257
31+
print (f"{a} is {b}?", a is b)
32+
33+
print("-" * 10 + "Memory optimization with numbers in a loop" + 10 * "-")
34+
a = b = 255
35+
while(a is b):
36+
    a += 1
37+
    b += 1
38+
    print (f"{a} is {b}?", a is b)
39+
```
40+
41+
**Output of example01.py:**
42+
43+
```bash
44+
----------Memory optimization with strings----------
45+
a is b: foobar is foobar? True
46+
a is c: foobar is foobar? False
47+
a == c: foobar == foobar? True
48+
size? len(a)=6 len(b)=6 len(c)=6
49+
----------Memory optimization with numbers----------
50+
256 is 256? True
51+
257 is 257? True
52+
----------Memory optimization with numbers in a loop----------
53+
256 is 256? True
54+
257 is 257? False
55+
```
56+
57+
The first set of print statements illustrates string interning. While `a` and `b` reuse the same object, `c` is created at runtime by joining two strings, which results in an object with a different `id()`. In the middle example, `a = b = 257` binds both names to one and the same number object, which is why `a is b` still returns `True` even though `257` falls outside of the cached range. However, when the values are incremented in a loop, Python needs to allocate new objects for numbers greater than `256` and thus creates two separate objects as soon as it hits `257`. The way caching and interning work may differ between running a Python script from a file and using the REPL, so running `example01.py` in Python's interactive mode may produce different results.
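
Because interning and integer caching are implementation details, value comparisons are more reliably written with `==`. The sketch below is illustrative only; the variable names are assumptions and the snippet is not part of `example01.py`. It uses `sys.intern` to show that explicitly interned equal strings share one object, while `==` gives the expected answer either way:

```python
import sys

a = "foobar"
c = "".join(["foo", "bar"])  # built at runtime

# Value comparison does not depend on how the strings were created
values_equal = a == c

# sys.intern returns the canonical interned copy of a string
interned_a = sys.intern(a)
interned_c = sys.intern(c)
same_after_intern = interned_a is interned_c

print(values_equal)       # True
print(same_after_intern)  # True
```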
58+
759
## Non-Compliant Code Example
860

9-
The non-compliant code shows how the default comparison operator compares object references rather than the object values. Furthermore, it displays how this causes issues when comparing lists of objects, although it applies to other types of collections as well. Finally, it shows how the `in` operator also depends on the behavior of the `__eq__` method and, therefore, also returns a non-desirable result.
61+
The `noncompliant01.py` code demonstrates potentially unexpected outcomes when using different comparisons.
1062

11-
[*noncompliant01.py:*](noncompliant01.py)
63+
* The `==` operator, which uses `__eq__`, checks value equality for most built-in types but falls back to reference equality if `__eq__` is missing in a custom class. So `12 == 12` is `True` and `Integer(12) == Integer(12)` is `False`.
64+
* The `==` operator comparing lists of objects behaves the same way, which also applies to other types of collections.
65+
* The `in` operator also depends on the behavior of the `__eq__` method.
66+
* The `is` operator checks whether both references point to the same object, regardless of the stored value.
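
The dependence of `in` on `__eq__` can be sketched as follows. For built-in containers such as lists, a membership test matches an element by identity first and by equality second. The `Token` class here is hypothetical and deliberately defines no `__eq__`:

```python
class Token:
    """Hypothetical class that does not define __eq__."""

    def __init__(self, value):
        self.value = value


first = Token(12)
tokens = [Token(10), first]

found_same_object = first in tokens      # matches by identity
found_equal_value = Token(12) in tokens  # no __eq__, so no match

print(found_same_object)  # True
print(found_equal_value)  # False
```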
67+
68+
_[noncompliant01.py:](noncompliant01.py)_
1269

1370
```py
1471
""" Non-compliant Code Example """
@@ -27,42 +84,81 @@ print(Integer(12) == Integer(12))
2784
print([Integer(12)] == [Integer(12)])
2885
# And equally, this will always be False as well
2986
print(Integer(12) in [Integer(10), Integer(12)])
87+
# The 'is' will return True only if both references point to the same object
88+
a = Integer(12)
89+
b = a
90+
# Here, a and b point to the same Integer, so 'is' returns True
91+
print(a is b)
92+
93+
b = Integer(12)
94+
# Even though b still points to an Integer of the same value, it is a new object, so 'is' returns False
95+
print(a is b)
96+
97+
```
98+
99+
**Output of noncompliant01.py:**
30100

101+
```bash
102+
False
103+
False
104+
False
105+
True
106+
False
31107
```
32108

33109
## Compliant Solution
34110

35-
In this compliant solution the `__eq__` method is implemented and all the comparisons now correctly compares the object values, rather than the object reference.
111+
In this compliant solution, the `__eq__` method is implemented and the comparisons that do not use `is` now correctly compare the object values, rather than the object references. The `is` operator does not call `__eq__`, hence the last print will still display `False`.
36112

37-
[*compliant01.py:*](compliant01.py)
113+
_[compliant01.py:](compliant01.py)_
38114

39115
```py
40116
""" Compliant Code Example """
41-
42-
117+
118+
43119
class Integer:
44120
    def __init__(self, value):
45121
        self.value = value
46-
122+
47123
    def __eq__(self, other):
48124
        if isinstance(other, type(self)):
49125
            return self.value == other.value
50126
        if isinstance(other, int):
51127
            return self.value == other
52128
        return False
53-
54-
129+
130+
55131
#####################
56132
# exploiting above code example
57133
#####################
58134
# All these scenarios will now show True
59135
print(Integer(12) == Integer(12))
60136
print([Integer(12)] == [Integer(12)])
61137
print(Integer(12) in [Integer(10), Integer(12)])
62-
138+
63139
# By adding the handling for int we also support
64140
print(Integer(12) == 12)
141+
# The 'is' will return True only if both references point to the same object
142+
a = Integer(12)
143+
b = a
144+
# Here, a and b point to the same Integer, so 'is' returns True
145+
print(a is b)
146+
147+
b = Integer(12)
148+
# Since the 'is' operator does not call __eq__, print below will still return False
149+
print(a is b)
150+
151+
```
152+
153+
**Output of compliant01.py:**
65154

155+
```bash
156+
True
157+
True
158+
True
159+
True
160+
True
161+
False
66162
```
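
One side effect worth knowing, shown here as a hedged sketch rather than as part of `compliant01.py`: a class that defines `__eq__` without also defining `__hash__` has its `__hash__` set to `None`, so its instances cannot be placed in sets or used as dictionary keys. The `HashableInteger` class below is a hypothetical variant that keeps hashing consistent with equality and returns `NotImplemented` for unsupported types:

```python
class HashableInteger:
    """Hypothetical variant of Integer that is also hashable."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, HashableInteger):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        # Let Python try the reflected comparison on the other operand
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally, including equal plain ints
        return hash(self.value)


unique = {HashableInteger(12), HashableInteger(12), HashableInteger(10)}
equals_int = HashableInteger(12) == 12
equals_str = HashableInteger(12) == "12"

print(len(unique))  # 2
print(equals_int)   # True
print(equals_str)   # False
```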
67163

68164
## Automated Detection
@@ -86,3 +182,5 @@ print(Integer(12) == 12)
86182
|[[python.org data model 2023](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)]|[3. Data model — Python 3.11.3 documentation](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)|
87183
|[[de Langen 2023](https://realpython.com/python-is-identity-vs-equality/)]|[Python '!=' Is Not 'is not': Comparing Objects in Python – Real Python](https://realpython.com/python-is-identity-vs-equality/)|
88184
|[[dataclasses — Data Classes — Python 3.11.4 documentation](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass)]|[9. Classes — Python 3.11.3 documentation](https://docs.python.org/3/tutorial/classes.html)|
185+
|[[sys — System-specific parameters and functions — Python 3.11.4 documentation](https://docs.python.org/3/library/sys.html#sys.intern)]|[sys — System-specific parameters and functions — Python 3.11.3 documentation](https://docs.python.org/3/library/sys.html#sys.intern)|
186+
|[[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)]|[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)|
Lines changed: 15 additions & 6 deletions
@@ -1,27 +1,36 @@
11
# SPDX-FileCopyrightText: OpenSSF project contributors
22
# SPDX-License-Identifier: MIT
33
""" Compliant Code Example """
4-
5-
4+
5+
66
class Integer:
77
    def __init__(self, value):
88
        self.value = value
9-
9+
1010
    def __eq__(self, other):
1111
        if isinstance(other, type(self)):
1212
            return self.value == other.value
1313
        if isinstance(other, int):
1414
            return self.value == other
1515
        return False
16-
17-
16+
17+
1818
#####################
1919
# exploiting above code example
2020
#####################
2121
# All these scenarios will now show True
2222
print(Integer(12) == Integer(12))
2323
print([Integer(12)] == [Integer(12)])
2424
print(Integer(12) in [Integer(10), Integer(12)])
25-
25+
2626
# By adding the handling for int we also support
2727
print(Integer(12) == 12)
28+
# The 'is' will return True only if both references point to the same object
29+
a = Integer(12)
30+
b = a
31+
# Here, a and b point to the same Integer, so 'is' returns True
32+
print(a is b)
33+
34+
b = Integer(12)
35+
# Since the 'is' operator does not call __eq__, print below will still return False
36+
print(a is b)
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
# SPDX-FileCopyrightText: OpenSSF project contributors
2+
# SPDX-License-Identifier: MIT
3+
""" Code Example """
4+
5+
print("-" * 10 + "Memory optimization with strings" + 10 * "-")
6+
a = "foobar"
7+
b = "foobar"
8+
c = ''.join(["foo", "bar"])
9+
print(f"a is b: {a} is {b}?", a is b)
10+
print(f"a is c: {a} is {c}?", a is c)
11+
print(f"a == c: {a} == {c}?", a == c)
12+
print(f"size? len(a)={len(a)} len(b)={len(b)} len(c)={len(c)}")
13+
14+
print("-" * 10 + "Memory optimization with numbers" + 10 * "-")
15+
a = b = 256
16+
print (f"{a} is {b}?", a is b)
17+
a = b = 257
18+
print (f"{a} is {b}?", a is b)
19+
20+
print("-" * 10 + "Memory optimization with numbers in a loop" + 10 * "-")
21+
a = b = 255
22+
while(a is b):
23+
    a += 1
24+
    b += 1
25+
    print (f"{a} is {b}?", a is b)

docs/Secure-Coding-Guide-for-Python/CWE-697/CWE-595/noncompliant01.py

Lines changed: 9 additions & 0 deletions
@@ -16,3 +16,12 @@ def __init__(self, value):
1616
print([Integer(12)] == [Integer(12)])
1717
# And equally, this will always be False as well
1818
print(Integer(12) in [Integer(10), Integer(12)])
19+
# The 'is' will return True only if both references point to the same object
20+
a = Integer(12)
21+
b = a
22+
# Here, a and b point to the same Integer, so 'is' returns True
23+
print(a is b)
24+
25+
b = Integer(12)
26+
# Even though b still points to an Integer of the same value, it is a new object, so 'is' returns False
27+
print(a is b)
