Conversation

cdce8p
Member

@cdce8p cdce8p commented Sep 8, 2025

No description provided.

@cdce8p cdce8p added the Maintenance (Discussion or action around maintaining pylint or the dev workflow) and Skip news 🔇 (This change does not require a changelog entry) labels Sep 8, 2025
Comment on lines 260 to 263
-    return isinstance(node, nodes.Const) and isinstance(node.value, int)
+    match node:
+        case nodes.Const(value=int()):
+            return True
+    return False
Member Author

@cdce8p cdce8p Sep 8, 2025

Not entirely sure about these. I do find the match-case variant much more readable but the indentation and dedicated return True and return False are worse.

Maybe Python also needs a match expression which could be used in return statements. (Similarly to if expressions). Something like

return (node match nodes.Const(value=int()))

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.89%. Comparing base (dad4124) to head (cfec7c9).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10544      +/-   ##
==========================================
- Coverage   95.90%   95.89%   -0.01%     
==========================================
  Files         177      177              
  Lines       19376    19410      +34     
==========================================
+ Hits        18582    18613      +31     
- Misses        794      797       +3     
Files with missing lines Coverage Δ
pylint/checkers/base/comparison_checker.py 98.59% <100.00%> (-1.41%) ⬇️
pylint/checkers/classes/class_checker.py 93.85% <100.00%> (+0.01%) ⬆️
pylint/checkers/logging.py 94.77% <100.00%> (+0.10%) ⬆️
pylint/checkers/modified_iterating_checker.py 97.87% <100.00%> (+0.14%) ⬆️
pylint/checkers/refactoring/refactoring_checker.py 98.19% <100.00%> (+0.01%) ⬆️
pylint/checkers/typecheck.py 96.18% <100.00%> (+0.01%) ⬆️
pylint/checkers/utils.py 95.96% <100.00%> (-0.03%) ⬇️
pylint/extensions/code_style.py 100.00% <100.00%> (ø)
pylint/extensions/typing.py 97.84% <100.00%> (+0.03%) ⬆️

... and 4 files with indirect coverage changes

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

I think all the one-liner returns should be reverted. I would keep only the ones where the number of lines with match is smaller than or equal to what it was before.

-    return node.expr.name in {"numpy", "nmp", "np"}
+    match node:
+        case nodes.Attribute(attrname="NaN", expr=nodes.Name(name=name)):
+            return name in {"numpy", "nmp", "np"}
Member

Unrelated, but I have never seen nmp in the wild. Has it completely died off in favor of np?

Member Author

Yeah. Haven't seen it either. Let's remove it.
Will do it with the other match followups.

Comment on lines 1995 to 1997
-    return isinstance(func, nodes.FunctionDef) and (
-        func.type == "classmethod" or func.name == "__class_getitem__"
-    )
+    match func:
+        case nodes.FunctionDef(type="classmethod") | nodes.FunctionDef(
+            name="__class_getitem__"
+        ):
+            return True
+    return False
Member

Perf is probably worse (two isinstance checks against FunctionDef?), it takes twice as much space and feels forced. Me no like.

Comment on lines -829 to +836
-    if isinstance(node_a, nodes.Name) and isinstance(node_b, nodes.Name):
-        return node_a.name == node_b.name  # type: ignore[no-any-return]
-    if isinstance(node_a, nodes.AssignName) and isinstance(
-        node_b, nodes.AssignName
-    ):
-        return node_a.name == node_b.name  # type: ignore[no-any-return]
-    if isinstance(node_a, nodes.Const) and isinstance(node_b, nodes.Const):
-        return node_a.value == node_b.value  # type: ignore[no-any-return]
+    match (node_a, node_b):
+        case (
+            [nodes.Name(name=a), nodes.Name(name=b)]
+            | [nodes.AssignName(name=a), nodes.AssignName(name=b)]
+            | [nodes.Const(value=a), nodes.Const(value=b)]
+        ):
+            return a == b  # type: ignore[no-any-return]
Member

👍

Member Author

cdce8p commented Sep 13, 2025

> I think all the one-liner returns should be reverted. I would keep only the ones where the number of lines with match is smaller than or equal to what it was before.

Reverted most of these. Let me know if I've missed one.

> Perf probably worse

Performance is indeed an interesting topic. Took some time to look into it some more. Although the match statement is pretty close to "normal" if-else statements in most cases, the class pattern in particular seems to be a bit slower than isinstance checks. Unfortunately, that's the most useful one for us. I'm not entirely sure what the reason for it is, but I'd guess one part is that the match statement does some additional checks which we can usually omit with isinstance. E.g. match needs to check whether an object has an attribute (hasattr), whereas with if we know the attribute is always present and just check its value.

Overall I've seen a ~3% performance impact (7:20 -> 7:35 for Home Assistant) from the recent match statement changes. That's not nothing but I think the readability improvements outweigh it.

Contributor

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit cfec7c9

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Thank you, everything makes sense now.

Regarding performance, I was thinking of the particular case where we replace isinstance(func, nodes.FunctionDef) and (x or y) with match func: case nodes.FunctionDef(x) | nodes.FunctionDef(y):, but I'm sad to learn that pylint is 3% slower because of the match changes.

@cdce8p cdce8p merged commit edad7ba into pylint-dev:main Sep 14, 2025
44 checks passed
@cdce8p cdce8p deleted the match-explore-5 branch September 14, 2025 16:35
Member Author

cdce8p commented Sep 14, 2025

> Regarding performance, I was thinking of the particular case where we replace isinstance(func, nodes.FunctionDef) and (x or y) with match func: case nodes.FunctionDef(x) | nodes.FunctionDef(y):, but I'm sad to learn that pylint is 3% slower because of the match changes.

I have to admit that the match statement is a bit of a black box which combines several checks in a meta language. So it's not directly obvious what's happening here. E.g. I thought the isinstance checks inside match would be cached (they aren't).

I spent some more time looking at it, this time even going as far as reading the eval loop in CPython directly. What I found are a couple of interesting facts:

  • Most of the patterns are actually quite optimized and even name binding isn't bad, so they are often on par with or only ever so slightly slower than comparable if checks.
  • The exceptions are the MatchMapping and MatchClass patterns. The latter is one we use quite frequently.
  • The source code for the MatchClass opcode is here: https://github.com/python/cpython/blob/v3.14.0rc2/Python/ceval.c#L714-L825
    A couple of notes
    • For any class pattern the first call is to isinstance. If that doesn't succeed, the check is aborted.
    • If there are any positional subpatterns, CPython needs to look up __match_args__. This can be avoided if only keywords are used. Since we haven't added __match_args__ to astroid yet, keyword patterns are mostly the default for us, e.g. nodes.Name(name=name). A notable exception might be the special builtin classes, as in int(value). Those also try to look up __match_args__ first before falling back to checking a particular feature flag. So it might be slightly faster to use int() as value instead.
    • One of the most expensive parts is actually the error checking. For every MatchClass pattern CPython creates an in-memory set to track the attribute names it has seen so it can raise an error if a name is duplicated, e.g. nodes.Name(name="a", name="b"). That set alone is responsible for a good chunk of the speed difference. Tbh I'm not even sure it's really necessary for keyword-only class patterns. The worst that would happen is that the pattern would never match. I kind of understand it with the mix of positional and keyword patterns, but that might be one area which could be optimized further.
    • Another one: the set is created even if there aren't any subpatterns at all, e.g. nodes.Name(). This case should just skip the whole code path, which could bring it almost to the level of a pure isinstance check. Will open an issue on the CPython repo for it next week.
    • I mentioned earlier the idea of caching the isinstance results. Tested it as well. Just creating a dict and adding one entry is enough overhead to pay for 1-2 additional isinstance checks. Since this would be added to every match statement, I don't think it's worth it. The isinstance calls are surprisingly fast.

What does that mean for pylint? Yes, the match statement isn't the fastest, especially the class pattern. Small speedups should be achievable if

  • We prefer keyword over positional patterns and, if possible, avoid the latter altogether. E.g. str() as value instead of str(value).
  • In case we're just interested in the name binding, it might be ever so slightly faster to use attribute access instead. E.g. nodes.Name() as node with node.name instead of nodes.Name(name=name). Though this often reads worse so I wouldn't enforce it personally.

Lastly, we should probably add a checker to make the CPython check for duplicate attributes in a class pattern redundant.

Member Author

cdce8p commented Sep 15, 2025

Opened an issue for CPython with more details: python/cpython#138912
