-
Notifications
You must be signed in to change notification settings - Fork 531
Modernize python code for construct
#609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3791623 to
9f64086
Compare
elftools/construct/lib/binary.py
Outdated
| number >>= 1 | ||
| i -= 1 | ||
| return bytes(bits) | ||
| return bytes(bool(number & (1 << (width - 1 - bit))) for bit in range(width)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as previous PR: please don't make code changes like this to construct, for reasons your PR header explains - I do hope to use it as a dependency eventually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven yet looked at how construct evolved upstream, but after reading #548 switching back to upstream looks stalled to me. If my time permits I will take a look myself to see, if typing can be added there as well.
But: even if upstream construct has type annotations itself, there are some pyelftools specific commits, which would be good to have:
- db4fb21 adds an ugly hack to allow using
Passwithout too much hassle as dictionary values. Actually the current code is broken as it lacks special code to handlePassin many cases, respective getting an untranslated value in addition to those expected value. see elf: _default_=Pass for enums #96 - 6f99ce0 adds some _internal knowledge of
pyelftoolstoContainer, which is used as a generic Container containing anything. This is required to get some sensible type annotations for well known keys.
You might want ignore this if you only want to provide type annotations for the external API, but would loose much of the benefit of type hinting for your internal use; just see all those many small issues I fixed lately while adding type hints: Most of them have been found by mypy after adding type hints to all internal functions.
I have not yet checked if construct even spills into pyelftoolss API itself, e.g. if there are public API functions returning Container or receiving it – or any other class from construct: In that case they become part of pyelftoolss public API, which would then remain untyped.
As I already have done the type hinting for construct in this branch, I will leave that here to sit until a decision on #548 is reached or until I have found to take a look at construct upstream myself. Until then consider this branch a stash to share my work already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all OK, but can we avoid non-trivial code changes like this specific one? It makes reviewing hard.
If you only do superficial modernization + type hints, it's much easier for me to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have dropped those code cleanups for now; only the typing related changes should remain.
716b657 to
37c5b44
Compare
Python 3 is working on `bytes`. Use raw-strings to allow backslash escaped bytes. Signed-off-by: Philipp Hahn <[email protected]>
`paddir=both` -> `center`; see [PaddedStringAdapter](elftools/construct/adapters.py#L186). Signed-off-by: Philipp Hahn <[email protected]>
Signed-off-by: Philipp Hahn <[email protected]>
Since Python 3 streams work on `bytes`, not (Unicode) `str`ings. Signed-off-by: Philipp Hahn <[email protected]>
`BitStreamReader` and `BitStreamWriter` are supposed to be used instead of `IO[bytes]`, but actually implement a different signature: that would force all functions currently accepting `IO[bytes]` to be changed to `IO[bytes] | BitStreamReader` respective `IO[bytes] | BitStreamWriter`. Instead let both inherit from `io.RawIOBase` to inherit the complete API and implement that interface by returning the correct values and types. See <https://docs.python.org/3/library/io.html#io.RawIOBase>. Signed-off-by: Philipp Hahn <[email protected]>^
Explicitly name the named parameters in the function signature instead of using `**kwargs`, which is a pain to type annotate. This also allows us to get rid of the manual check for extra unknown arguments, as the compiler will do that for us. Signed-off-by: Philipp Hahn <[email protected]>
Type checkers complain about `pos` being undefined inside the exception handling code. Actually `ConstructError` will only be raised by `parse()` inside the loop, so both `c` and `pos` will be assigned at least once, but all type-checkers fail to detect this. Initialize both `c` and `pos` before the loop to silence them. Signed-off-by: Philipp Hahn <[email protected]>
`pyright` is very picky about `Final` symbols being imported multiple times, even when they are the same. Explicitly name all symbole to be exported. Signed-off-by: Philipp Hahn <[email protected]>
- [x] `pyright` - [x] `mypy` Signed-off-by: Philipp Hahn <[email protected]>
Signed-off-by: Philipp Hahn <[email protected]>
Assert name of `PrefixedArray.length_field.name` is `not None`. Signed-off-by: Philipp Hahn <[email protected]>
`ExprAdapter` expects two *functions* as encode/decoder and overwrites its *methods*: Are those two functions called with `self` as the 1st argument? Make it clear by implementing 2 methods and calling those two functions as functions. Signed-off-by: Philipp Hahn <[email protected]>
Directly return `str` after `decode()` as `obj` is declared as `bytes`. Signed-off-by: Philipp Hahn <[email protected]>
`Peek._parse()` must `return None` explicitly for `mypy`. Signed-off-by: Philipp Hahn <[email protected]>
We know that `name` is a `str`, so tell it `mypy` too. Signed-off-by: Philipp Hahn <[email protected]>
`NoDefault`, `Pass` and `Terminator` are all singletons, which are used in various dictionaries. For type hinting the *type* is also required in addition to the singleton *instance*. Prefix the class with an underscore. Hint instances with `Final[Any]` to prevent re-assignment and to exempt them from type checking: Otherwise all code de-referencing something from `dict[str, … | _Pass]` has to explicitly check for `_Pass` all time, which currently it does not. Actually that is a bug, which needs fixing at a later day. Signed-off-by: Philipp Hahn <[email protected]>
Delete instance variables instead of assigning `None`, which then requires checks for `not None`. Signed-off-by: Philipp Hahn <[email protected]>
Code explicitly checks for `AttributeError`, but `mypy` is picky here. Signed-off-by: Philipp Hahn <[email protected]>
`Container` is a static typing nightmare as it acts as a `dict[str, Any]`. In addition to that instances are also assigned additional variables using arbitrary types. Basically `Any` goes in, `Any` comes out with is not useful for typing. As we know the types used by `pyelftools`, we can give `mypy` et.al. some hints on those use cases, but have to put theme here on `Container` as that is the base for everything. Yikes. Signed-off-by: Philipp Hahn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, much better this way.
As a general comment, could you please add type checker running instructions somewhere (maybe in https://github.com/eliben/pyelftools/wiki/Hacking-guide?)
e.g. how exactly to install the type checker (version/etc.) and how to run it on the entire project here
| return bool(self.conflags & flag) | ||
|
|
||
| def __getstate__(self): | ||
| def __getstate__(self) -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR you seem to be using both object and Any as annotations
How do you approach when to use what for this library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See The Any type. Summary:
- use
Anyonly as the last resort; basically you give up and tell the type-checkers to not check the type – because it is to complex or too painful to type in more detail. - On
objectyou can only dostr(…),repo(…),hash(…)and… == …and… != …; soobjectis okay for something simple asdef myprint(args: list[object]) -> None: print(" ".join(args)), but not much more. - A value with the
Anytype is dynamically typed. Basically we don't care about the concrete type here, butobjectis not appropriate: In line 369 there is this:
attrs["packer"] = attrs["packer"].format
objecthas no attributeformat, so usingAnyhere solves that nicely.
As an alternative to using Any here you would have to use a TypedDict where you explicitly name all ever existing state-attributes with their corresponding type. This is impractical as that list can be extended by inheriting from StaticField, so would never be complete.
| def __init__(self, name = None, show_stream = True, | ||
| show_context = True, show_stack = True, | ||
| stream_lookahead = 100): | ||
| def __init__(self, name: str | None = None, show_stream: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC: is it actually required to annotate __init__ as returning None? Will things fail if you don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since mypy 0.640 "New Feature: Allow Omitting Return Type for init" (~2018-10) it can be omitted – as long as you have at least one argument annotated – a completely un-annotated function/method is considered dynamically types and is not checked. I have not checked others type-checkers like pytype or pyright how they handle it.
For consistency I prefer to always type the return type, even for __init__(…) -> None
Don't know how to edit/clone that Wiki, but something like this: diff --git a/Hacking-guide.rest b/Hacking-guide.rest
index 93957ad3777f..c2fc727e1ef1 100644
--- a/Hacking-guide.rest
+++ b/Hacking-guide.rest
@@ -64,4 +64,19 @@ Coding conventions
**pyelftools** is written in Python, following the `PEP 8 <http://www.python.org/dev/peps/pep-0008/>`_ style guide.
+Type checking
+-------------
+**pyelftools** is using and providing Type Hints, following the `PEP 484 <https://peps.python.org/pep-0484/>`_.
+Please make sure new functions and methods are properly type hinted.
+There are some known limitations as **pyelftools** is dynamic for some types, which does not work well for _static typing_.
+
+Please make sure to run [pyright](https://github.com/microsoft/pyright) and/or [mypy](https://mypy.readthedocs.io/en/stable/) on your contributions.
+For this Python 3.12 (or newer) is required:
+
+.. sourcecode:: text
+
+ > uv sync --dev --group typing --python 3.12
+ > . .venv/bin/activate
+ > pyright elftools/
+ > mypy |
Oh, OK. I'll move the wiki to a doc file you can send PRs to (don't particularly like the GitHub wikies anyhow). We should discuss this - I don't undestand the uv usage here, for example. Why not |
This is done now, can you please send a PR for https://github.com/eliben/pyelftools/blob/main/doc/hacking-guide.rst ? |
See #614
I have added alternatives there and an explanation: Basically |
While working on #514 I took the opportunity to modernize also some old code in #607. But there are several changed related to the internal copy of
construct. That copy shall be updated by #548, but until then the following changes might be okay:pyelftools.construct.core.Range:mypycomplained aboutposnot being declared in all cases; rewrite the logic.const, except while declaring them._printablewith a dict-comprehensionchar2binandbin2char