Skip to content

Conversation

@pmhahn
Copy link
Contributor

@pmhahn pmhahn commented Mar 15, 2025

While working on #514 I took the opportunity to modernize also some old code in #607. But there are several changed related to the internal copy of construct. That copy shall be updated by #548, but until then the following changes might be okay:

  • Fix the doctest
  • Use named arguments instead of kwargs to allow typing them
  • Simplify pyelftools.construct.core.Range: mypy complained about pos not being declared in all cases; rewrite the logic.
  • Some variables are const, except while declaring them.
    • Build _printable with a dict-comprehension
    • Same for char2bin and bin2char
  • Rewrite some functions to use generater expressions

@pmhahn pmhahn mentioned this pull request Mar 15, 2025
@pmhahn pmhahn force-pushed the construct branch 2 times, most recently from 3791623 to 9f64086 Compare March 15, 2025 05:59
number >>= 1
i -= 1
return bytes(bits)
return bytes(bool(number & (1 << (width - 1 - bit))) for bit in range(width))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as previous PR: please don't make code changes like this to construct, for reasons your PR header explains - I do hope to use it as a dependency eventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven yet looked at how construct evolved upstream, but after reading #548 switching back to upstream looks stalled to me. If my time permits I will take a look myself to see, if typing can be added there as well.

But: even if upstream construct has type annotations itself, there are some pyelftools specific commits, which would be good to have:

  • db4fb21 adds an ugly hack to allow using Pass without too much hassle as dictionary values. Actually the current code is broken as it lacks special code to handle Pass in many cases, respective getting an untranslated value in addition to those expected value. see elf: _default_=Pass for enums #96
  • 6f99ce0 adds some _internal knowledge of pyelftools to Container, which is used as a generic Container containing anything. This is required to get some sensible type annotations for well known keys.

You might want ignore this if you only want to provide type annotations for the external API, but would loose much of the benefit of type hinting for your internal use; just see all those many small issues I fixed lately while adding type hints: Most of them have been found by mypy after adding type hints to all internal functions.

I have not yet checked if construct even spills into pyelftoolss API itself, e.g. if there are public API functions returning Container or receiving it – or any other class from construct: In that case they become part of pyelftoolss public API, which would then remain untyped.

As I already have done the type hinting for construct in this branch, I will leave that here to sit until a decision on #548 is reached or until I have found to take a look at construct upstream myself. Until then consider this branch a stash to share my work already done.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all OK, but can we avoid non-trivial code changes like this specific one? It makes reviewing hard.

If you only do superficial modernization + type hints, it's much easier for me to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have dropped those code cleanups for now; only the typing related changes should remain.

@pmhahn pmhahn force-pushed the construct branch 2 times, most recently from 716b657 to 37c5b44 Compare March 17, 2025 19:49
pmhahn added 6 commits March 18, 2025 21:06
Python 3 is working on `bytes`.

Use raw-strings to allow backslash escaped bytes.

Signed-off-by: Philipp Hahn <[email protected]>
`paddir=both` -> `center`; see
[PaddedStringAdapter](elftools/construct/adapters.py#L186).

Signed-off-by: Philipp Hahn <[email protected]>
Since Python 3 streams work on `bytes`, not (Unicode) `str`ings.

Signed-off-by: Philipp Hahn <[email protected]>
`BitStreamReader` and `BitStreamWriter` are supposed to be used instead
of `IO[bytes]`, but actually implement a different signature: that would
force all functions currently accepting `IO[bytes]` to be changed to
`IO[bytes] | BitStreamReader` respective `IO[bytes] | BitStreamWriter`.

Instead let both inherit from `io.RawIOBase` to inherit the complete API
and implement that interface by returning the correct values and types.

See <https://docs.python.org/3/library/io.html#io.RawIOBase>.

Signed-off-by: Philipp Hahn <[email protected]>^
Explicitly name the named parameters in the function signature instead
of using `**kwargs`, which is a pain to type annotate.
This also allows us to get rid of the manual check for extra unknown
arguments, as the compiler will do that for us.

Signed-off-by: Philipp Hahn <[email protected]>
@pmhahn pmhahn mentioned this pull request Mar 19, 2025
pmhahn added 13 commits April 1, 2025 07:25
Type checkers complain about `pos` being undefined inside the exception
handling code. Actually `ConstructError` will only be raised by
`parse()` inside the loop, so both `c` and `pos` will be assigned at
least once, but all type-checkers fail to detect this.

Initialize both `c` and `pos` before the loop to silence them.

Signed-off-by: Philipp Hahn <[email protected]>
`pyright` is very picky about `Final` symbols being imported multiple
times, even when they are the same.

Explicitly name all symbole to be exported.

Signed-off-by: Philipp Hahn <[email protected]>
- [x] `pyright`
- [x] `mypy`

Signed-off-by: Philipp Hahn <[email protected]>
Assert name of `PrefixedArray.length_field.name` is `not None`.

Signed-off-by: Philipp Hahn <[email protected]>
`ExprAdapter` expects two *functions* as encode/decoder and overwrites
its *methods*: Are those two functions called with `self` as the 1st
argument?

Make it clear by implementing 2 methods and calling those two functions
as functions.

Signed-off-by: Philipp Hahn <[email protected]>
Directly return `str` after `decode()` as `obj` is declared as `bytes`.

Signed-off-by: Philipp Hahn <[email protected]>
`Peek._parse()` must `return None` explicitly for `mypy`.

Signed-off-by: Philipp Hahn <[email protected]>
We know that `name` is a `str`, so tell it `mypy` too.

Signed-off-by: Philipp Hahn <[email protected]>
`NoDefault`, `Pass` and `Terminator` are all singletons, which are used
in various dictionaries. For type hinting the *type* is also required in
addition to the singleton *instance*.

Prefix the class with an underscore.

Hint instances with `Final[Any]` to prevent re-assignment and to exempt
them from type checking: Otherwise all code de-referencing something
from `dict[str, … | _Pass]` has to explicitly check for `_Pass` all
time, which currently it does not. Actually that is a bug, which needs
fixing at a later day.

Signed-off-by: Philipp Hahn <[email protected]>
Delete instance variables instead of assigning `None`, which then
requires checks for `not None`.

Signed-off-by: Philipp Hahn <[email protected]>
Code explicitly checks for `AttributeError`, but `mypy` is picky here.

Signed-off-by: Philipp Hahn <[email protected]>
`Container` is a static typing nightmare as it acts as a `dict[str,
Any]`. In addition to that instances are also assigned additional
variables using arbitrary types. Basically `Any` goes in, `Any` comes
out with is not useful for typing.

As we know the types used by `pyelftools`, we can give `mypy` et.al.
some hints on those use cases, but have to put theme here on
`Container` as that is the base for everything.

Yikes.

Signed-off-by: Philipp Hahn <[email protected]>
Copy link
Owner

@eliben eliben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, much better this way.

As a general comment, could you please add type checker running instructions somewhere (maybe in https://github.com/eliben/pyelftools/wiki/Hacking-guide?)

e.g. how exactly to install the type checker (version/etc.) and how to run it on the entire project here

return bool(self.conflags & flag)

def __getstate__(self):
def __getstate__(self) -> dict[str, Any]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR you seem to be using both object and Any as annotations

How do you approach when to use what for this library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See The Any type. Summary:

  • use Any only as the last resort; basically you give up and tell the type-checkers to not check the type – because it is to complex or too painful to type in more detail.
  • On object you can only do str(…), repo(…), hash(…) and … == … and … != …; so object is okay for something simple as def myprint(args: list[object]) -> None: print(" ".join(args)), but not much more.
  • A value with the Any type is dynamically typed. Basically we don't care about the concrete type here, but object is not appropriate: In line 369 there is this:
    attrs["packer"] = attrs["packer"].format
    object has no attribute format, so using Any here solves that nicely.

As an alternative to using Any here you would have to use a TypedDict where you explicitly name all ever existing state-attributes with their corresponding type. This is impractical as that list can be extended by inheriting from StaticField, so would never be complete.

def __init__(self, name = None, show_stream = True,
show_context = True, show_stack = True,
stream_lookahead = 100):
def __init__(self, name: str | None = None, show_stream: bool = True,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOC: is it actually required to annotate __init__ as returning None? Will things fail if you don't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since mypy 0.640 "New Feature: Allow Omitting Return Type for init" (~2018-10) it can be omitted – as long as you have at least one argument annotated – a completely un-annotated function/method is considered dynamically types and is not checked. I have not checked others type-checkers like pytype or pyright how they handle it.

For consistency I prefer to always type the return type, even for __init__(…) -> None

@pmhahn
Copy link
Contributor Author

pmhahn commented Apr 2, 2025

As a general comment, could you please add type checker running instructions somewhere (maybe in https://github.com/eliben/pyelftools/wiki/Hacking-guide?)

e.g. how exactly to install the type checker (version/etc.) and how to run it on the entire project here

Don't know how to edit/clone that Wiki, but something like this:

diff --git a/Hacking-guide.rest b/Hacking-guide.rest
index 93957ad3777f..c2fc727e1ef1 100644
--- a/Hacking-guide.rest
+++ b/Hacking-guide.rest
@@ -64,4 +64,19 @@ Coding conventions
 
 **pyelftools** is written in Python, following the `PEP 8 <http://www.python.org/dev/peps/pep-0008/>`_ style guide.
 
+Type checking
+-------------
 
+**pyelftools** is using and providing Type Hints, following the `PEP 484 <https://peps.python.org/pep-0484/>`_.
+Please make sure new functions and methods are properly type hinted.
+There are some known limitations as **pyelftools** is dynamic for some types, which does not work well for _static typing_.
+
+Please make sure to run [pyright](https://github.com/microsoft/pyright) and/or [mypy](https://mypy.readthedocs.io/en/stable/) on your contributions.
+For this Python 3.12 (or newer) is required:
+
+.. sourcecode:: text
+
+  > uv sync --dev --group typing --python 3.12
+  > . .venv/bin/activate
+  > pyright elftools/
+  > mypy

@eliben
Copy link
Owner

eliben commented Apr 2, 2025

As a general comment, could you please add type checker running instructions somewhere (maybe in https://github.com/eliben/pyelftools/wiki/Hacking-guide?)
e.g. how exactly to install the type checker (version/etc.) and how to run it on the entire project here

Don't know how to edit/clone that Wiki, but something like this:

diff --git a/Hacking-guide.rest b/Hacking-guide.rest
index 93957ad3777f..c2fc727e1ef1 100644
--- a/Hacking-guide.rest
+++ b/Hacking-guide.rest
@@ -64,4 +64,19 @@ Coding conventions
 
 **pyelftools** is written in Python, following the `PEP 8 <http://www.python.org/dev/peps/pep-0008/>`_ style guide.
 
+Type checking
+-------------
 
+**pyelftools** is using and providing Type Hints, following the `PEP 484 <https://peps.python.org/pep-0484/>`_.
+Please make sure new functions and methods are properly type hinted.
+There are some known limitations as **pyelftools** is dynamic for some types, which does not work well for _static typing_.
+
+Please make sure to run [pyright](https://github.com/microsoft/pyright) and/or [mypy](https://mypy.readthedocs.io/en/stable/) on your contributions.
+For this Python 3.12 (or newer) is required:
+
+.. sourcecode:: text
+
+  > uv sync --dev --group typing --python 3.12
+  > . .venv/bin/activate
+  > pyright elftools/
+  > mypy

Oh, OK. I'll move the wiki to a doc file you can send PRs to (don't particularly like the GitHub wikies anyhow). We should discuss this - I don't undestand the uv usage here, for example. Why not uvx mypy if you use uv already

@eliben eliben merged commit ba0da09 into eliben:main Apr 2, 2025
5 checks passed
@eliben
Copy link
Owner

eliben commented Apr 2, 2025

Oh, OK. I'll move the wiki to a doc file you can send PRs to (don't particularly like the GitHub wikies anyhow). We should discuss this - I don't undestand the uv usage here, for example. Why not uvx mypy if you use uv already

This is done now, can you please send a PR for https://github.com/eliben/pyelftools/blob/main/doc/hacking-guide.rst ?

@pmhahn pmhahn deleted the construct branch April 2, 2025 15:30
@pmhahn
Copy link
Contributor Author

pmhahn commented Apr 2, 2025

Oh, OK. I'll move the wiki to a doc file you can send PRs to (don't particularly like the GitHub wikies anyhow). We should discuss this - I don't undestand the uv usage here, for example. Why not uvx mypy if you use uv already

This is done now, can you please send a PR for https://github.com/eliben/pyelftools/blob/main/doc/hacking-guide.rst ?

See #614

We should discuss this - I don't undestand the uv usage here, for example. Why not uvx mypy if you use uv already

I have added alternatives there and an explanation: Basically mypy and pyright need the so-called typed-… companion packages in their VENVs, but pyelftools currently has no external dependencies, this is okay for now. As soon as you have them, using a globally installed mypy / pyright becomes a problem.

pmhahn added a commit to pmhahn/pyelftools that referenced this pull request Sep 8, 2025
pmhahn added a commit to pmhahn/pyelftools that referenced this pull request Sep 8, 2025
pmhahn added a commit to pmhahn/pyelftools that referenced this pull request Oct 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants