Skip to content

Use new OTP28.1 :re.import for regexes #14719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

sabiwara
Copy link
Contributor

This is a PoC to integrate erlang/otp#9976.

I'm not sure about introducing a separate compile_export function or always do it in compile, but given there is a small perf cost for using an exported pattern over a ref one, it might be good to distinguish?

If we agree with the approach, will add proper docs.

@josevalim
Copy link
Member

From the discussion, the correct way to implement this would be to emit an AST code that looks like this:

quote do
  :re.import(unquote(Macro.escape(:re.compile(..., [:export]))))
end

And, fwiw, we could support the export option in Regex.compile as it also has benefits across nodes.

The big question, is where we are going to emit this code, because someone could still do:

quote do
  Macro.escape(regex)
end

And, in the case above, we wouldn't be able to wrap it around import/export. So I'm thinking this is perhaps best implemented in elixir_erl_pass? We see if there is a Regex node with all fields as literals, and then transparently inline it. Although it means we are special casing regexes in the compiler (which I guess it is fine, since we are already special case them in Kernel).

@sabiwara
Copy link
Contributor Author

sabiwara commented Aug 21, 2025

From the discussion, the correct way to implement this would be to emit an AST code that looks like this:

Oh I see, that makes more sense indeed. This seems much better than importing on run. (although we might still need this for cross-node export-compiled ones)

And, fwiw, we could support the export option in Regex.compile as it also has benefits across nodes.

Gotcha.

So I'm thinking this is perhaps best implemented in elixir_erl_pass? We see if there is a Regex node with all fields as literals, and then transparently inline it. Although it means we are special casing regexes in the compiler (which I guess it is fine, since we are already special case them in Kernel).

Probably a weird idea but sharing it anyway. A way to avoid cheating could be to make Macro.escape a protocol that structs could implement. So Regex could implement it as %{re_pattern: :re.import(...)}.
Perhaps this could be useful for some fine-based libs like LazyHTML generating structs with refs?

@josevalim
Copy link
Member

Probably a weird idea but sharing it anyway. A way to avoid cheating could be to make Macro.escape a protocol that structs could implement. So Regex could implement it as %{re_pattern: :re.import(...)}.

Yes, this would be the logical extension of this approach. We can sit on it though, we don't have to introduce it now. We can validate it with regular expressions and then consider expanding it based on the use cases. :)

@sabiwara
Copy link
Contributor Author

Sth like 79164c8?

re: adding the :export option.
The reason I went with a different function is because of the opts which can be binaries or lists, making this a bit unwieldy.
Or should we accept sth like _opts = {:export, binary_opts}?

@josevalim
Copy link
Member

Sth like 79164c8?

Sorry, I am rather thinking about a completely different approach, which would be detecting the Regex in elixir_erl_pass and then directly calling the Erlang APIs to compile it and emit a special AST. We probably wouldn't need many changes on the Elixir APIs.

@josevalim
Copy link
Member

Actually, perhaps we don't need to do it on elixir_erl_pass but on elixir_expand. But the idea is that we would allow the reference (and the regex) to go all the way down to elixir_expand, where it would then be modified and expanded.

@sabiwara
Copy link
Contributor Author

Actually, perhaps we don't need to do it on elixir_erl_pass but on elixir_expand. But the idea is that we would allow the reference (and the regex) to go all the way down to elixir_expand, where it would then be modified and expanded.

Oh I see, this was the missing piece for me since I didn't know which AST should be emitted for re_pattern for the compiler to pick it up. I got it now!
Happy to keep working on this, but in case you prefer to take over and explore this, please let me know, I'd totally understand 🙂

@josevalim
Copy link
Member

Please go ahead! I have type system work on my plate at the moment :)

@sabiwara
Copy link
Contributor Author

Closing in favor of #14720

@sabiwara sabiwara closed this Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants