@khwilliamson
Contributor

This creates a regular expression pattern of names that we feel free to expose to XS code's namespace. Hence they are names reserved for our use, and should any conflicts arise, the module needs to change, not us.

Naturally, the pattern is pretty restrictive.

    Any symbol beginning with "PL_"
    Any symbol containing perl, Perl, or PERL, usually delimited on
        both sides so as to keep it from being part of a larger word.
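
To make the two rules above concrete, here is a minimal sketch in C of a checker for them. It is not the pattern from this PR: the function name `demo_is_reserved_name` is hypothetical, and "delimited" is assumed to mean "not directly adjacent to another letter", so underscores, digits, and the ends of the symbol all count as delimiters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if c could glue "perl" into a larger alphabetic word. */
static bool is_word_letter(char c) {
    return isalpha((unsigned char)c) != 0;
}

/* Hypothetical checker; an assumption-laden stand-in for the real pattern. */
static bool demo_is_reserved_name(const char *name) {
    static const char *const spellings[] = { "perl", "Perl", "PERL" };
    assert(name != NULL);

    /* Rule 1: any symbol beginning with "PL_". */
    if (strncmp(name, "PL_", 3) == 0)
        return true;

    /* Rule 2: perl, Perl, or PERL, not embedded in a longer word. */
    for (size_t i = 0; i < sizeof spellings / sizeof spellings[0]; i++) {
        const char *hit = name;
        while ((hit = strstr(hit, spellings[i])) != NULL) {
            bool left_ok  = (hit == name) || !is_word_letter(hit[-1]);
            bool right_ok = !is_word_letter(hit[4]); /* NUL counts as a delimiter */
            if (left_ok && right_ok)
                return true;
            hit++; /* keep scanning for a later, properly delimited match */
        }
    }
    return false;
}
```

Under these assumptions, names such as `PL_sv_undef` or `Perl_newSV` count as reserved, while `hyperlink` and `KEY_END` do not.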

Any other spelling that we expose could be considered to pollute the XS code space. We have felt free to do that all the time. Any new function's short name will do that.

And we generally feel free to create macros with arbitrary names which could conflict with an existing XS name.

Some important potential conflicts are:

New keywords: We create an exposed KEY_foo macro. Some existing modules use some of these. My grep of CPAN shows maybe a dozen of these get used; mostly KEY_END.

config.h is full of symbols like HAS_foo, I_bar, and others that are all exposed. I don't imagine we can claim to reserve any symbol beginning with either HAS_ or I_. And I don't know what to do here.

Informally, others and I have used a trailing underscore to indicate a private symbol. There are a few distributions that use some of these anyway. And there has been pushback when new short symbols that use this convention have been added.

I would like to get a formal rule about use of this convention. There are 200+ of these currently. We could reserve any names with trailing underscores, or if that is too much, any ending in, say, '_pl_' or '_PL_'.

We have 3000+ undocumented macro names that don't end in underscores and which are currently visible to XS code. This number includes the KEY_foo ones, but not the ones in config.h.

To deal with namespace pollution, we have had the -DNO_SHORT_NAMES Configure option for use just with embedded perls. This hasn't worked at least since we added inline functions, and it always applied to only functions. I have a WIP to get this to work again, and to extend it to work with documented macros. It just occurred to me how to make this be customizable, so that downstream someone could add a list of symbols that should only exist as 'Perl_foo', and then recompile, leaving short names for everything not in the list.
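
The short-name idea can be sketched in C roughly as follows. This is only an illustration, not the WIP itself: every identifier here (`Perl_demo_add`, `Perl_demo_sub`, `DEMO_NO_SHORT_NAMES`, `DEMO_LONG_ONLY_demo_sub`) is hypothetical, and the per-symbol guard is just one assumed way a downstream list could be expressed.

```c
#include <assert.h>

/* Long names: always defined, always prefixed. (Hypothetical functions.) */
static int Perl_demo_add(int a, int b) { return a + b; }
static int Perl_demo_sub(int a, int b) { return a - b; }

/* Stand-in for a downstream list: keep demo_sub as a long name only. */
#define DEMO_LONG_ONLY_demo_sub

/* Stand-in for a generated header: emit a short name unless suppressed,
 * either globally or for that one symbol. */
#if !defined(DEMO_NO_SHORT_NAMES) && !defined(DEMO_LONG_ONLY_demo_add)
#  define demo_add(a, b) Perl_demo_add((a), (b))
#endif
#if !defined(DEMO_NO_SHORT_NAMES) && !defined(DEMO_LONG_ONLY_demo_sub)
#  define demo_sub(a, b) Perl_demo_sub((a), (b))
#endif

/* The embedding code is now free to use demo_sub for its own purposes. */
static int demo_sub(int a, int b) { return b - a; }

/* Small self-check exercising both spellings. */
static void demo_self_check(void) {
    assert(demo_add(2, 3) == 5);      /* short name forwards to the long one */
    assert(Perl_demo_sub(5, 3) == 2); /* long name is still reachable */
    assert(demo_sub(5, 3) == -2);     /* embedder's own definition is used */
}
```

The self-check assumes `DEMO_NO_SHORT_NAMES` is left undefined; defining it would remove `demo_add` as well, leaving only the `Perl_`-prefixed names.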

  • This set of changes requires a perldelta entry, and I need help writing it.

@tonycoz
Contributor

tonycoz commented Dec 16, 2025

I can't see us restricting new name use to PL_, Perl_, PERL_ prefixes.

Do you expect new SV/CV/HV etc flags to use a PERL_SVf_... prefix? Similarly for the flag test macros?

Similarly for flags like AMGf_*, OP code macros.

I don't think we can reserve names with trailing underscores, those are in use in too many other code bases.

You know I think the perl codebase is badly polluting, especially at the macro level, but I don't think it's fixable unless we decide on some sort of limited API like the Python one (even that uses some non-Py-prefix names).

I think in the general case it's just too big a change in practice.

@khwilliamson
Contributor Author

It is not my intent to restrict new names. The purpose is to say that we consider all names that match this pattern, as finally determined, to be fair game for us to use without any consideration of their effect on anyone's namespace. I think we should give consideration to that effect with other names, but the decision is likely to be to go ahead and use any reasonable ones. It might be that we consider anything beginning with [ACGHS][Vv] to be fair game that needs no consideration either, along with related setters and getters.

It's overly clumsy and hard to read to have names that have required prefixes. But we are now in a position where a Configure call could specify not to use names x,y,z but instead we generate Perl_x, Perl_y, and Perl_z for just those. And this could be expanded to a per module basis so the module says I use x,y,z for my purposes, don't define them for perl's. This latter would require a much bigger embed.h, but it is automatically generated.

I'm liking the idea of reserving symbols that end in pl_ for our use. I see none in cpan now.

@leonerd
Contributor

leonerd commented Dec 18, 2025

I don't think I at all understand the impact of this change. I haven't really looked at the detail of regen/embed.pl before, nor encountered the existing pattern or the giant list of exceptional names. Can you explain a bit what the end result will be, of changing this pattern? How will a newly-built perl now differ from previous?

@khwilliamson
Contributor Author

regen/embed.pl looks at the source and creates various files from it, including embed.h and proto.h. It is run via make regen. embed.h is used to #define short-name equivalents for functions listed in embed.fnc and a few other back-compat macros. However, it was recently expanded to also #undef macros that aren't supposed to be visible to XS code, and the default is to not be visible. This makes macros roughly equivalent to functions, where it takes explicit action in embed.fnc to make a function have visibility attribute non-hidden.
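
As a rough illustration of the `#undef` mechanism described above, here is a minimal single-file sketch in C. It is not the generated embed.h: `DEMO_CORE`, `DEMO_PUBLIC_LIMIT`, and `demo_scale_` are hypothetical names, and the real header layout and guards may well differ.

```c
#include <assert.h>

/* --- stand-in for a core header --- */
#define DEMO_PUBLIC_LIMIT 10         /* imagined as documented: stays visible */
#define demo_scale_(x)    ((x) * 2)  /* imagined as internal only */

/* Code that appears before the #undef can still use the internal macro. */
static int demo_scaled_limit(void) { return demo_scale_(DEMO_PUBLIC_LIMIT); }

/* --- stand-in for the generated tail of the header --- */
#ifndef DEMO_CORE
#  undef demo_scale_                 /* hidden by default outside the core */
#endif

/* --- stand-in for XS code --- */
/* The module can now use the same spelling without a clash. */
static int demo_scale_(int x) { return x + 1; }

static void demo_self_check(void) {
    assert(demo_scaled_limit() == 20); /* internal macro was used in the core */
    assert(demo_scale_(10) == 11);     /* module's own function, not the macro */
    assert(DEMO_PUBLIC_LIMIT == 10);   /* visible macro is untouched */
}
```

The point of the sketch is only the default: a macro stays visible to outside code when something explicitly keeps it off the `#undef` list, which mirrors how functions need explicit action to be exported.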

But there is an exception list of macros that are to be externally visible even if there is nothing to indicate that they should be. That list was initialized to everything that is currently so visible. That means that this change has null effect. This list is the giant one @leonerd mentioned.

The goal here is to impose future discipline on us. Newly created macros will have to have an explicit visibility specification in order to be seen by the outside world. That is easily accomplished by documenting the macro, which is something that should have been done all along, but there were no immediate negative consequences of not doing so.

Except for the deficiencies in my code that generates the giant list, none of those thousands of macros on it are formally documented. That means they are effectively namespace pollutants. They are symbols that the XS code is stuck with having, and there was no notice given of their existence. That is only a real problem if there are collisions with names the XS code is somewhat likely to use. We should rename any such or remove them from external visibility.

The size of the list makes it impossible to grok. The list should be whittled down over time. There are several ways an individual item can be removed from the list:

  • Document it. This has several advantages besides the obvious one. It allows embed.pl and downstream code to do automatic code generation for it.
  • Determine that this really should be internal only. Simply removing the item from the list will cause it to be #undef'd so that it has no external visibility.
  • Change its name internally to one we claim is reserved for our use.

And that is where the pattern comes in. I believe we should have a statement in some pod to the effect that we reserve for perl's use any symbol that matches this pattern. C, C++, and POSIX all have such statements. But they in turn say they won't create symbols that don't match it. We can't do that. So what we can say is that if your code has symbols that match the pattern, we won't change to accommodate you. We'll consider requests to change new symbols we have created that don't match the pattern but clash with yours.

So adding the pattern has no effect on its own, but is a basis for changes to documentation.

Now that the recent changes to embed.pl are in core, it would not be hard for us to allow a module to have an import list of core symbols, or an import-all-but list. The macros excluded by such lists would be accessible only via long names.
