39 changes: 15 additions & 24 deletions reference/pcre/book.xml
@@ -9,41 +9,32 @@
<preface xml:id="intro.pcre">
&reftitle.intro;
<para>
The syntax for patterns used in these functions closely resembles
Perl. The expression must be enclosed in the delimiters, a
forward slash (/), for example. Delimiters can be any
non-alphanumeric, non-whitespace ASCII character except the backslash (\) and the
null byte. If the delimiter character has to be used in the
expression itself, it needs to be escaped by backslash.
Perl-style (), {}, [], and &lt;&gt; matching delimiters may also be used.
This extension integrates regular expression pattern matching support into
PHP. It is based on the free and open-source
<link xlink:href="&url.pcre2.website;">PCRE2 library</link>.
This library implements regular expression pattern matching
using syntax and semantics compatible with Perl,
with <link xlink:href="&url.pcre2.perlcompat;">just a few differences</link>.
See <link linkend="reference.pcre.pattern.syntax">Pattern Syntax</link>
for detailed explanation.
for a detailed usage explanation.
</para>
<para>
The ending delimiter may be followed by various modifiers that
affect the matching.
See <link linkend="reference.pcre.pattern.modifiers">Pattern
Modifiers</link>.
To improve performance, the extension caches compiled regular expressions.
Each thread has its own dedicated cache, capable of holding up to 4096
expressions.
</para>
<note>
<para>
This extension maintains a global per-thread cache of compiled regular
expressions (up to 4096).
<para>When the cache reaches capacity, it automatically removes the oldest
entry to make room for new ones, following a "First-In, First-Out" (FIFO)
policy. The cache size is not configurable.
</para>
</note>
<warning>
<para>
You should be aware of some limitations of PCRE. Read <link
xlink:href="&url.pcre.man;">&url.pcre.man;</link> for more info.
There are <link xlink:href="&url.pcre2.limits;">some size and other limitations
in PCRE2</link> that can occasionally be relevant.
</para>
</warning>
<!-- FIXME: Check what Perl version implementation corresponds -->
<para>
The PCRE library is a set of functions that implement regular
expression pattern matching using the same syntax and semantics
as Perl 5, with just a few differences (see below). The current
implementation corresponds to Perl 5.005.
</para>
</preface>

&reference.pcre.setup;
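
The eviction behaviour described in the new note (a bounded cache that drops its oldest entry when full) can be illustrated with a minimal sketch. This is not PHP's implementation: the class name `FifoRegexCache`, its methods, and the use of Python's `re` module in place of PCRE2 are all assumptions made for illustration. Only the capacity of 4096 and the FIFO policy come from the text above.

```python
import re
from collections import OrderedDict


class FifoRegexCache:
    """Hypothetical FIFO cache of compiled patterns (illustration only)."""

    def __init__(self, capacity=4096):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Insertion order doubles as age order: the first key is the oldest.
        self._entries = OrderedDict()

    def compile(self, pattern, flags=0):
        key = (pattern, flags)
        cached = self._entries.get(key)
        if cached is not None:
            # A hit does not move the entry; reordering on access
            # would make this an LRU cache rather than a FIFO one.
            return cached
        if len(self._entries) >= self.capacity:
            # Cache is full: evict the entry that was inserted first.
            self._entries.popitem(last=False)
        compiled = re.compile(pattern, flags)
        self._entries[key] = compiled
        return compiled

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)
```

With a capacity of 2, compiling three distinct patterns evicts the first one even if it was reused in between, which is the practical difference between FIFO and least-recently-used eviction.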
45 changes: 30 additions & 15 deletions reference/pcre/configure.xml
@@ -3,27 +3,27 @@
<section xml:id="pcre.installation" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
&reftitle.install;
<para>
The PCRE extension is a core PHP extension, so it is always enabled.
By default, this extension is compiled using the bundled PCRE
library. Alternatively, an external PCRE library can be used by
passing in the <option role="configure">--with-pcre-regex=DIR</option>
configuration option where <literal>DIR</literal> is the location of
PCRE's include and library files. It is recommended to use PCRE 8.10 or newer;
as of PHP 7.3.0, PCRE2 is required.
The PCRE extension is a core PHP extension and is always enabled.
</para>
<para>
PCRE's just-in-time compilation is supported by default, which
can be disabled with the <option role="configure">--without-pcre-jit</option>
configuration option as of PHP 7.0.12.
By default, the extension uses a bundled version of the PCRE2 library.
An external PCRE2 library can be used instead by passing the
<option role="configure">--with-external-pcre</option> configuration
option. The minimum supported version is 10.30.
</para>
<para>
PCRE's just-in-time (JIT) compilation is enabled by default.
It can be disabled by using the <option role="configure">--without-pcre-jit</option>
configuration option.
</para>
&windows.builtin;
<para>
PCRE is an active project and as it changes so does the PHP
PCRE2 is an active project and as it changes so does the PHP
functionality that relies upon it. It is possible that certain parts
of the PHP documentation is outdated, in that it may not cover the
newest features that PCRE provides. For a list of changes, see the
<link xlink:href="&url.pcre.changelog;">PCRE library changelog</link>
and also the following bundled PCRE history:
of the PHP documentation are outdated. For a list of changes, see the
<link xlink:href="&url.pcre2.changelog;">PCRE2 library changelog</link>.
Also, if using the bundled library, refer to the following bundled PCRE library
history:
</para>
<para>
<table>
@@ -37,6 +37,21 @@
</row>
</thead>
<tbody>
<row>
<entry>8.5.0 (upcoming)</entry>
<entry>10.46</entry>
<entry></entry>
</row>
<row>
<entry>8.4.0</entry>
<entry>10.44</entry>
<entry></entry>
</row>
<row>
<entry>8.3.0</entry>
<entry>10.42</entry>
<entry></entry>
</row>
<row>
<entry>8.2.0</entry>
<entry>10.40</entry>
132 changes: 5 additions & 127 deletions reference/pcre/pattern.differences.xml
@@ -5,133 +5,11 @@
<title>Perl Differences</title>
<titleabbrev>Differences From Perl</titleabbrev>
<para>
The differences described here are with respect to Perl 5.005.
<orderedlist>
<listitem>
<simpara>
By default, a whitespace character is any character that
the C library function isspace() recognizes, though it is
possible to compile PCRE with alternative character type
tables. Normally isspace() matches space, formfeed, newline,
carriage return, horizontal tab, and vertical tab. Perl 5 no
longer includes vertical tab in its set of whitespace characters.
The \v escape that was in the Perl documentation for
a long time was never in fact recognized. However, the character
itself was treated as whitespace at least up to 5.002.
In 5.004 and 5.005 it does not match \s.
</simpara>
</listitem>
<listitem>
<simpara>
PCRE does not allow repeat quantifiers on lookahead
assertions. Perl permits them, but they do not mean what you
might think. For example, (?!a){3} does not assert that the
next three characters are not "a". It just asserts that the
next character is not "a" three times.
</simpara>
</listitem>
<listitem>
<simpara>
Capturing subpatterns that occur inside negative
lookahead assertions are counted, but their entries in the
offsets vector are never set. Perl sets its numerical
variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but
only if the negative lookahead assertion contains just one
branch.
</simpara>
</listitem>
<listitem>
<simpara>
Though binary zero characters are supported in the subject string,
they are not allowed in a pattern string because it is passed as a
normal C string, terminated by zero. The escape sequence "\x00" can
be used in the pattern to represent a binary zero.
</simpara>
</listitem>
<listitem>
<simpara>
The following Perl escape sequences are not supported:
\l, \u, \L, \U. In fact these are implemented by
Perl's general string-handling and are not part of its
pattern matching engine.
</simpara>
</listitem>
<listitem>
<simpara>
The Perl \G assertion is not supported as it is not
relevant to single pattern matches.
</simpara>
</listitem>
<listitem>
<simpara>
Fairly obviously, PCRE does not support the (?{code}) and (??{code})
construction. However, there is support for recursive patterns.
</simpara>
</listitem>
<listitem>
<simpara>
There are at the time of writing some oddities in Perl
5.005_02 concerned with the settings of captured strings
when part of a pattern is repeated. For example, matching
"aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
unset. However, if the pattern is changed to
/^(aa(b(b))?)+$/ then $2 (and $3) get set.
In Perl 5.004 $2 is set in both cases, and that is also &true;
of PCRE. If in the future Perl changes to a consistent state
that is different, PCRE may change to follow.
</simpara>
</listitem>
<listitem>
<simpara>
Another as yet unresolved discrepancy is that in Perl
5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
"a", whereas in PCRE it does not. However, in both Perl and
PCRE /^(a)?a/ matched against "a" leaves $1 unset.
</simpara>
</listitem>
<listitem>
<para>
PCRE provides some extensions to the Perl regular
expression facilities:
<orderedlist>
<listitem>
<simpara>
Although lookbehind assertions must match fixed length
strings, each alternative branch of a lookbehind assertion
can match a different length of string. Perl 5.005 requires
them all to have the same length.
</simpara>
</listitem>
<listitem>
<simpara>
If <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
is set and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
not set, the $ meta-character matches only at the very end of the
string.
</simpara>
</listitem>
<listitem>
<simpara>
If <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
set, a backslash followed by a letter with no special meaning is
faulted.
</simpara>
</listitem>
<listitem>
<simpara>
If <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
set, the greediness of the repetition quantifiers is inverted,
that is, by default they are not greedy, but if followed by a
question mark they are.
</simpara>
</listitem>
</orderedlist>
</para>
</listitem>
</orderedlist>
</para>
Both Perl and PCRE2 are continually changing. Refer to PCRE2's
latest documentation covering the
<link xlink:href="&url.pcre2.perlcompat;">differences between PCRE2
and Perl</link>. The version of the PCRE2 library in use is also
a relevant factor.
</article>

<!-- Keep this comment at the end of the file