MAINTENANCE README FOR PCRE2
============================

The files in the "maint" directory of the PCRE2 source contain data, scripts,
and programs that are used for the maintenance of PCRE2, but which do not form
part of the PCRE2 distribution tarballs. This document describes these files
and also contains some notes for maintainers. Its contents are:

  Files in the maint directory
  Updating to a new Unicode release
  Preparing for a PCRE2 release
  Updating version info for libtool
  Making a PCRE2 release
  Future ideas (wish list)


Files in the maint directory
============================

GenerateCommon.py
  A Python module containing data and functions that are used by the other
  Generate scripts.

GenerateTest26.py
  A Python script that generates input and expected output test data for test
  26, which tests certain aspects of Unicode property support.

GenerateUcd.py
  A Python script that generates the file pcre2_ucd.c from GenerateCommon.py
  and Unicode data files, which are themselves downloaded from the Unicode web
  site. The generated file contains the tables for a 2-stage lookup of Unicode
  properties, along with some auxiliary tables. The script starts with a long
  comment that gives details of the tables it constructs.

GenerateUcpHeader.py
  A Python script that generates the file pcre2_ucp.h from GenerateCommon.py
  and Unicode data files. The generated file defines constants for various
  Unicode property values.

GenerateUcpTables.py
  A Python script that generates the file pcre2_ucptables.c from
  GenerateCommon.py and Unicode data files. The generated file contains tables
  for looking up Unicode property names.

ManyConfigTests
  A shell script that runs "configure, make, test" a number of times with
  different configuration settings.

pcre2_chartables.c.non-standard
  This is a set of character tables that came from a Windows system. It has
  characters greater than 128 that are set as spaces, amongst other things. I
  kept it so that it can be used for testing from time to time.

README
  This file.

Unicode.tables
  The files in this directory were downloaded from the Unicode web site. They
  contain information about Unicode characters and scripts, and are used by the
  Generate scripts. There is also UnicodeData.txt, which is no longer used by
  any script but is kept because it is occasionally useful for manually
  looking up the details of certain characters. However, note that character
  names in this file such as "Arabic sign sanah" do NOT mean that the character
  is in a particular script (in this case, Arabic). Scripts.txt and
  ScriptExtensions.txt are where to look for script information.

ucptest.c
  A program for testing the Unicode property macros that do lookups in the
  pcre2_ucd.c data, mainly useful after rebuilding the Unicode property tables.
  Compile and run this in the "maint" directory (see comments at its head).
  This program can also be used to find characters with specific properties and
  to list which properties are supported.

ucptestdata
  A directory containing four files, testinput{1,2} and testoutput{1,2}, for
  use in conjunction with the ucptest program.

utf8.c
  A short, freestanding C program for converting a Unicode code point into a
  sequence of bytes in the UTF-8 encoding, and vice versa. If its argument is a
  hex number such as 0x1234, it outputs a list of the equivalent UTF-8 bytes.
  If its argument is a sequence of concatenated UTF-8 bytes (e.g. 12e188b4) it
  treats them as a UTF-8 string and outputs the equivalent code points in hex.
  See comments at its head for details.
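
As a rough illustration of the two conversions that utf8.c performs, here is
a minimal Python sketch. It is not a port of the C program: the function names
and the output format are assumptions made for this example, and it relies on
Python's built-in UTF-8 codec rather than doing the bit manipulation by hand.

```python
def code_point_to_utf8_hex(cp):
    """Return the UTF-8 encoding of one code point as a lowercase hex string."""
    return chr(cp).encode("utf-8").hex()


def utf8_hex_to_code_points(hex_bytes):
    """Decode concatenated UTF-8 bytes (given as hex) into code points in hex."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return ["0x%04x" % ord(ch) for ch in text]


print(code_point_to_utf8_hex(0x1234))
print(utf8_hex_to_code_points("12e188b4"))
```

With the two example arguments mentioned above, 0x1234 encodes to the three
bytes e1 88 b4, and 12e188b4 decodes to the two code points 0x0012 and 0x1234.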


Updating to a new Unicode release
=================================

When there is a new release of Unicode, the files in Unicode.tables must be
refreshed from the web site. Once that is done, the four Python scripts that
generate files from the Unicode data can be run from within the "maint"
directory.

Note: Previously, it was necessary to update lists of scripts and their
abbreviations by hand before running the Python scripts. This is no longer
necessary because the scripts have been upgraded to extract this information
themselves. Also, there used to be explicit lists of scripts in two of the man
pages. This is no longer the case; the pcre2test program can now output a list
of supported scripts.

You can give an output file name as an argument to the following scripts, but
by default:

GenerateUcd.py        creates pcre2_ucd.c        )
GenerateUcpHeader.py  creates pcre2_ucp.h        ) in the current directory
GenerateUcpTables.py  creates pcre2_ucptables.c  )

These files can be compared against the existing versions in the src directory
to check on any changes before replacing the old files, but you can also
generate directly into the final location by running:

./GenerateUcd.py       ../src/pcre2_ucd.c
./GenerateUcpHeader.py ../src/pcre2_ucp.h
./GenerateUcpTables.py ../src/pcre2_ucptables.c
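
The comparison against the existing versions can be done with any diff tool.
As one possibility, the Python sketch below reports how each of the three
generated files differs from the copy in another directory. The function name
diff_generated and its arguments are made up for this example; only the three
file names come from the text above.

```python
import difflib
from pathlib import Path

# Default output names of the three generator scripts.
GENERATED = ["pcre2_ucd.c", "pcre2_ucp.h", "pcre2_ucptables.c"]


def diff_generated(new_dir, old_dir, names=GENERATED):
    """Map each file name to its unified diff lines (empty list if identical)."""
    report = {}
    for name in names:
        old = Path(old_dir, name).read_text(encoding="utf-8").splitlines()
        new = Path(new_dir, name).read_text(encoding="utf-8").splitlines()
        report[name] = list(difflib.unified_diff(
            old, new, fromfile="old/" + name, tofile="new/" + name,
            lineterm=""))
    return report

# Example use from within the "maint" directory, after running the scripts
# with their default output names:
#
#   for name, diff in diff_generated(".", "../src").items():
#       print(name, "unchanged" if not diff else "\n".join(diff))
```

An empty list for a file means that the freshly generated version is identical
to the one already in place.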

Once the .c and .h files are in the ../src directory, the ucptest program can
be compiled and used to check that the new tables work properly. The data files
in ucptestdata are set up to check a number of test characters. See the
comments at the start of ucptest.c. If there are new scripts, adding a few
tests to the files in ucptestdata is a good idea.
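
For readers who have not met the technique, the idea behind a 2-stage lookup
such as the one GenerateUcd.py writes into pcre2_ucd.c can be sketched in a few
lines of Python. This is a toy under stated assumptions, not the real table
layout: the property function, the 1024-character range, and the block size
used here are all invented for illustration. The long comment at the head of
GenerateUcd.py is the authority on the actual tables.

```python
def build_two_stage(values, block_size):
    """Split values into fixed-size blocks and store each distinct block once.

    Returns (stage1, stage2): stage1 maps a block number to the index of its
    block within stage2; stage2 is the concatenation of the distinct blocks.
    """
    if len(values) % block_size != 0:
        raise ValueError("len(values) must be a multiple of block_size")
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            seen[block] = len(stage2) // block_size
            stage2.extend(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, block_size, cp):
    """Return the value for code point cp using the two tables."""
    return stage2[stage1[cp // block_size] * block_size + cp % block_size]


def toy_property(cp):
    """Hypothetical per-character property, for illustration only."""
    if 0x30 <= cp <= 0x39:
        return "digit"
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:
        return "letter"
    return "other"


BLOCK = 128  # illustrative block size
flat = [toy_property(cp) for cp in range(1024)]
stage1, stage2 = build_two_stage(flat, BLOCK)

print(stage1)
print(len(flat), "flat entries ->", len(stage1) + len(stage2), "table entries")
print(lookup(stage1, stage2, BLOCK, ord("7")))
```

The saving comes from identical blocks being stored only once: in the toy data
every block after the first contains nothing but "other", so they all share a
single entry in the second table.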

Finally, you should run the GenerateTest26.py script to regenerate new versions
of the input and expected output from a series of Unicode property tests that
are automatically generated from the Unicode data files. By default, the files
are written to testinput26 and testoutput26 in the current directory, but you
can give an alternative directory name as an argument to the script. These
files should eventually be installed in the main testdata directory.


Preparing for a PCRE2 release
=============================

This section contains a checklist of things that I do before building a new
release.

. Ensure that the version number and version date are correct in configure.ac.

. Update the library version numbers in configure.ac according to the rules
  given below.

. If new build options or new source files have been added, ensure that they
  are added to the CMake files as well as to the autoconf files. The relevant
  files are CMakeLists.txt and config-cmake.h.in. After making a release, test
  it out with CMake if there have been changes here.

. Run ./autogen.sh to ensure everything is up-to-date.

. Compile and test with many different config options, and combinations of
  options. Also, test with valgrind by running "RunTest valgrind" and
  "RunGrepTest valgrind". The script maint/ManyConfigTests now encapsulates
  this testing. It runs tests with different configurations, and it also runs
  some of them with valgrind, all of which can take quite some time.

. Run tests in both 32-bit and 64-bit environments if possible. I can no longer
  run 32-bit tests.

. Run tests with two or more different compilers (e.g. clang and gcc), and
  make use of -fsanitize=address and friends where possible. For gcc,
  -fsanitize=undefined -std=gnu99 picks up undefined behaviour at runtime, but
  needs -fno-sanitize=shift to get rid of warnings for shifts of negative
  numbers in the JIT compiler. For clang, -fsanitize=address,undefined,integer
  can be used but -fno-sanitize=alignment,shift,unsigned-integer-overflow must
  be added when compiling with JIT. Another useful clang option is
  -fsanitize=signed-integer-overflow.

. Do a test build using CMake. Remove src/config.h first, lest it override the
  version that CMake creates. Also do a CMake unity build to check that it
  still works: [c]cmake -DCMAKE_UNITY_BUILD=ON sets up a unity build.

. Run perltest.sh on the test data for tests 1 and 4. The output should match
  the PCRE2 test output, apart from the version identification at the start of
  each test. Sometimes there are other differences in test 4 if PCRE2 and Perl
  are using different Unicode releases. The other tests are not Perl-compatible
  (they use various PCRE2-specific features or options).

. It is possible to test with the emulated memmove() function by undefining
  HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often.

. Documentation: check AUTHORS, ChangeLog (check version and date), LICENCE,
  NEWS (check version and date), NON-AUTOTOOLS-BUILD, and README. Many of these
  won't need changing, but over the long term things do change.

. I used to test new releases myself on a number of different operating
  systems. For example, on Solaris it is helpful to test using Sun's cc
  compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
  64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
  for test 2. Since I retired I can no longer do much of this. There are
  automated tests under Ubuntu, Alpine, and Windows that are now set up as
  GitHub actions. Check that they are running clean.

. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
  of PCRE2 and should also be checked before putting out a release.
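
The sanitizer options mentioned in the checklist above are easy to get wrong
when typed by hand, so it may help to collect them in one place. The Python
sketch below does only that. The function name is hypothetical, and it assumes
that the gcc option -fno-sanitize=shift is wanted only when JIT support is
being compiled, which is one reading of the checklist item rather than
something it states outright.

```python
def sanitizer_flags(compiler, jit):
    """Return a CFLAGS string with the sanitizer options from the checklist."""
    if compiler == "gcc":
        flags = ["-fsanitize=undefined", "-std=gnu99"]
        if jit:
            # Assumption: only needed for the shifts in the JIT compiler.
            flags.append("-fno-sanitize=shift")
    elif compiler == "clang":
        flags = ["-fsanitize=address,undefined,integer"]
        if jit:
            flags.append(
                "-fno-sanitize=alignment,shift,unsigned-integer-overflow")
    else:
        raise ValueError("no sanitizer options recorded for " + compiler)
    return " ".join(flags)


for cc in ("gcc", "clang"):
    for jit in (False, True):
        print(cc, "JIT" if jit else "no JIT", ":", sanitizer_flags(cc, jit))
```

The resulting string would typically be passed to the build as CFLAGS.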


Updating version info for libtool
=================================

This set of rules for updating library version information came from a web page
whose URL I have forgotten. The version information consists of three parts:
(current, revision, age).

1. Start with version information of 0:0:0 for each libtool library.

2. Update the version information only immediately before a public release of
   your software. More frequent updates are unnecessary, and only guarantee
   that the current interface number gets larger faster.

3. If the library source code has changed at all since the last update, then
   increment revision; c:r:a becomes c:r+1:a.

4. If any interfaces have been added, removed, or changed since the last
   update, increment current, and set revision to 0.

5. If any interfaces have been added since the last public release, then
   increment age.

6. If any interfaces have been removed or changed since the last public
   release, then set age to 0.

The following explanation may help in understanding the above rules a bit
better. Consider that there are three possible kinds of reaction from users to
changes in a shared library:

1. Programs using the previous version may use the new version as a drop-in
   replacement, and programs using the new version can also work with the
   previous one. In other words, no recompiling nor relinking is needed. In
   this case, increment revision only, don't touch current or age.

2. Programs using the previous version may use the new version as a drop-in
   replacement, but programs using the new version may use APIs not present in
   the previous one. In other words, a program linking against the new version
   may fail if linked against the old version at run time. In this case, set
   revision to 0, increment current and age.

3. Programs may need to be changed, recompiled, relinked in order to use the
   new version. Increment current, set revision and age to 0.
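
Rules 3 to 6 above can be applied mechanically, in order. The following Python
sketch does so; the function and parameter names are invented for this
illustration, and the starting value 10:2:10 is an arbitrary example rather
than any real PCRE2 version.

```python
def next_version_info(current, revision, age, *,
                      source_changed, added, removed_or_changed):
    """Apply libtool rules 3-6, in order, to a current:revision:age triple."""
    if source_changed:                # rule 3
        revision += 1
    if added or removed_or_changed:   # rule 4
        current += 1
        revision = 0
    if added:                         # rule 5
        age += 1
    if removed_or_changed:            # rule 6
        age = 0
    return current, revision, age


# The three kinds of change described above, starting from 10:2:10.
print(next_version_info(10, 2, 10, source_changed=True,
                        added=False, removed_or_changed=False))
print(next_version_info(10, 2, 10, source_changed=True,
                        added=True, removed_or_changed=False))
print(next_version_info(10, 2, 10, source_changed=True,
                        added=False, removed_or_changed=True))
```

The three calls correspond to the three cases in the explanation: a drop-in
replacement in both directions gives 10:3:10, added interfaces give 11:0:11,
and an incompatible change gives 11:0:0.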


Making a PCRE2 release
======================

Run PrepareRelease and commit the files that it changes. The first thing this
script does is to run CheckMan on the man pages; if it finds any markup errors,
it reports them and then aborts. Otherwise it removes trailing spaces from
sources and refreshes the HTML documentation. Update the GitHub repository with
"git push".

Once PrepareRelease has run clean, run "make distcheck" to create the tarballs
and the zipball. I then sign these files. Double-check with "git status" that
the repository is fully up-to-date, then create a new tag and a release on
GitHub. Upload the tarballs, zipball, and the signatures as "assets" of the
GitHub release.

When the new release is out, don't forget to tell webmaster@pcre.org and the
mailing list.


Future ideas (wish list)
========================

This section records a list of ideas so that they do not get forgotten. They
vary enormously in their usefulness and potential for implementation. Some are
very sensible; some are rather wacky. Some have been on this list for many
years.

. Optimization

  There are always ideas for new optimizations so as to speed up pattern
  matching. Most of them try to save work by recognizing a non-match without
  having to scan all the possibilities. These are some that I've recorded:

  * /((A{0,5}){0,5}){0,5}(something complex)/ on a non-matching string is very
    slow, though Perl is fast. Can we speed up somehow? Convert to {0,125}?
    OTOH, this is pathological - the user could easily fix it.

  * Turn ={4} into ==== ? (for speed). I once did an experiment, and it seems
    to have little effect, and maybe makes things worse.

  * "Ends with literal string" - note that a single character doesn't gain much
    over the existing "required code unit" feature that just remembers one code
    unit.

  * Remember an initial string rather than just 1 code unit.

  * A required code unit from alternatives - not just the last unit, but an
    earlier one if common to all alternatives.

  * Friedl contains other ideas.

  * The code does not set initial code unit flags for Unicode property types
    such as \p; I don't know how much benefit there would be for, for example,
    setting the bits for 0-9 and all values >= xC0 (in 8-bit mode) when a
    pattern starts with \p{N}.

. If Perl gets to a consistent state over the settings of capturing sub-
  patterns inside repeats, see if we can match it. One example of the
  difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE2
  leaves $2 set. In Perl, it's unset. Changing this in PCRE2 will be very hard
  because I think it needs much more state to be remembered.

. A feature to suspend a match via a callout was once requested.

. An option to convert results into character offsets and character lengths.

. A (non-Unix) user wanted pcre2grep options to (a) list a file name just once,
  preceded by a blank line, instead of adding it to every matched line, and (b)
  support --outputfile=name.

. Define a union for the results from pcre2_pattern_info().

. Provide a "random access to the subject" facility so that the way in which it
  is stored is independent of PCRE2. For efficiency, it probably isn't possible
  to switch this dynamically. It would have to be specified when PCRE2 was
  compiled. PCRE2 would then call a function every time it wanted a character.

. pcre2grep: add -rs for a sorted recurse. Having to store file names and sort
  them will of course slow it down.

. Someone suggested --disable-callout to save code space when callouts are
  never wanted. This seems rather marginal.

. A user suggested a parameter to limit the length of string matched, for
  example if the parameter is N, the current match should fail if the matched
  substring exceeds N. This could apply to both match functions. The value
  could be a new field in the match context. Compare the offset_limit feature,
  which limits where a match must start.

. Write a function that generates random matching strings for a compiled
  pattern.

. pcre2grep: an option to specify the output line separator, either as a string
  or selected from a fixed list. This is not straightforward, because at the
  moment it outputs whatever is in the input file.

. Improve the code for duplicate checking in pcre2_dfa_match(). An incomplete,
  non-thread-safe patch showed that this can help performance for patterns
  where there are many alternatives. However, a simple thread-safe
  implementation that I tried made things worse in many simple cases, so this
  is not an obviously good thing.

. PCRE2 cannot at present distinguish between subpatterns with different names,
  but the same number (created by the use of ?|). In order to do so, a way of
  remembering *which* subpattern numbered n matched is needed. (*MARK) can
  perhaps be used as a way round this problem. However, note that Perl does not
  distinguish: like PCRE2, a name is just an alias for a number in Perl.

. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
  "something" and then the #ifdef appears only in one place, in "something".

. Implement something like (?(R2+)... to check outer recursions.

. If Perl ever supports the POSIX notation [[.something.]] PCRE2 should try
  to follow.

. A user wanted a way of ignoring all Unicode "mark" characters so that, for
  example "a" followed by an accent would, together, match "a". This can only
  be done clumsily at present by using a lookahead such as /(?=a)\X/, which
  works for "combining" characters.

. Perl supports [\N{x}-\N{y}] as a Unicode range, even in EBCDIC. PCRE2
  supports \N{U+dd..} everywhere, but not in EBCDIC.

. Unicode stuff from Perl:

    \b{gcb} or \b{g}    grapheme cluster boundary
    \b{sb}              sentence boundary
    \b{wb}              word boundary

  See Unicode TR 29. The last two are very much aimed at natural language.

. Allow a callout to specify a number of characters to skip. This can be done
  compatibly via an extra callout field.

. Allow callouts to return *PRUNE, *COMMIT, *THEN, *SKIP, with and without
  continuing (that is, with and without an implied *FAIL). A new option,
  PCRE2_CALLOUT_EXTENDED say, would be needed. This is unlikely ever to be
  implemented by JIT, so this could be an option for pcre2_match().

. A limit on substitutions: a user suggested somehow finding a way of making
  match_limit apply to the whole operation instead of each match separately.

. Some #defines could be replaced with enums to improve robustness.

. There was a request for an option for pcre2_match() to return the longest
  match. This would mean searching for all possible matches, of course.

. Perl's /a modifier sets Unicode, but restricts \d etc to ASCII characters,
  which is the PCRE2 default for PCRE2_UTF (use PCRE2_UCP to change). However,
  Perl also has /aa, which in addition, disables ASCII/non-ASCII caseless
  matching. Perhaps we need a new option PCRE2_CASELESS_RESTRICT_ASCII. In
  practice, this just means not using the ucd_caseless_sets[] table.

. There is more that could be done to the oss-fuzz setup (needs some research).
  A seed corpus could be built. I noted something about $LIB_FUZZING_ENGINE.
  The test function could make use of get_substrings() to cover more code.

. A neater way of handling recursion file names in pcre2grep, e.g. a single
  buffer that can grow. See also GitHub issue #2 (recursion looping via
  symlinks).

. A user suggested that before/after parameters in pcre2grep could have
  negative values, to list lines near to the matched line, but not necessarily
  the line itself. For example, --before-context=-1 would list the line *after*
  each matched line, without showing the matched line. The problem here is what
  to do with matches that are close together. Maybe a simpler way would be a
  flag to disable showing matched lines, only valid with either -A or -B?

. There was a suggestion for a pcre2grep colour default, or possibly a more
  general PCRE2GREP_OPT, but only for some options - not file names or patterns.

. Breaking loops that match an empty string: perhaps find a way of continuing
  if *something* has changed, but this might mean remembering additional data.
  "Something" could be a capture value, but then a list of previous values
  would be needed to avoid a cycle of changes.

. If a function could be written to find 3-character (or other length) fixed
  strings, at least one of which must be present for a match, efficient
  pre-searching of large datasets could be implemented.

. If pcre2grep had --first-line (match only in the first line) it could be
  efficiently used to find files "starting with xxx". What about --last-line?
  There was also the suggestion of an option for pcre2grep to scan only the
  start of a file. I am not keen - this is the job of "head".

. A user requested a means of determining whether a failed match was failed by
  the start-of-match optimizations, or by running the match engine. Easy enough
  to define a bit in the match data, but all three matchers would need work.

. Would inlining "simple" recursions provide a useful performance boost for the
  interpreters? JIT already does some of this, but it may not be worth it for
  the interpreters.

. Redesign handling of class/nclass/xclass because the compile code logic is
  currently very contorted and obscure. Also there was a request for a way of
  re-defining \w (and therefore \W, \b, and \B). An in-pattern sequence such as
  (?w=[...]) was suggested. Easiest way would be simply to inline the class,
  with lookarounds for \b and \B. Ideally the setting should last till the end
  of the group, which means remembering all previous settings; maybe a fixed
  amount of stack would do - how deep would anyone want to nest these things?
  See GitHub issue #13 for a compendium of character class issues, including
  (?[...]) extended classes.

. A user suggested something like --with-build-info to set a build information
  string that could be retrieved by pcre2_config(). However, there's no
  facility for a length limit in pcre2_config(), and what would be the
  encoding?

. Quantified groups with a fixed count currently operate by replicating the
  group in the compiled bytecode. This may not really matter in these days of
  gigabyte memory, but perhaps another implementation might be considered.
  Needs coordination between the interpreters and JIT.

. There are regular requests for variable-length lookbehinds.

. See also any suggestions in the GitHub issues.
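
To make one of the ideas above more concrete, the item about converting results
into character offsets and character lengths can be illustrated outside the
library. The Python sketch below converts a pair of UTF-8 byte offsets, of the
kind a match in 8-bit mode reports, into a character offset and a character
length. The function name is hypothetical and is not part of the PCRE2 API; the
sketch also assumes that the subject is valid UTF-8 and that both offsets fall
on character boundaries.

```python
def to_char_offsets(subject, start, end):
    """Convert UTF-8 byte offsets [start, end) to (char offset, char length)."""
    char_start = len(subject[:start].decode("utf-8"))
    char_length = len(subject[start:end].decode("utf-8"))
    return char_start, char_length


# "naive cafe" with a diaeresis on the i and an acute accent on the e; each
# of those two characters occupies two bytes in UTF-8.
subject = "na\u00efve caf\u00e9".encode("utf-8")

# The last word occupies bytes 7 to 12 of the 12-byte subject.
print(to_char_offsets(subject, 7, 12))
```

Here the byte range 7 to 12 becomes character offset 6 and character length 4,
because the two-byte character earlier in the subject shifts every later byte
offset by one.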

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 25 April 2022