Perl 5.8.[5-7] Regular Expression Bug

Written by Geoff Mottram (geoff at minaret dot biz).

Placed in the public domain on September 14, 2004 by the author.

Last updated: September 17, 2005.

This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with this document or the use or other dealings in this document.

Contents
Introduction
Summary
Details

Introduction
This document describes a bug when using the "-d" (debug) option of Perl versions 5.8.5 through 5.8.7 where a regular expression will either cause the program to crash or to emit one of the following error messages:

Out of memory during "large" request for...
Out of memory during ridiculously large request

Summary
If you are receiving a segmentation fault or an out of memory condition when executing a Perl regular expression while debugging a Perl script with the Perl debug ("-d") option, you should either apply the following fix or revert back to version 5.8.4 of Perl. The problem is caused when "()" operators are used in a regular expression. In my case, the regular expression was the following:

if ($a =~ m/^([A-Za-z]+)(.*)/) 

The second time this expression is executed, the previous values of $1 and $2 are replaced with new values. When the replacement occurs, the old values are freed. If the old values are objects (which seems to happen with UTF-8 values), their DESTROY method is called. When running under the Perl debugger (-d option), the "sub" method of the "DB.pm" module is called which includes a regular expression. This regular expression causes the state of original regular expression to get clobbered.

The fix is applied to Perl source file "sv.c". It requires Perl to save its regular expression context on its stack prior to the call to the DESTROY method of the regular expression object being freed. A patch has been provided at the end of this document. It is a simple fix that can be applied by hand. Insert the following line of code to function Perl_sv_clear in file "sv.c" at line 5105 (in Perl 5.8.7, your line number may vary for previous versions):

save_re_context();

The following snippet shows the addition:

                if (destructor) {
                    SV* tmpref = newRV(sv);
                    SvREADONLY_on(tmpref);   /* DESTROY() could be naughty */
                    ENTER;
                    save_re_context();
                    PUSHSTACKi(PERLSI_DESTROY);
                    EXTEND(SP, 2);
                    PUSHMARK(SP);
                    PUSHs(tmpref);
                    PUTBACK;
                    call_sv((SV*)destructor, G_DISCARD|G_EVAL|G_KEEPERR|G_VOID);


                    POPSTACK;
                    SPAGAIN;
                    LEAVE;
                    if(SvREFCNT(tmpref) < 2) {
                        /* tmpref is not kept alive! */
                        SvREFCNT(sv)--;
                        SvRV(tmpref) = 0;
                        SvROK_off(tmpref);
                    }
                    SvREFCNT_dec(tmpref);
                }

The call to "save_re_context()" is placed between the "ENTER;" and "PUSHSTACKi(PERLSI_DESTROY);" lines.

Once you have added this line, recompile Perl and you are all set. If you are using mod_perl, you will need to re-make that program after you have installed the fixed version of Perl.

Details
The following information was submitted to perlbug@perl.org and was assigned an ID of [perl #37102]:

-------- Original Message --------
Subject: Perl regexp state gets clobbered (includes a fix)
Date: Wed, 07 Sep 2005 20:54:29 -0400
From: Geoff Mottram 
To: perlbug at perl.org

The following problem has been observed under versions of Perl after 5.8.4
on Linux and Windows including the most recent stable version (5.8.7). The
script that experienced the problem is too complex to include and requires
a connection to a database server. A patch with a fix is provided at the
end of this email.

The following regex was causing either an Out of Memory condition or a
segmentation fault after two iterations:

            if ($a =~ m/^([A-Za-z]+)(.*)/)

It was caused by the PL_regeol variable of regexec.c getting overwritten
when in the "av_store()" call on line 4370 of regexec.c (see gdb backtrace
below):

            if (a)
                sw = *a;
            else if (si && doinit) {
                sw = swash_init("utf8", "", si, 1, 0);
                (void)av_store(av, 1, sw);
            }

When av_store is called, the previous positional parameter is
de-referenced and its DESTROY method is called. I could not figure out
what object was being destroyed but the gdb backtrace will probably ring a
few bells for you. The script was working with UTF-8 content and wide
characters, which I suspect was related to the problem. In any case, the
DESTROY method that was called included a regular expression that would
trash the state of the regular expression engine a few methods up the stack.

My solution is given in a patch at the very end of this email. Please
excuse the comment with my name but it will allow me to find my fix in my
copy of Perl should I need to in the future.

I added the following call to sv_clear() in sv.c:

        save_re_context();

just prior to the call to the object's DESTROY method. This allows regular
expressions to be used in the DESTROY method of whatever object is being
used as a positional variable in a regular expression (a Unicode string?).

I agree with the comments above the save_re_context() function that this
is a really kludgey way for the regular expression engine to be written.
The engine should store its state in a stack or allocated structure that
is passed to its subroutines and not use global variables. I am afraid
that there may be other sections in Perl that will also require calls to
save_re_context() that have not been discovered yet.

Thanks for getting my fix in to the Perl distribution.

Sincerely,

Geoff Mottram
geoff@gate.net

--------------Output of perl -V ------------

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux
    uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan
12 23:35:44 est 2005 i686 i686 i386 gnulinux '
    config_args='-de'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING',
    optimize='-g',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include
-I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING'
    ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)',
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: DEBUGGING USE_LARGE_FILES
  Built under linux
  Compiled at Sep  7 2005 06:10:19
  @INC:
    /usr/local/lib/perl5/5.8.7/i686-linux
    /usr/local/lib/perl5/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.7/i686-linux
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl
    .
--------------------------------------------

------------Output of perbug -d -------------

---
Flags:
    category=
    severity=
---
Site configuration information for perl v5.8.7:

Configured by minaret at Wed Sep  7 06:08:36 EDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux
    uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan
12 23:35:44 est 2005 i686 i686 i386 gnulinux '
    config_args='-de'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING',
    optimize='-g',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include
-I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING'
    ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)',
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.7:
    /usr/local/lib/perl5/5.8.7/i686-linux
    /usr/local/lib/perl5/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.7/i686-linux
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl
    .

---
Environment for perl v5.8.7:
    HOME=/minaret
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin/bin:/usr/local/java/jdk1.5.0_04/bin:/usr/local/java/jdk1.5.0_04/jre/bin:/minaret/m/bin:/minaret/m/util:/minaret/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

----------------------------------------------

----- gdb backtrace showing how PL_regeol gets clobbered by
Perl_re_intuit_start-------------------------

#0  Perl_re_intuit_start (prog=0x90e5070, sv=0x8fda2a0, strpos=0x9727d28
"utf8::DESTROY", strend=0x9727d35 "", flags=2, data=0x0)
    at regexec.c:454
#1  0x080ca09f in Perl_pp_match () at pp_hot.c:1328
#2  0x080b288d in Perl_runops_debug () at dump.c:1452
#3  0x08063509 in S_call_body (myop=0xbfffa050, is_eval=0) at perl.c:2364
#4  0x0806318f in Perl_call_sv (sv=0x95d17b0, flags=150) at perl.c:2282
#5  0x080d9de2 in Perl_sv_clear (sv=0x972d270) at sv.c:5110
#6  0x080da66c in Perl_sv_free (sv=0x972d270) at sv.c:5356
#7  0x080da24c in Perl_sv_clear (sv=0x978cf10) at sv.c:5208
#8  0x080da66c in Perl_sv_free (sv=0x978cf10) at sv.c:5356
#9  0x080c3800 in Perl_av_store (av=0x95509a0, key=1, val=0x9702280) at
av.c:335
#10 0x0812e348 in Perl_regclass_swash (node=0x9552c84, doinit=1 '\001',
listsvp=0x0, altsvp=0xbfffa27c) at regexec.c:4370
#11 0x0812e551 in S_reginclass (n=0x9552c84, p=0x9760c01 "3279.K86",
lenp=0x0, do_utf8=1 '\001') at regexec.c:4420
#12 0x0812d438 in S_regrepeat (p=0x9552c84, max=2147483647) at regexec.c:4112
#13 0x0812c4ae in S_regmatch (prog=0x9552c74) at regexec.c:3774
#14 0x08126f7c in S_regtry (prog=0x9552c30, startpos=0x9760c00
"B3279.K86") at regexec.c:2204
#15 0x08125a61 in Perl_regexec_flags (prog=0x9552c30, stringarg=0x9760c00
"B3279.K86", strend=0x9760c09 "", strbeg=0x9760c00 "B3279.K86",
    minend=0, sv=0x95507fc, data=0x0, flags=3) at regexec.c:1753
#16 0x080ca120 in Perl_pp_match () at pp_hot.c:1340
#17 0x080b288d in Perl_runops_debug () at dump.c:1452
#18 0x08062878 in S_run_body (oldscope=1) at perl.c:1995
#19 0x08062431 in perl_run (my_perl=0x8fda008) at perl.c:1919
#20 0x0805e81b in main (argc=20, argv=0xbfffa914, env=0xbfffa968) at
perlmain.c:98
(gdb) p *prog
$111 = {startp = 0x90dfb48, endp = 0x90e4e88, regstclass = 0x0, substrs =
0x90e1c88, precomp = 0x90e5018 "^threads::new$", data = 0x0,
  subbeg = 0x0, offsets = 0x90e5030, sublen = 0, refcnt = 1, minlen = 12,
prelen = 14, nparens = 0, lastparen = 0, lastcloseparen = 0,
  reganch = 137363457, program = {{flags = 156 '\234', type = 0 '\0',
next_off = 0}}}

---------------- Patch for sv.c to fix problem ---------------

--- sv.c.orig    2005-05-27 06:38:11.000000000 -0400
+++ sv.c    2005-09-07 19:31:19.000000000 -0400
@@ -5102,6 +5102,8 @@
             SV* tmpref = newRV(sv);
                 SvREADONLY_on(tmpref);   /* DESTROY() could be naughty */
             ENTER;
+            /* Next line added by Geoff Mottram 9/7/05. */
+            save_re_context();
             PUSHSTACKi(PERLSI_DESTROY);
             EXTEND(SP, 2);
             PUSHMARK(SP);

---------------------------------------------------------------

Technical Tips