Thorsten Schöning - 2 months ago
Perl Question

Perl: When is unneeded memory of a scalar freed without going out of scope?

I have an app which reads a giant chunk of textual data into a scalar, sometimes even GBs in size. I use substr on that scalar to read most of the data into another scalar and replace the extracted data with an empty string, because it is no longer needed in the first scalar. What I've found recently is that Perl does not free the memory of the first scalar, even though it recognizes that its logical length has changed. So what I need to do is extract the remaining data from the first scalar into a third one, undef the first scalar and put the extracted data back in place. Only this way is the memory occupied by the first scalar really freed. Assigning undef to that scalar, or some other value smaller than the allocated block of memory, doesn't change anything about the allocated memory.

The following is what I do now:

$$extFileBufferRef = substr($$contentRef, $offset, $length, '');
$length = length($$contentRef);
my $content = substr($$contentRef, 0, $length);
$$contentRef = undef( $$contentRef) || $content;


$$contentRef might be e.g. 5 GB in size in the first line; I extract 4.9 GB of data and replace the extracted data with an empty string. The second line would now report e.g. 100 MB as the length of the string, but e.g. Devel::Size::total_size would still report that 5 GB are allocated for that scalar. And assigning undef or the like to $$contentRef doesn't seem to change a thing about that; I need to call undef as a function on that scalar.

I would have expected that the memory behind $$contentRef is already at least partially freed after substr was applied. That doesn't seem to be the case...

So, is memory only freed if variables go out of scope? And if so, why is assigning undef different from calling undef as a function on the same scalar?

Answer

Your analysis is correct.

$ perl -MDevel::Peek -e'
   my $x; $x .= "x" for 1..100;
   Dump($x);
   substr($x, 50, length($x), "");
   Dump($x);
'
SV = PV(0x24208e0) at 0x243d550
  ...
  CUR = 100       # length($x) == 100
  LEN = 120       # 120 bytes are allocated for the string buffer.

SV = PV(0x24208e0) at 0x243d550
  ...
  CUR = 50        # length($x) == 50
  LEN = 120       # 120 bytes are allocated for the string buffer.

Not only does Perl overallocate strings, it doesn't even free variables that go out of scope, instead reusing them the next time the scope is entered.

$ perl -MDevel::Peek -e'
   sub f {
      my ($set) = @_;
      my $x;
      if ($set) { $x = "abc"; $x .= "def"; }
      Dump($x);
   }

   f(1);
   f(0);
'
SV = PV(0x3be74b0) at 0x3c04228   # PV: Scalar may contain a string
  REFCNT = 1
  FLAGS = (POK,pPOK)              # POK: Scalar contains a string
  PV = 0x3c0c6a0 "abcdef"\0       # The string buffer
  CUR = 6
  LEN = 10                        # Allocated size of the string buffer

SV = PV(0x3be74b0) at 0x3c04228   # Could be a different scalar at the same address,
  REFCNT = 1                      #   but it's truly the same scalar
  FLAGS = ()                      # No "OK" flags: undef
  PV = 0x3c0c6a0 "abcdef"\0       # The same string buffer
  CUR = 6
  LEN = 10                        # Allocated size of the string buffer

The logic is that if you needed the memory once, there's a strong chance you'll need it again.

For the same reason, assigning undef to a scalar doesn't free its string buffer. But Perl gives you a chance to free the buffers if you want to: passing a scalar to undef forces the freeing of the scalar's internal buffers.

$ perl -MDevel::Peek -e'
   my $x = "abc"; $x .= "def";  Dump($x);
   $x = undef;                  Dump($x);
   undef $x;                    Dump($x);
'
SV = PV(0x37d1fb0) at 0x37eec98   # PV: Scalar may contain a string
  REFCNT = 1
  FLAGS = (POK,pPOK)              # POK: Scalar contains a string
  PV = 0x37e8290 "abcdef"\0       # The string buffer
  CUR = 6
  LEN = 10                        # Allocated size of the string buffer

SV = PV(0x37d1fb0) at 0x37eec98   # PV: Scalar may contain a string
  REFCNT = 1
  FLAGS = ()                      # No "OK" flags: undef
  PV = 0x37e8290 "abcdef"\0       # The string buffer is still allocated
  CUR = 6
  LEN = 10                        # Allocated size of the string buffer

SV = PV(0x37d1fb0) at 0x37eec98   # PV: Scalar may contain a string
  REFCNT = 1
  FLAGS = ()                      # No "OK" flags: undef
  PV = 0                          # The string buffer has been freed.
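
Applied to the buffer from the question, the copy / undef / reassign sequence could be wrapped in a small helper. The following is only a sketch: shrink_buffer and allocated_len are hypothetical names, not part of any module, and the sketch assumes the core B module is an acceptable way to read a scalar's LEN without Devel::Peek. It also calls undef on the temporary copy, since a lexical that goes out of scope may otherwise keep its buffer for reuse, as shown above.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: allocated size (LEN) of the string buffer behind $$ref.
sub allocated_len {
    my ($ref) = @_;
    return B::svref_2object($ref)->LEN;
}

# Hypothetical helper: keep the content of $$ref but drop its oversized buffer.
sub shrink_buffer {
    my ($ref) = @_;
    my $copy = $$ref;   # copy only the data that is still in use
    undef $$ref;        # undef as a function frees the old buffer
    $$ref = $copy;      # put the data back
    undef $copy;        # also release the temporary's buffer
    return;
}

my $size = 1_000_000;
my $buf  = 'x' x $size;
substr($buf, 100, length($buf) - 100, '');   # keep only the first 100 bytes

printf "before: CUR=%d LEN=%d\n", length($buf), allocated_len(\$buf);
shrink_buffer(\$buf);
printf "after:  CUR=%d LEN=%d\n", length($buf), allocated_len(\$buf);
```

The exact LEN values depend on the Perl version and the allocator, so only the relation between the two numbers is meaningful: LEN should be far smaller after the call. Note that the helper briefly holds two copies of the remaining data, which matters if the remainder is itself large.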