McJohnson - 2 months ago
PHP Question

Invoking a function inside a loop leads to memory_limit being exceeded

I am processing fairly large files in PHP (300 MB to 1024 MB), looking for a line that matches my search criteria and returning the whole line. Since I cannot afford to read the entire file into memory, I am reading it line by line:

function getLineWithString($fileName, $str) {

    $matches = array();
    $handle = @fopen($fileName, "r");

    if ($handle) {
        while (!feof($handle)) {
            $buffer = fgets($handle, 4096);

            if (strpos($buffer, $str) !== FALSE) {
                return '<pre>'.$matches[] = $buffer.'</pre>';
            }
        }

        fclose($handle);
    }
}


Since my $str (needle) is an array, I am using foreach() to process all its elements and invoke the function every time:

foreach ($my_array as $a_match) {
    echo getLineWithString($myFile, trim($a_match));
}


However, this approach (using foreach()) hits max_execution_time, memory_limit, Apache's FcgidIOTimeout and other limits. My array (needle) contains 88 elements, and the number may grow depending on the end user's actions, so this is definitely not an adequate way.

My question is: how can I avoid foreach() or any other looping and invoke the function only once?

Answer

Note about memory leak

It's important to note that this is a misuse of the term memory leak since in PHP you have no control over memory management. A memory leak is generally defined as a process having allocated memory on the system that is no longer reachable by that process. It's not possible for you to do this in your PHP code since you have no direct control over the PHP memory manager.

Your code runs inside of the PHP virtual machine which manages memory for you. Exceeding the memory_limit you set in PHP is not the same thing as PHP leaking memory. This is a defined limit, controlled by you. You can raise or lower this limit at your discretion. You may even ask PHP to not limit the amount of memory at all by setting memory_limit = -1, for example. Of course, this is still subject to your machine's memory capacity.
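To see how close a script is to that limit, PHP exposes both the configured value and the current usage at runtime. The following is a minimal sketch using the built-in ini_get(), memory_get_usage() and memory_get_peak_usage(); the 1 MB string is only an arbitrary allocation chosen to make the change visible.

```php
<?php
// Configured limit, as a string such as "128M" or "-1" (no limit).
$limit = ini_get('memory_limit');

$before = memory_get_usage();

// Allocate roughly 1 MB so the difference shows up in the numbers.
$data = str_repeat('x', 1024 * 1024);

$after = memory_get_usage();
$peak  = memory_get_peak_usage();

printf("memory_limit: %s\n", $limit);
printf("used by the string: about %d bytes\n", $after - $before);
printf("peak usage so far: %d bytes\n", $peak);
```

Logging these values around the search loop is one way to check whether memory really grows per iteration or whether another limit, such as max_execution_time, is hit first.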

Your actual problem

However, the approach you are using is not much better than reading the entire file into memory, because every call to your function reads the file line by line from the start. With 88 needles, that can mean up to 88 passes over the same file. It uses little memory, but the running time grows with the number of needles multiplied by the size of the file.
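The cost difference can be illustrated by counting how many lines are read in each style. This is only a sketch: the temporary file, its 1000 generated lines and the three needles are made-up stand-ins for the real data.

```php
<?php
// Build a small temporary file standing in for the large one (hypothetical data).
$path = tempnam(sys_get_temp_dir(), 'needle');
$lines = array_map(fn($i) => "line $i", range(1, 1000));
file_put_contents($path, implode("\n", $lines) . "\n");

$needles = ['line 1000', 'line 999', 'line 998'];

// One pass per needle: the file is re-read from the start for every needle.
$readsPerNeedle = 0;
foreach ($needles as $needle) {
    $handle = fopen($path, 'r');
    while (($line = fgets($handle)) !== false) {
        $readsPerNeedle++;
        if (strpos($line, $needle) !== false) {
            break; // mirrors the early return in the original function
        }
    }
    fclose($handle);
}

// Single pass: every line is read once and tested against all needles.
$readsSinglePass = 0;
$matchCount = 0;
$handle = fopen($path, 'r');
while (($line = fgets($handle)) !== false) {
    $readsSinglePass++;
    foreach ($needles as $needle) {
        if (strpos($line, $needle) !== false) {
            $matchCount++;
        }
    }
}
fclose($handle);
unlink($path);

printf("per-needle: %d line reads, single pass: %d line reads\n",
    $readsPerNeedle, $readsSinglePass);
```

With this sample data the per-needle style performs 2997 line reads against 1000 for the single pass, and the gap widens with every additional needle.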

To be efficient in both time and memory, search for every needle while reading the file once. Rather than sending a single needle to your function, consider sending the entire array of needles. This moves the loop you currently use to call your function into the function itself.

Additionally, note that your current function returns as soon as it finds a match, since you're using return inside the loop. To collect every match, use return $matches at the end of the function, outside the loop.

Here's a better approach.

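function getLineWithString($fileName, array $needles) {

    $matches = [];
    $handle = fopen($fileName, "r");

    if ($handle === false) {
        // The file could not be opened, so there is nothing to search.
        return $matches;
    }

    // fgets() returns false at end of file, so no separate feof() check is needed.
    while (($buffer = fgets($handle)) !== false) {
        foreach ($needles as $str) {
            if (strpos($buffer, $str) !== false) {
                $matches[] = $buffer;
            }
        }
    }

    fclose($handle);
    return $matches;

}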
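If what you need is the first matching line for each needle, as in the question's original loop, the single pass can also stop early once every needle has been found. The following is a sketch under that assumption; findFirstLinePerNeedle() and the sample file are hypothetical names and data used for illustration only.

```php
<?php
/**
 * Hypothetical variant: returns [needle => first matching line] and stops
 * reading as soon as every needle has been found.
 */
function findFirstLinePerNeedle(string $fileName, array $needles): array
{
    $found = [];
    // Trim the needles, then drop empty ones and duplicates.
    $pending = array_unique(array_filter(array_map('trim', $needles), 'strlen'));

    $handle = fopen($fileName, 'r');
    if ($handle === false) {
        return $found;
    }

    while ($pending && ($line = fgets($handle)) !== false) {
        foreach ($pending as $key => $needle) {
            if (strpos($line, $needle) !== false) {
                $found[$needle] = rtrim($line, "\r\n");
                unset($pending[$key]); // this needle no longer needs checking
            }
        }
    }

    fclose($handle);
    return $found;
}

// Usage with a small temporary file (made-up content).
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "alpha one\nbeta two\nalpha three\n");

$result = findFirstLinePerNeedle($path, [' alpha ', 'two', 'missing']);
unlink($path);

foreach ($result as $needle => $line) {
    echo '<pre>' . htmlspecialchars($line) . '</pre>';
}
```

Needles that are never found are simply absent from the returned array, so the caller can compare it against the input to see what is missing.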

N.B.

I'd strongly advise against using the error silence operator @, since it suppresses the error messages raised by its operand and so makes debugging harder. If fopen() fails, PHP won't tell you why, which isn't useful at all.
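One alternative to silencing errors is to check the file up front and report the failure explicitly. A minimal sketch follows; openForReading() is a hypothetical helper, and the path is a placeholder chosen so that it does not exist.

```php
<?php
// Hypothetical helper: opens a file for reading or throws with a clear message.
function openForReading(string $fileName)
{
    if (!is_file($fileName) || !is_readable($fileName)) {
        throw new RuntimeException("Cannot read file: $fileName");
    }

    $handle = fopen($fileName, 'r');
    if ($handle === false) {
        throw new RuntimeException("Failed to open file: $fileName");
    }

    return $handle;
}

try {
    $handle = openForReading('/path/that/does/not/exist.log');
    fclose($handle);
    $message = 'opened';
} catch (RuntimeException $e) {
    // The failure is reported instead of being silently swallowed.
    $message = $e->getMessage();
}

echo $message, "\n";
```

This way a missing or unreadable file produces a message you can log or show, rather than a function that quietly returns nothing.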
