Asked by penname, 6 months ago

PHP - Create new file containing all lines from file1 that do not contain any of the text from lines in file2

I've read a load of posts on StackExchange but can't find exactly what I need. Note: this is not just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line of File1.csv that doesn't contain any of the lines from File2.txt.

File1.csv contains personal details and email addresses, 1 per line:

"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"


File2.txt contains email addresses, 1 per line:

mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com


Expected result: Results.csv should contain:

"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"


Confusingly, the code I have works as expected when File2.txt contains a single line. But when it contains more than one line, Results.csv contains all lines from File1.csv (including lines that should have been removed) and repeats those lines multiple times (as many times as there are lines in File2.txt). I've got a feeling I'm close, but I can't figure it out.

My code:

<?php
$to_be_searched = "File1.csv";

$items_to_catch = file("File2.txt");

// create empty array to store lines we want to keep - i.e. lines that don't contain emails we're checking for
$good_lines = array();

// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
    // go line by line until end of file
    while (($line = fgets($handle)) !== false) {
        // check if line contains any items from $items_to_catch
        foreach($items_to_catch as $key => $value) {
            if(strpos($line, $value) === false) {
                // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
                $good_lines[] = $line;
            }
        }
    }
    fclose($handle);
} else {
    echo "Couldn't open " . $to_be_searched;
    exit();
}

// write $good_lines into new file
$new_file = "Results.csv";
foreach($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}

?>


What am I doing wrong?

Answer

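It's not working because of your inner foreach: you add the line to $good_lines once for every email it doesn't contain. So a line can be added several times, and a line that does match one email is still added once for each of the other emails.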
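A minimal sketch reproducing this with in-memory arrays instead of the two files. The arrays stand in for File1.csv and File2.txt, and the emails are stored without line breaks (an assumption that isolates the loop bug from the line-break issue):

```php
<?php
// Stand-in for the lines of File1.csv (without trailing line breaks)
$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
    '"mr","Grumpy","Man","mrgrumpy@example.com"',
    '"mr","Strong","Man","mrstrong@example.com"',
];
// Stand-in for File2.txt, already stripped of line breaks
$items_to_catch = [
    'mrhappy@example.com',
    'mrsomeoneelse@example.com',
    'mrsomeoneelse2@example.com',
];

$good_lines = [];
foreach ($lines as $line) {
    foreach ($items_to_catch as $value) {
        // Same test as the question: the line is added for EVERY email it lacks
        if (strpos($line, $value) === false) {
            $good_lines[] = $line;
        }
    }
}

// Count how many times each line ended up in $good_lines
$counts = array_count_values($good_lines);
foreach ($counts as $kept => $n) {
    echo $n, ' x ', $kept, PHP_EOL;
}
```

The Happy line is kept twice even though it should be dropped, and every other line appears three times, once per email in the list.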

To fix this, you can add a flag variable to your loop.

while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;

    // Loop through each item to see if the email has been found
    foreach($items_to_catch as $key => $value) {
        // If the email was found, flag the line and stop checking the remaining emails
        if(strpos($line, $value) !== false){
            $found = true;
            break;
        } 
    }

    // If the email was not found in the second file, add it to the good_lines array
    if (!$found) {
        $good_lines[] = $line;
    }
}
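If you prefer to avoid the flag, the same check can live in a small function that returns early. This is a sketch, assuming PHP 7.4+ (for the arrow function) and a hypothetical helper named line_contains_any(); the arrays again stand in for the two files, with the emails already stripped of line breaks:

```php
<?php
// Hypothetical helper (not from the question): returns true as soon as
// $line contains any of the $needles, which replaces the $found flag + break.
function line_contains_any(string $line, array $needles): bool
{
    foreach ($needles as $needle) {
        // Skip empty needles: since PHP 8.0, strpos() treats "" as found at offset 0
        if ($needle !== '' && strpos($line, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
    '"mr","Grumpy","Man","mrgrumpy@example.com"',
    '"mr","Strong","Man","mrstrong@example.com"',
];
$items_to_catch = ['mrhappy@example.com', 'mrsomeoneelse@example.com', 'mrsomeoneelse2@example.com'];

// Keep only the lines that contain none of the emails, then reindex from 0
$good_lines = array_values(array_filter(
    $lines,
    fn (string $line): bool => !line_contains_any($line, $items_to_catch)
));

print_r($good_lines);
```

Returning from the function does the job of both $found and break. The empty-needle guard matters because, from PHP 8.0 on, a blank line in File2.txt would otherwise match every line and filter out everything.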

Update

Besides the loop, there's another problem with how you read File2.txt: file() keeps the line break at the end of each element, so when you later search for those strings with strpos(), they don't match. To fix that, pass the FILE_IGNORE_NEW_LINES flag:

$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);
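Putting the flag loop and FILE_IGNORE_NEW_LINES together, a minimal sketch of the whole script might look like this. The function name filter_file_lines() is hypothetical, FILE_SKIP_EMPTY_LINES is an extra assumption to ignore blank lines in the email list, and the output is written with a single file_put_contents() call:

```php
<?php
// Hypothetical wrapper combining both fixes; returns the number of lines kept.
function filter_file_lines(string $source, string $emailsFile, string $target): int
{
    // Strip line breaks and drop blank lines from the email list
    $items_to_catch = file($emailsFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($items_to_catch === false) {
        throw new RuntimeException("Couldn't read " . $emailsFile);
    }

    $handle = fopen($source, 'r');
    if ($handle === false) {
        throw new RuntimeException("Couldn't open " . $source);
    }

    $good_lines = [];
    while (($line = fgets($handle)) !== false) {
        $found = false;
        foreach ($items_to_catch as $value) {
            if (strpos($line, $value) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $good_lines[] = $line; // keeps the line's own "\n"
        }
    }
    fclose($handle);

    // One write; this overwrites $target instead of appending to it
    file_put_contents($target, implode('', $good_lines), LOCK_EX);
    return count($good_lines);
}

// Usage with the question's file names:
// filter_file_lines('File1.csv', 'File2.txt', 'Results.csv');
```

Because it no longer uses FILE_APPEND, rerunning the script replaces Results.csv rather than adding a second copy of the results to the end of it.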

Here is var_dump($items_to_catch) without the flag:

array (size=3)
    0 => string 'mrhappy@example.com
    ' (length=20)
    1 => string 'mrsomeoneelse@example.com
    ' (length=26)
    2 => string 'mrsomeoneelse2@example.com
    ' (length=27)

Here is var_dump($items_to_catch) with the flag:

array (size=3)
    0 => string 'mrhappy@example.com' (length=19)
    1 => string 'mrsomeoneelse@example.com' (length=25)
    2 => string 'mrsomeoneelse2@example.com' (length=26)

Notice that each string without the flag is one character longer: that extra character is the line break.
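A minimal sketch of why the line break matters, using one hard-coded line from File1.csv and one email (the variable names are illustrative). In the CSV the email is followed by a closing quote before the line break, so a needle ending in "\n" can never match:

```php
<?php
// A line as fgets() would return it from File1.csv: '"' comes before the "\n"
$line = '"mr","Happy","Man","mrhappy@example.com"' . "\n";

// The email as file() returns it without FILE_IGNORE_NEW_LINES...
$with_break = "mrhappy@example.com\n";
// ...and the same email with the line break removed
$without_break = rtrim($with_break, "\r\n");

var_dump(strpos($line, $with_break));    // bool(false): "com\n" never occurs in the line
var_dump(strpos($line, $without_break)); // int(20): offset of the email inside the line
```

This is presumably also why your code seemed to work with a single-line File2.txt: if that file has no line break after its only line, file() returns the email without a trailing "\n", so it matches.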
