MPetrovic MPetrovic - 23 days ago 7
MySQL Question

Join csv files and sort to look like result from SQL query

I have mysql database, and i run sql query on that database with joins and where clause, lets say query will look like

SELECT a.value1,
a.value2,
b.value3,
c.value4
FROM table1 a
LEFT JOIN table2 b ON a.some_value=b.some_value
LEFT JOIN table3 c ON a.some_value=c.some_value
WHERE a.some_value = 'some_text';


and i run query 1000times to get average time needed for this query on database with 4.5milions records.

But, i need help if it is possible
because i don't have idea is it possible and if it is possible, how to do, and where to start for next steps.

That steps are: i want to export in CSV files
table1, table2 and table3 (and that is ok, i know to export tables in csv file), after that, in PHP i want to load each csv file, and to get almost exact result as this SQL query give.
Also i want to run this query 1.000 times and to get average time. I want to do these to make comparison between running query on tables and sorting from csv files.

Answer Source

If you really have to put yourself through this pain, at least it's a fairly simple query.

Something like

$a = fopen('a.csv', 'r');
$b = fopen('b.csv', 'r');
$c = fopen('c.csv', 'r');
$output = fopen('php://output', 'w');

while (($dataA = fgetcsv($a, 1024, ",")) !== false) {
    if ($dataA[0] == 'some_text') {
        $bCount = 0;
        rewind($b);
        while (($dataB = fgetcsv($b, 1024, ",")) !== false) {
            if ($dataB[0] == $dataA[3]) {
                $bCount = 0;
                $cCount = 0;
                rewind($c);
                while (($dataC = fgetcsv($c, 1024, ",")) !== false) {
                    if ($dataC[0] == $dataA[4]) {
                        fputcsv($output, [$dataA[1], $dataA[2], $dataB[1], $dataC[1]]);
                        ++$cCount;
                    }
                }
                if ($cCount == 0) {
                    fputcsv($output, [$dataA[1], $dataA[2], $dataB[1], null]);
                }
                ++$bCount;
            }
        }
        if ($bCount == 0) {
            $cCount = 0;
            rewind($c);
            while (($dataC = fgetcsv($c, 1024, ",")) !== false) {
                if ($dataC[0] == $dataA[4]) {
                    fputcsv($output, [$dataA[1], $dataA[2], null, $dataC[1]]);
                    ++$cCount;
                }
            }
            if ($cCount == 0) {
                fputcsv($output, [$dataA[1], $dataA[2], null, null]);
            }
        }
    }
}

would be pretty close to what you want.

Modify your row IDs to suit your query. If you want to "name" your cells in each row to make the code more "readable", then that's an overhead, and this will take long enough to run anyway


EDIT

Logically, it loops through file a, testing each row in turn to see if it matches your where criteria, ignoring any row that doesn't match and moving on to the next. If it does find a match, then it does a lookup against file b, looping through until it finds any match (the if test there). If it does find a match against b, it then loops against c. If an match is found against c (an if test again), then it displays the result (writing the value from your select as a comma separated list from all 3 files). If no match is found in c, then it displays the values from a and b with a null instead of any value from c. If no matching value is found in b, then it loops against c looking for a match, and displays any matches from a and c with a null for the select value from b. If no matching values re found in either b or c, then it displays the a values with nulls for the b and c cells. The rewinds for b and c loops ensure that the loops will always check from the beginning of the file.


There's no easy way to search a csv other than iterating through the entire file. You could loop through each file a single time caching results against an index to make subsequent loops easier if you had unlimited memory, but PHP isn't the right language for memory-intensive tasks.

Of course, if had unlimited memory and could load the entirety of each file into 3 arrays, then you could use a query tool like LINQ to handle the query