MySQL Question

Problem: Writing a MySQL parser to split JOINs and run them as individual queries (denormalizing the query dynamically)

I am trying to figure out a script to take a MySQL query and turn it into individual queries, i.e. denormalizing the query dynamically.

As a test I have built a simple article system that has 4 tables:


  • articles
    • article_id
    • article_format_id
    • article_title
    • article_body
    • article_date

  • article_categories
    • article_id
    • category_id

  • categories
    • category_id
    • category_title

  • formats
    • format_id
    • format_title




An article can be in more than one category but has only one format. I feel this is a good example of a real-life situation.

On the category page, which lists all of the articles (pulling in the format_title as well), this could easily be achieved with the following query:

SELECT articles.*, formats.format_title
FROM articles
INNER JOIN formats ON articles.article_format_id = formats.format_id
INNER JOIN article_categories ON articles.article_id = article_categories.article_id
WHERE article_categories.category_id = 2
ORDER BY articles.article_date DESC


However the script I am trying to build would receive this query, parse it and run the queries individually.

So in this category page example the script would effectively run this (worked out dynamically):

// Select article_categories
$out = array();
$sql = "SELECT * FROM article_categories WHERE category_id = 2";
$query = mysql_query($sql);
while ($row_article_categories = mysql_fetch_array($query, MYSQL_ASSOC)) {

    // Select articles
    $sql2 = "SELECT * FROM articles WHERE article_id = " . (int) $row_article_categories['article_id'];
    $query2 = mysql_query($sql2);
    while ($row_articles = mysql_fetch_array($query2, MYSQL_ASSOC)) {

        // Select formats
        $sql3 = "SELECT * FROM formats WHERE format_id = " . (int) $row_articles['article_format_id'];
        $query3 = mysql_query($sql3);
        $row_formats = mysql_fetch_array($query3, MYSQL_ASSOC);

        // INNER JOIN semantics: skip articles without a matching format
        if (!$row_formats) {
            continue;
        }

        // Merge articles and formats
        $row_articles = array_merge($row_articles, $row_formats);

        // Add to array
        $out[] = $row_articles;
    }
}

// Sort articles by date
$arr = array();
foreach ($out as $key => $row) {
    $arr[$key] = $row['article_date'];
}

array_multisort($arr, SORT_DESC, $out);

// Output articles (this would not be part of the script; it should just return the $out array)
foreach ($out as $row) {
    echo '<p><a href="article.php?id='.$row['article_id'].'">'.$row['article_title'].'</a> <i>('.$row['format_title'].')</i><br />'.$row['article_body'].'<br /><span class="date">'.date("F jS Y", strtotime($row['article_date'])).'</span></p>';
}
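
The nested loops above run one query per article row and another per format. A possible refinement, sketched here under assumptions, collects the keys from one result set and fetches the next table with a single IN (...) query. The helper names collect_keys() and build_in_query() are hypothetical, the rows are sample data, and values are cast to integers because every key in this schema is a numeric ID.

```php
<?php
/**
 * Collect the distinct values of one column from a list of rows.
 * (Hypothetical helper, not part of the original script.)
 */
function collect_keys(array $rows, $column) {
    $keys = array();
    foreach ($rows as $row) {
        if (isset($row[$column])) {
            $keys[(int) $row[$column]] = true; // array keys de-duplicate
        }
    }
    return array_keys($keys);
}

/**
 * Build "SELECT * FROM <table> WHERE <column> IN (...)" for integer keys.
 * Returns null when there are no keys, since "IN ()" is invalid SQL.
 * Assumes $table and $column come from trusted code, not user input.
 */
function build_in_query($table, $column, array $keys) {
    if (empty($keys)) {
        return null;
    }
    $ids = implode(', ', array_map('intval', $keys));
    return "SELECT * FROM $table WHERE $column IN ($ids)";
}

// Rows as they might come back from article_categories (sample data).
$article_categories = array(
    array('article_id' => '3', 'category_id' => '2'),
    array('article_id' => '7', 'category_id' => '2'),
    array('article_id' => '3', 'category_id' => '2'),
);

$sql = build_in_query('articles', 'article_id',
    collect_keys($article_categories, 'article_id'));
echo $sql, "\n"; // SELECT * FROM articles WHERE article_id IN (3, 7)
```

With this approach the category page would need three queries in total (one per table) regardless of how many articles are in the category.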


The challenges are working out the correct queries in the right order, since the column names for SELECT and the JOINs can appear in any order in the query (this is what MySQL and other SQL databases handle so well), and working out in PHP how to combine the results.
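
One way to approach the ordering problem, offered as a sketch rather than a finished design: treat each table as a node and each ON condition as an edge, start from a table that has its own WHERE filter (so the smallest result set comes first), and walk outward breadth-first. The function name plan_table_order() and the shape of its input arrays are assumptions made for illustration; they are not the output format of any particular parser.

```php
<?php
/**
 * Decide the order in which to query tables.
 * (Hypothetical helper; assumes $tables contains at least one table.)
 *
 * @param array $joins    List of array(tableA, tableB) pairs, one per ON condition.
 * @param array $filtered Tables that have a WHERE condition of their own.
 * @param array $tables   All tables in the query, in the order they were written.
 * @return array Table names in suggested query order.
 */
function plan_table_order(array $joins, array $filtered, array $tables) {
    // Build an undirected adjacency list from the join conditions.
    $adjacent = array_fill_keys($tables, array());
    foreach ($joins as $pair) {
        list($a, $b) = $pair;
        $adjacent[$a][] = $b;
        $adjacent[$b][] = $a;
    }

    // Start from the filtered tables; fall back to the first table written.
    $queue = empty($filtered) ? array($tables[0]) : array_values($filtered);
    $order = array();
    $seen = array_fill_keys($queue, true);

    // Breadth-first walk: each table is queried after a table it joins to.
    while (!empty($queue)) {
        $table = array_shift($queue);
        $order[] = $table;
        foreach ($adjacent[$table] as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $queue[] = $next;
            }
        }
    }
    return $order;
}

// The category page query from this question.
$order = plan_table_order(
    array(
        array('articles', 'formats'),
        array('articles', 'article_categories'),
    ),
    array('article_categories'),
    array('articles', 'formats', 'article_categories')
);
echo implode(' -> ', $order), "\n"; // article_categories -> articles -> formats
```

For the example query this reproduces the order used in the hand-written loops: article_categories, then articles, then formats. Tables that are not reachable through any ON condition are left out of the result, so a real implementation would have to decide how to handle them.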

I am currently parsing the query using SQL_Parser, which works well for splitting the query into a multi-dimensional array, but working out the query order and the merge logic described above is the headache.

Any help or suggestions would be much appreciated.

Answer

I agree it sounds like a bad choice, but I can think of some situations where splitting a query could be useful.

I would try something similar to this, relying heavily on regular expressions for parsing the query. It would work in a very limited number of cases, but its support could be expanded progressively when needed.

<?php
/**
 * That's a weird problem, but an interesting challenge!
 * @link http://stackoverflow.com/questions/5019467/problem-writing-a-mysql-parser-to-split-joins-and-run-them-as-individual-query
 */

// Taken from the given example:
$sql = "SELECT articles.*, formats.format_title 
FROM articles 
INNER JOIN formats ON articles.article_format_id = formats.format_id 
INNER JOIN article_categories ON articles.article_id = article_categories.article_id 
WHERE article_categories.category_id = 2 
ORDER BY articles.article_date DESC";

// Parse query
// (Limited to the clauses that are present in the example...)
// Edit: Made WHERE optional
// Lazy quantifiers (.*?) stop each clause from swallowing the clauses after it.
if(!preg_match('/^\s*'.
    'SELECT\s+(?P<select_rows>.*?\S)'.
    '\s+FROM\s+(?P<from>.*?\S)'.
    '(?:\s+WHERE\s+(?P<where>.*?\S))?'.
    '(?:\s+ORDER\s+BY\s+(?P<order_by>.*?\S)'.
    '(?:\s+(?P<desc>DESC))?)?'.
    '\s*$/is',$sql,$query)
) {
    trigger_error('Error parsing SQL!',E_USER_ERROR);
    return false;
}

## Dump matches
#foreach($query as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

/* We get the following matches:
"select_rows" => "articles.*, formats.format_title"
"from" => "articles INNER JOIN formats ON articles.article_format_id = formats.format_id INNER JOIN article_categories ON articles.article_id = article_categories.article_id"
"where" => "article_categories.category_id = 2"
"order_by" => "articles.article_date"
"desc" => "DESC"
/**/

// Will only support WHERE conditions separated by AND that are to be
// tested on a single individual table.
$where_conditions = !empty($query['where']) // Edit: Made WHERE optional
    ? preg_split('/\s+AND\s+/is',$query['where'])
    : array();

// Retrieve individual table information & data
$tables = array();
$from_conditions = array();
$from_tables = preg_split('/\s+INNER\s+JOIN\s+/is',$query['from']);

foreach($from_tables as $from_table) {

    // \w+ keeps stray whitespace out of the captured table and column names
    if(!preg_match('/^(?P<table_name>\S+)'.
        '(?P<on_clause>\s+ON\s+(?P<table_a>\w+)\.(?P<column_a>\w+)\s*'.
        '=\s*(?P<table_b>\w+)\.(?P<column_b>\w+))?\s*$/i',$from_table,$matches)
    ) {
        trigger_error("Error parsing SQL! Unexpected format in FROM clause: $from_table", E_USER_ERROR);
        return false;
    }
    ## Dump matches
    #foreach($matches as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

    // Remember the ON clause for the join performed later
    // We do assume each INNER JOIN's ON clause compares left table to
    // right table. Forget about parsing more complex conditions in the
    // ON clause...
    if(@$matches['on_clause'])
        $from_conditions[$matches['table_name']] = array(
            'column_a' => $matches['column_a'],
            'column_b' => $matches['column_b']
        );

    // Match applicable WHERE conditions
    $where = array();
    if(@$query['where']) // Edit: Made WHERE optional
    foreach($where_conditions as $where_condition)
        if(preg_match("/^$matches[table_name]\.(.*)$/",$where_condition,$matched))
            $where[] = $matched[1];
    $where_clause = empty($where) ? null : implode(' AND ',$where);

    // We simply ignore $query[select_rows] and use '*' everywhere...
    // (Stored in $table_sql so the parsed $query array is not overwritten.)
    $table_sql = "SELECT * FROM $matches[table_name]".($where_clause? " WHERE $where_clause" : '');
    echo "$table_sql<br/>\n";

    // Retrieve table's data
    // Fetching the entire table data right away avoids running a separate
    // MySQL query for every joined row...
    $table = array();
    if($results = mysql_query($table_sql))
        while($row = mysql_fetch_array($results, MYSQL_ASSOC))
            $table[] = $row;

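    // Sort table if applicable
    if(!empty($query['order_by']) &&
        preg_match("/^$matches[table_name]\.(.*)$/",$query['order_by'],$matched)) {
        $sort_key = $matched[1];

        // Plain string comparison; adequate for ISO-formatted dates
        usort($table, function($x, $y) use ($sort_key) {
            return strcmp($x[$sort_key], $y[$sort_key]);
        });

        // array_reverse() returns a new array, so the result must be assigned
        if(!empty($query['desc'])) $table = array_reverse($table);
    }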

    $tables[$matches['table_name']] = $table;
}

// From here, all data is fetched.
// All that is left to do is the actual join.

/**
 * Equijoin.
 * Joins relations $R and $S where attribute $a of $R equals attribute $b of $S.
 * @param array $R A relation (set of tuples).
 * @param array $S A relation (set of tuples).
 * @param string $a Attribute from $R to compare.
 * @param string $b Attribute from $S to compare.
 * @return array A relation resulting from the equijoin.
 */
function equijoin($R,$S,$a,$b) {
    $T = array();
    if(empty($R) or empty($S)) return $T;
    foreach($R as $tupleR) foreach($S as $tupleS)
        if($tupleR[$a] == @$tupleS[$b])
            $T[] = array_merge($tupleR,$tupleS);
    return $T;
}

$jointure = array_shift($tables);
if(!empty($tables)) foreach($tables as $table_name => $table)
    $jointure = equijoin($jointure, $table,
        $from_conditions[$table_name]['column_a'],
        $from_conditions[$table_name]['column_b']);

return $jointure;

?>
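
The nested loops in equijoin() compare every tuple of $R with every tuple of $S. If the tables grow, a hash-based variant may be worth considering: index $S by the join column once, then look up each tuple of $R. This is a sketch of a different technique (a hash join), not the method used in the code above; hash_equijoin() is a hypothetical name and the rows are made-up sample data. It compares join values as strings and never matches missing or NULL values, which differs slightly from the loose == comparison in equijoin().

```php
<?php
/**
 * Hash-based equijoin: takes the same arguments as equijoin(), but
 * indexes $S by column $b first instead of looping over it per row.
 */
function hash_equijoin(array $R, array $S, $a, $b) {
    // Group the tuples of $S by their join value.
    $index = array();
    foreach ($S as $tupleS) {
        if (isset($tupleS[$b])) {
            $index[(string) $tupleS[$b]][] = $tupleS;
        }
    }

    // Probe the index once per tuple of $R.
    $T = array();
    foreach ($R as $tupleR) {
        if (!isset($tupleR[$a])) {
            continue; // missing or NULL join value: no match
        }
        $key = (string) $tupleR[$a];
        if (isset($index[$key])) {
            foreach ($index[$key] as $tupleS) {
                $T[] = array_merge($tupleR, $tupleS);
            }
        }
    }
    return $T;
}

// Sample rows shaped like the articles and formats tables.
$articles = array(
    array('article_id' => '1', 'article_format_id' => '2', 'article_title' => 'First'),
    array('article_id' => '2', 'article_format_id' => '1', 'article_title' => 'Second'),
);
$formats = array(
    array('format_id' => '1', 'format_title' => 'Video'),
    array('format_id' => '2', 'format_title' => 'Text'),
);

$joined = hash_equijoin($articles, $formats, 'article_format_id', 'format_id');
foreach ($joined as $row) {
    echo $row['article_title'], ' (', $row['format_title'], ")\n";
}
// First (Text)
// Second (Video)
```

The output keeps the order of $R, so sorting the first table before joining still determines the order of the final result.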

Good night, and Good luck!
