leandermelms leandermelms - 25 days ago 7
Bash Question

Fix number of arguments / parameters within parentheses of SQL script

I have a large SQL script file generated from XML. In some value insert rows, a incorrect number of values is passed, causing the error

Column count doesn't match value count
. I'd like to track down possible syntax errors. As the SQL script file contains 300k rows, I'd like to write a script for that.

Is there any way to check the numbers of values within a statement like:

INSERT INTO table (
one,
two,
three
)
VALUES (123, 'lorem', 'ipsum');


Any help is greatly appreciated.

Answer

SQL is very difficult to parse. If your data is pretty simple and your SQL is pretty regular, you might be able to get away with using awk in the way you're hoping, see next. Personally, I would probably inspect the database for inserted values, and scan the script for them, or vice-versa. Or insert a bunch of print statements and see where the error message is interposed.

Hoping for the best in awk, let's give it the old college try:

$ cat dat
INSERT INTO table (
 one,
 two,
 three
)
VALUES (123, 'lorem', 'ipsum');
INSERT INTO table (
 one,
 three
)
VALUES (123, 'lorem', 'ipsum');

$ tr -d \\n < dat | sed 's/;/&\
/g' | awk -F '[()]' 'split($2, cols, /, /) != split($4, vals, /, /) {print}'
INSERT INTO table ( one, three)VALUES (123, 'lorem', 'ipsum');

With tr, we delete the newlines. With sed, we put each SQL statement (ending with ;) on a line. With awk, we split each line using parentheses as delimiters, so that the columns are in $2 and the values are in $4. The split command returns how many fields each of them has, using the comma as a delimiter in both cases. If they don't match, print the line. The last line displayed is the output, because the column name two is missing.

This could return some false positives, which in your case might not be terrible. If the data has semicolons or commas, the splitting will be wrong. If the INSERT doesn't mention column names, it will be wrong. If there are non-insert statements, you'll have to filter them out, or deal with them differently.