Arnoux Olivier Arnoux Olivier - 4 months ago 16
SQL Question

Update table if certain fields are null with related values from a different column

I am writing an ETL in kettle pentaho to create a table from various sources including google analytics.

so Table 1 = All data from website joined to google analytics information
Table 2 = All Duplicate data from Table 1 joined to google analytics information

my problem is that some info on table 1 has the google analytics information missing but table 2 shows some data for Google Analytics on the same reference_number

So what I want to do is lookup [reference_number] from table 1 to table 2 and populate table 1 where some columns are null from info on table 2

Quick example EDIT*

Table 1 (Main Table) * *This table has an index built in on website_reference number (Unique)*
website_Reference_number GA_info_1 GA_info_2
A1 null null
A2 x y

Table 2 (Duplicates from Table 1)
eventlabel GA_info_1 GA_info_2
A1 z z
A2 x y


my output should be the following

Table 1 (Main Table)
Ref_number GA_info_1 GA_info_2
A1 z z
A2 x y


I am using a My_SQL database

Answer
UPDATE mytable
LEFT JOIN mytable AS table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE (
    mytable.GA_info_1,
    table2.GA_info_1
),
 mytable.GA_info_2 = COALESCE (
    mytable.GA_info_2,
    table2.GA_info_2
)
WHERE
    mytable.GA_info_1 IS NULL
OR mytable.GA_info_2 IS NULL

Put all the fields which might be null into the where clause.

If the field is not null it will not be updated because it is the first argument in the coalesce function, if it is null it will be updated by the field of the other table.

edit: Also you can try it like this:

UPDATE mytable
INNER JOIN mytable AS table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE (
    mytable.GA_info_1,
    table2.GA_info_1
),
 mytable.GA_info_2 = COALESCE (
    mytable.GA_info_2,
    table2.GA_info_2
)
WHERE
    CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL

For the performance issue: (as already mentioned in the comments)

Since you are not using the primary or foreign keys to join the tables, you would have to set an index on your Ref_number columns to speed up the join.

Comments