Phong Hoang Phong Hoang - 1 month ago 5
MySQL Question

Mysql replace column value with other column value

I have 2 tables:

table: transaction:
====================
id billed_date amount
1 2016-09-30 5
2 2016-10-04 15
3 2016-10-06 10

table: report_date
====================
transaction_id report_date
1 2016-10-01


I want:


  • Create a report which sum all transactions's amount in October 2016

  • Base on report date, not billed date

  • When report date is not set, it's base on billed_date

  • In above example, I want result is 30 (not 25)



Then I write:

The First:

SELECT
sum(t.amount),
CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS new_date
FROM
transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
WHERE new_date BETWEEN '2016-10-01' AND '2016-10-30'


The Second:

SELECT sum(amount) FROM
(SELECT t.amount,
CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS date
FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
) t
WHERE t.date BETWEEN '2016-10-01' AND '2016-10-30'


Result:

The First:


  • Unknown column 'new_date' in 'where clause'

  • If I replace 'new_date' by 'date': result = 25 (exclude id=1)



The Second:


  • result = 30 => Correct, but in my case, when transaction table have about 30k records, the process is too slow.



Anybody can help me?

Answer

First of all - the part

CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END

can be written shorter as

COALESCE(d.report_date, t.billed_date)

or as

IFNULL(d.report_date, t.billed_date)

In your first query you are using a column alias in the WHERE clause, wich is not allowed. You can fix it by moving the expression behind the alias to the WHERE clause:

SELECT sum(t.amount)
FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
WHERE COALESCE(d.report_date, t.billed_date) BETWEEN '2016-10-01' AND '2016-10-30'

This is almost the same as your own solution.

Your second query is slow because MySQL has to store the subquery result (30K rows) into a temporary table. Trying to optimize it, you will end up with the same solution above.

However if you have indexes on transaction.billed_date and report_date.report_date this query still can not use them. In order to use the indexes, you can split the query into two parts:

Entries with a report (will use report_date.report_date index):

SELECT sum(amount)
FROM transaction t JOIN report_date d ON id = transaction_id 
WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'

Entries without a report (will use transaction.billed_date index):

SELECT sum(amount)
FROM transaction t LEFT JOIN report_date d ON id = transaction_id 
WHERE d.report_date IS NULL AND t.billed_dateBETWEEN '2016-10-01' AND '2016-10-30'

Both queries can use an index. You just need to sum the results, wich can also be done combining the two queries:

SELECT (
    SELECT sum(amount)
    FROM transaction t JOIN report_date d ON id = transaction_id 
    WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
) + (
    SELECT sum(amount)
    FROM transaction t LEFT JOIN report_date d ON id = transaction_id 
    WHERE d.report_date IS NULL AND t.billed_dateBETWEEN '2016-10-01' AND '2016-10-30'
) AS sum_amount