hungrykoala hungrykoala - 4 months ago 8
SQL Question

Wrong ID is retrieve during a select query with 6 joined tables

This are my tables:

cb_paymentscheduledetail

id | name | date_entered | deleted | due_date | amount | status
1 | #1 Payment For Inv# 1 | 2016-07-15 06:11:55 | 0 | 2016-07-08 | 147.25 | Unpaid
2 | #2 Payment For Inv# 1 | 2016-07-15 06:11:55 | 0 | 2016-07-15 | 147.25 | Unpaid
3 | #1 Payment For Inv# 3 | 2016-07-14 13:00:21 | 0 | 2016-07-18 | 4.58 | Unpaid
4 | #2 Payment For Inv# 3 | 2016-07-14 13:00:21 | 0 | 2016-07-21 | 4.58 | Unpaid
5 | #1 Payment For Inv# 2 | 2016-07-14 12:56:35 | 0 | 2016-07-22 | 50.00 | Unpaid
6 | #3 Payment For Inv# 1 | 2016-07-15 06:11:55 | 0 | 2016-07-22 | 147.25 | Unpaid


cb_paymentscheduleheader

id | installment_type
1 | auto
2 | auto
3 | manual


cb_paymentscheduleheader_cb_paymentscheduledetail_c

id | cb_payment37a2eheader_ida | cb_paymente42dedetail_idb
1 | 1 | 5
2 | 2 | 6
3 | 2 | 1
4 | 2 | 2
5 | 3 | 3
6 | 3 | 4


aos_invoices

id | number | billing_account_id
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3


accounts

id | phone_office | name
1 | 123 | a
2 | 123 | b
3 | 123 | c


email_addr_bean_rel

email_address_id | bean_id
1 | 1
2 | 2
3 | 3


email_addresses

id | email_address
1 | test@test.com
2 | test@test3.com
3 | test@test4.com


My Query:

SELECT cb_paymentscheduledetail.id , cb_paymentscheduledetail.amount , cb_paymentscheduledetail.assigned_user_id
,MIN(cb_paymentscheduledetail.due_date) as min_date, cb_paymentscheduledetail.name,
cb_paymentscheduledetail.amount, aos_invoices.number, aos_invoices.billing_account_id,
accounts.id as account_id, accounts.phone_office as account_phone, accounts.name as account_name,
email_addr_bean_rel.email_address_id, email_addr_bean_rel.bean_id, email_addresses.email_address as account_email,
cb_paymentscheduleheader_cb_paymentscheduledetail_c.id as headerdetail_id,
cb_paymentscheduleheader_cb_paymentscheduledetail_c.cb_paymente42dedetail_idb,
cb_paymentscheduleheader_cb_paymentscheduledetail_c.cb_payment37a2eheader_ida, cb_paymentscheduleheader.id as header_id,
cb_paymentscheduleheader.installment_type
FROM cb_paymentscheduledetail
LEFT JOIN aos_invoices ON aos_invoices.number = SUBSTRING_INDEX(cb_paymentscheduledetail.name, ' ', -1)
INNER JOIN accounts ON aos_invoices.billing_account_id = accounts.id
INNER JOIN email_addr_bean_rel ON accounts.id = email_addr_bean_rel.bean_id
INNER JOIN email_addresses ON email_addresses.id = email_addr_bean_rel.email_address_id
INNER JOIN cb_paymentscheduleheader_cb_paymentscheduledetail_c ON cb_paymentscheduledetail.id = cb_paymentscheduleheader_cb_paymentscheduledetail_c.cb_paymente42dedetail_idb
INNER JOIN cb_paymentscheduleheader ON cb_paymentscheduleheader_cb_paymentscheduledetail_c.cb_payment37a2eheader_ida = cb_paymentscheduleheader.id
WHERE cb_paymentscheduledetail.deleted = 0 AND cb_paymentscheduledetail.status = 'Unpaid' AND (cb_paymentscheduledetail.due_date BETWEEN '2016-07-15' AND '2016-07-22')
GROUP BY cb_paymentscheduledetail.date_entered
ORDER BY cb_paymentscheduledetail.due_date ASC;


My problem here is that the ID fetch by this query is not correct for amount 147.25 altough the min_date it fetch is correct:

the min_date queried is as follow:

4.58 = 2016-07-18
50.00 = 2016-07-22
147.25 = 2016-07-15


There ID should be this respectively:

4.58 = 3
50.00 = 5
147.25 = 2


But the actual ID is:

4.58 = 3
50.00 = 5
147.25 = 6


As you can see for amount "147.25" it is getting due_date 2016-07-22 when in the MIN function it got the correct one which is 2016-07-15, it is also getting the correct ID which should be 2 and not 6, Will anyone be able to help me with this as I cannot really see anymore what the problem is? since shouldn't it retrieve the appropriate ID based on my min_date when it got that correct from the start?

The only time 2016-07-15 is selected is when I change my BETWEEN to:

cb_paymentscheduledetail.due_date = '2016-07-15'


But I don't what this since I need to get the in between date

Answer

shouldn't it retrieve the appropriate ID based on my min_date when it got that correct from the start?

The short answer is No.

MySQL returns some values that seem random to you for all the expressions present in the SELECT clause that do not match one of the following:

  • the expression is also present in the GROUP BY clause;
  • the expression uses GROUP BY aggregate functions;
  • the expression is functionally dependent on the columns that appear in the GROUP BY clause.

It is a documented behaviour and it is how it should be because of the way the GROUP BY works.

GROUP BY does not select rows from the database. It generates records using the values from the database. Each expression that appears in the SELECT clause of a query that also contains a GROUP BY clause is computed independent of the other expressions.

Think a little about it. Assuming it should work as you want, what values should the query return if you replace MIN() with another GROUP BY aggregate function? With AVG(), for example. Most probably, there isn't any row in the cb_paymentscheduledetail table having in column due_date the value returned by AVG(due_date). Or COUNT()? They doesn't even have the same type.

Take a look at this answer provided on a similar question to learn the correct way to write a query that returns the row having the minimum/maximum value from its group in a certain column. It can be expanded to accommodate six tables by INNER JOIN-ing them with the table from which the row is selected (table o in that answer).

If you cannot handle it this way, you can split your query into two smaller queries (you can even merge them later using subqueries): one query on paymentscheduledetail that selects MIN(due_date) and groups by date_entered and another query that uses the value returned by the first query to select the desired row(s). Take into account that there can be more than one row that have in due_date the minimum value.

Comments