ohho ohho - 7 months ago 13
SQL Question

How to optimize this SELECT?

I have one-to-many tables

Payment
and
PaymentFlows
to keep track of payment workflows.

For different managers, they are interested in certain workflows only. So whenever a payment reach a certain workflow, a list is provided to them.

For example,

Payment 1 - A) Apply
B) Checked
C) Approved by Manager
D) Approved by CFO
E) Cheque issued

Payment 2 - A) Apply
B) Checked
C) Approved by Manager

Payment 3 - A) Apply
B) Checked
C) Approved by Manager

Payment 4 - A) Apply
B) Checked


To show all payments at workflow
C
, what I did is:

class Payment < ActiveRecord::Base

def self.search_by_workflow(flow_code)
self.find_by_sql("SELECT * FROM payments P INNER JOIN (
SELECT payment_id FROM (
SELECT * FROM (
SELECT * FROM payment_flows F
ORDER BY F.payment_flow_id DESC
) latest GROUP BY payment_id
) flows WHERE flows.code = flow_code)
) IDs ON IDs.payment_id = P.payment_id ORDER BY P.payment_id DESC LIMIT 100;")
end

end


so:

@payments = Payment.search_by_workflow('Approved by Manager')


returns:
Payment 2
and
3


However, the performance is not very good (5 to 7 seconds for 15,000 payments and 55,000 workflows).

How can I improve the performance?

UPDATE (with table structures):

CREATE TABLE `payments` (
`payment_id` int(11) NOT NULL,
`payment_type_code` varchar(50) default 'PETTY_CASH',
`status` varchar(16) NOT NULL default '?',
PRIMARY KEY (`payment_id`),
KEY `status` (`status`),
KEY `payment_type_code` (`payment_type_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `payment_flows` (
`payment_flow_id` int(11) NOT NULL,
`payment_id` int(11) default NULL,
`code` varchar(64) default NULL,
`status` varchar(255) NOT NULL default 'new',
PRIMARY KEY (`payment_flow_id`),
KEY `payment_id` (`payment_id`),
KEY `code` (`proc_code`),
KEY `status` (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Answer

Try this:

SELECT * FROM payment p
WHERE EXISTS(
    SELECT 'FLOW'
    FROM payment_flows pf
    WHERE pf.payment_id = p.payment_id
    AND pf.code = flow_code 
    AND NOT EXISTS(
        SELECT 'OTHER'
        FROM payment_flows pf2
        WHERE pf2.payment_id = pf.payment_id
        AND pf2.payment_flow_id > pf.payment_flow_id
    )
)

Pay attention: in the query flow_code is a variable with the code you want to search

I've added a main EXISTS condition about the presence of flow_code and a nested NOT EXISTS condition about the absence of other id of the same payment next about flow_code.

Tell me if is it OK about better performance.