Let me describe my doubt. I have system where I have three entities, Doctor, Patient and Appointment. An appointment has the doctor's id and patient Id.
I need now to retrieve all the patients which have an appointment with a concrete doctor, and I'm not sure what will be faster, a distinct or a subselect for the id's, these are the queries:
SELECT DISTINCT patient.id, patient.name, patient.surname FROM
appointment INNER JOIN patient ON patient.id = appointment.patientid WHERE
appointment.doctorid = @id;
SELECT patient.id, patient.name, patient.surname FROM patient
WHERE patient.id IN (select appointment.patientid FROM appointment
WHERE appointment.doctorid = @id);
As with any performance question, you should test on your data and your hardware. The suspect problem in the first version the
DISTINCT after the
JOIN; this can require a lot of extra processing.
You can write the second as:
SELECT p.id, p.name, p.surname FROM patient p WHERE p.id IN (select a.patientid FROM appointment a WHERE a.doctorid = @id);
For this, you want an index on
You might consider this version as well:
select p.id, p.name, p.surname from patient p join (select distinct appointment.patientid from appointment where appointment.doctorid = @id ) a on p.id = a.patientid;
This specifically wants the same index. This pushes the
distinct so it is only operating on a single table, meaning that MySQL may be able to use the index for that operation.
And this one:
SELECT p.id, p.name, p.surname FROM patient p WHERE EXISTS (select 1 from appointment a where a.doctorid = @id and a.patientid = p.id );
This query wants an index on
appointment(patientid, doctorid). It requires a full table scan of
patient with a fast index lookup on each row. That could often be the fastest approach, depending on the data.
Note: which query performs better may also depends on the size and distribution of the data.