henryzo henryzo - 7 months ago 9
SQL Question

postgresql join confusion

I'm trying to make this statement work. but I can't figure it out.



  1. List the names of the students who never took the course Databases.




I have this:

select distinct s1.name, e1.section_id
from students s1
inner join enrollment e1 on e1.student_id = s1.id
where e1.course_id != 12


but that doesn't remove the student who took that section, so I'm stuck.

Database looks like (I'm soo sorry, I don't know how to insert the database into here)

table students (
id integer primary key,
name varchar(255),
graduation_date date,
major_id integer references departments(id)
);

insert into students (id, name, graduation_date, major_id) values
(1, 'Joe', null, 10);
insert into students (id, name, graduation_date, major_id) values
(2, 'Amy', '2009-04-22', 20);
insert into students (id, name, graduation_date, major_id) values
(3, 'Max', null, 10);


create table courses (
id integer primary key,
title varchar(255),
units integer,
department_id integer references departments(id)
);

insert into courses (id, title, units, department_id) values
(12, 'Databases', 4, 10);
insert into courses (id, title, units, department_id) values
(22, 'Compilers', 4, 10);
insert into courses (id, title, units, department_id) values
(32, 'Calculus 1', 4, 20);

create table sections (
id integer primary key,
course_id integer not null references courses(id),
instructor_id integer references faculty(id),
year integer
);

insert into sections (id, course_id, instructor_id, year) values
(12, 12, 6, 2007);
insert into sections (id, course_id, instructor_id, year) values
(13, 12, 1, 2008);
insert into sections (id, course_id, instructor_id, year) values
(14, 22, 1, 2008);
insert into sections (id, course_id, instructor_id, year) values
(23, 12, 6, 2009);

create table enrollment (
id integer primary key,
student_id integer not null references students(id),
section_id integer not null references sections(id),
grade_id integer references grades(id)
);

insert into enrollment (id, student_id, section_id, grade_id) values
(14, 1, 12, 8);
insert into enrollment (id, student_id, section_id, grade_id) values
(15, 1, 13, 3);
insert into enrollment (id, student_id, section_id, grade_id) values
(16, 1, 14, 5);
insert into enrollment (id, student_id, section_id, grade_id) values
(17, 1, 32, 1);
insert into enrollment (id, student_id, section_id, grade_id) values
(18, 1, 34, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
(19, 1, 53, 13);
insert into enrollment (id, student_id, section_id, grade_id) values
(24, 3, 12, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
(25, 3, 14, 5);
insert into enrollment (id, student_id, section_id, grade_id) values
(26, 3, 32, 1);
insert into enrollment (id, student_id, section_id, grade_id) values
(27, 3, 34, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
(28, 3, 54, 7);
insert into enrollment (id, student_id, section_id, grade_id) values
(34, 2, 43, 3);
insert into enrollment (id, student_id, section_id, grade_id) values

Answer

No, don't join everything and then try to clean up with DISTINCT. That's a bad approach. Instead think of what you want to select first. Then write the query step by step.

The "students who never took the course Databases" are

  • all students except those who took the course Databases
  • all students who are not in the set of students who took the course Databases
  • all students for whom not exists a Databases course enrolement

I have highlighted the keywords needed. So you have three options:

  • Write a query with EXCEPT
  • Write a query with NOT IN
  • Write a query with NOT EXISTS

Try these and come back here if you have further problems.

UPDATE: Now that you solved it (and even accepted my answer :-), here are some ways to write the query:

Query with IN clause:

select name 
from students 
where id not in 
(
  select student_id 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
); 

Query with EXISTS clause:

select name 
from students 
where not exists
(
  select * 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
  and student_id = students.id
); 

Query with EXCEPT (which is not so good a solution here, because it queries the students table twice, but sometimes EXCEPT is the straight-forward way to a problem). I am using a join on a subquery here instead of WHERE students.id IN (...), just in order to show the technique.

select name 
from students 
join
(
  select id
  from students
  except
  select student_id 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
) found_students on found_students.id = students.id; 

Query with COUNT and HAVING, which looks quite compact. It is however a bit more prone to errors. One thing is not to confuse ON and WHERE in outer joins, another is to count the correct column. We must make sure to count a non-nullable field of table sections, so we are sure none of the student's enrollments matched an actual Databases section.

select s.id, s.name
from students s
left join enrollment e on e.student_id = s.id
left join sections s on s.id = e.section_id 
                     and s.course_id = (select id from courses where title = 'Databases')
group by s.id, s.name
having count(s.id) = 0;