velvetrock - 6 months ago 30

SQL Question

I have a table which contains the information of course selected in 2 semesters of each student. Those students didn't validate their first semester, so all the

`valid_or_not_of_semester='N'`

`semester='1st'`

`student semester course_selected valid_or_not_of_semester`

A 1st math N

A 1st english N

A 2nd math Y

A 2nd english Y

B 1st math N

B 2nd math Y

B 2nd english Y

C 1st math N

C 2nd math N

For those students who selected

`math`

`english`

`math`

`english`

`--------------------------------------------------------------------------`

1st semester \ 2nd semester | Math | English

invalid \ |---------------------|--------------------

students \ | valid | invalid | valid | invalid

--------------------------------------------------------------------------

Math | 2 | 1 | 2 | 0

--------------------------------------------------------------------------

English | 1 | 0 | 1 | 0

--------------------------------------------------------------------------

Each row stands for the number of students who didn't validate the first semester and have selected the course in the first semester. And columns seprate students who selected the course into valid and invalid of their second semester. To be more precise,

`--------------------------------------------------------------------------`

1st semester \ 2nd semester | Math | English

invalid \ |---------------------|--------------------

students \ | valid | invalid | valid | invalid

--------------------------------------------------------------------------

Math | 2 | 1 | 2 | 0

| | |

\ / \ / \ /

(students A&B) (student C) (students A&B)

I tried proc sql:

`data math;`

merge have

have (where=(semester='1st') in=these);

by student;

if these then output;

run;

proc sql;

create table result as

select count(distinct student) as nb_student

from math (where=(semester='2nd'))

group by course_selected, valid_or_not_of_semester;

quit;

And do the same thing for

`english`

But is there any way to obtain the result of 2 courses directly? How could I use the proc freq?

Hope to get your answer. Thanks in advance!

Answer

This doesn't give you exactly the table you were looking for, but it does generate the values you were interested in. The idea is to transpose the original dataset and then count up observations afterwards.

You may also want to look into proc tabulate though you may run into problems because you're double-counting students under certain circumstances.

```
data temp;
input student $ semester $ course_selected $ valid_or_not_of_semester $;
datalines;
A 1st math N
A 1st english N
A 2nd math Y
A 2nd english Y
B 1st math N
B 2nd math Y
B 2nd english Y
C 1st math N
C 2nd math N
;
proc sort; by student;
run;
proc transpose data = temp out = temp2;
by student;
id course_selected semester;
var valid_or_not_of_semester;
run;
proc sql;
create table temp3 as select distinct
sum(case when math1st = "N" and math2nd = "Y" then 1 else 0 end) as math_math_valid,
sum(case when math1st = "N" and math2nd = "N" then 1 else 0 end) as math_math_invalid,
sum(case when english1st = "N" and math2nd = "Y" then 1 else 0 end) as english_math_valid,
sum(case when english1st = "N" and math2nd = "N" then 1 else 0 end) as english_math_invalid,
sum(case when math1st = "N" and english2nd = "Y" then 1 else 0 end) as math_english_valid,
sum(case when math1st = "N" and english2nd = "N" then 1 else 0 end) as math_english_invalid,
sum(case when english1st = "N" and english2nd = "Y" then 1 else 0 end) as english_english_valid,
sum(case when english1st = "N" and english2nd = "N" then 1 else 0 end) as english_english_invalid
from temp2;
quit;
```