user3393041 - 1 year ago 26

SQL Question

*****i have 4 million observations. i have a variable X which has 100 unique observations with 5 digit codes assigned to it(e.x. 10420, 49660, 52739,65231,etc.) and arranged in ascending order. in each of these unique X's, i have variable Y which has codes from 1-4. now i need to calculate weighted average by each codes of Y and also weighted average by each codes of Y across X.

So there are two stages: inner stage is to calculate weighted average by each codes of Y for each X's and outer stage is to calculate weighted average by each codes of Y for overall X**.***

`X wrkallmt medtt_all Y`

----- -------- --------- --

10420 15 25.4 1

10420 50 5.7 4

10420 70 5.9 3

10420 110 5.9 4

10420 110 15 2

12630 30 15.8 4

12630 15 10.9 2

12630 85 10.5 2

12630 20 20.7 1

12630 20 23.5 3

13560 20 20.3 3

13560 20 20.8 3

13560 30 20.9 4

13560 20 25.5 2

13560 25 30.6 4

13560 20 45.3 1

13560 20 40.4 4

13560 20 30.9 1

13560 10 10.8 2

inner stage: for each X's suppose

`x = 10420`

`(15*25.4)/15`

`((50*5.7)+(110*5.9))/(50+110)`

outer stage: weighted average for code 1 is (weithtedavg of code 1 in 10420 + weightedavg of code 1 in 12630 + weightedavg of code 1 in 13560)/(sum of wrkallmt in code 1 across x)

I can only think of using nested DO-LOOP. i figured out the inner stage but cannot construct the outer stage:

`data WORK.exp3;`

SET WORK.ABC;

do X = ??;

A = 0;

B = 0;

wgtavg_all = 0;

do Y = 1 to 4;

A = sum(wrkallmt*medtt_all);

B = sum(wrkallmt);

wgtavg_all = A/B;

output;

end;

output;

end;

run;

I'm using SAS 9.3.

Can i use proc sql using group by? but in that case i need to use group by two times to get the desired result.

Answer

There is certainly more than one way to do this, but see if this is close to what you want. This requires 2 sorts, but takes advantage of the BY statement (and the LAST temp variable) in the DATA STEP.

```
data have;
length x y 8.;
Input X wrkallmt medtt_all Y;
datalines;
10420 15 25.4 1
10420 50 5.7 4
10420 70 5.9 3
10420 110 5.9 4
10420 110 15 2
12630 30 15.8 4
12630 15 10.9 2
12630 85 10.5 2
12630 20 20.7 1
12630 20 23.5 3
13560 20 20.3 3
13560 20 20.8 3
13560 30 20.9 4
13560 20 25.5 2
13560 25 30.6 4
13560 20 45.3 1
13560 20 40.4 4
13560 20 30.9 1
13560 10 10.8 2
;
proc sort data=have out=sortedx;
by x y;
run;
Data need1(keep=x y weighted_avg_combined_y);
retain a b 0;
set sortedx;
by x y;
a = a + (wrkallmt * medtt_all);
b = b + wrkallmt;
if last.y then do;
weighted_avg_combined_y=a/b;
output ;
end;
run;
Proc sort data=have out=sortedy;
by y x;
run;
Data need2(keep=x y weighted_avg_all_y);
retain a b 0;
set sortedy;
by y x;
a = a + (wrkallmt * medtt_all);
b = b + wrkallmt;
if last.y then do;
weighted_avg_all_y=a/b;
output ;
end;
run;
```

Source (Stackoverflow)