user3393041 - 1 year ago 52

SQL Question

*****i have 4 million observations. i have a variable X which has 100 unique observations with 5 digit codes assigned to it(e.x. 10420, 49660, 52739,65231,etc.) and arranged in ascending order. in each of these unique X's, i have variable Y which has codes from 1-4. now i need to calculate weighted average by each codes of Y and also weighted average by each codes of Y across X.

So there are two stages: inner stage is to calculate weighted average by each codes of Y for each X's and outer stage is to calculate weighted average by each codes of Y for overall X**.***

`X wrkallmt medtt_all Y`

----- -------- --------- --

10420 15 25.4 1

10420 50 5.7 4

10420 70 5.9 3

10420 110 5.9 4

10420 110 15 2

12630 30 15.8 4

12630 15 10.9 2

12630 85 10.5 2

12630 20 20.7 1

12630 20 23.5 3

13560 20 20.3 3

13560 20 20.8 3

13560 30 20.9 4

13560 20 25.5 2

13560 25 30.6 4

13560 20 45.3 1

13560 20 40.4 4

13560 20 30.9 1

13560 10 10.8 2

inner stage: for each X's suppose

`x = 10420`

`(15*25.4)/15`

`((50*5.7)+(110*5.9))/(50+110)`

outer stage: weighted average for code 1 is (weithtedavg of code 1 in 10420 + weightedavg of code 1 in 12630 + weightedavg of code 1 in 13560)/(sum of wrkallmt in code 1 across x)

I can only think of using nested DO-LOOP. i figured out the inner stage but cannot construct the outer stage:

`data WORK.exp3;`

SET WORK.ABC;

do X = ??;

A = 0;

B = 0;

wgtavg_all = 0;

do Y = 1 to 4;

A = sum(wrkallmt*medtt_all);

B = sum(wrkallmt);

wgtavg_all = A/B;

output;

end;

output;

end;

run;

I'm using SAS 9.3.

Can i use proc sql using group by? but in that case i need to use group by two times to get the desired result.

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

There is certainly more than one way to do this, but see if this is close to what you want. This requires 2 sorts, but takes advantage of the BY statement (and the LAST temp variable) in the DATA STEP.

```
data have;
length x y 8.;
Input X wrkallmt medtt_all Y;
datalines;
10420 15 25.4 1
10420 50 5.7 4
10420 70 5.9 3
10420 110 5.9 4
10420 110 15 2
12630 30 15.8 4
12630 15 10.9 2
12630 85 10.5 2
12630 20 20.7 1
12630 20 23.5 3
13560 20 20.3 3
13560 20 20.8 3
13560 30 20.9 4
13560 20 25.5 2
13560 25 30.6 4
13560 20 45.3 1
13560 20 40.4 4
13560 20 30.9 1
13560 10 10.8 2
;
proc sort data=have out=sortedx;
by x y;
run;
Data need1(keep=x y weighted_avg_combined_y);
retain a b 0;
set sortedx;
by x y;
a = a + (wrkallmt * medtt_all);
b = b + wrkallmt;
if last.y then do;
weighted_avg_combined_y=a/b;
output ;
end;
run;
Proc sort data=have out=sortedy;
by y x;
run;
Data need2(keep=x y weighted_avg_all_y);
retain a b 0;
set sortedy;
by y x;
a = a + (wrkallmt * medtt_all);
b = b + wrkallmt;
if last.y then do;
weighted_avg_all_y=a/b;
output ;
end;
run;
```

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**