dedpo dedpo - 6 months ago 4x
SQL Question

Trying to DISTINCT count in pig latin, hadoop

DATE WindDirection
1/1/2000 SW
1/2/2000 SW
1/3/2000 SW
1/4/2000 NW
1/5/2000 NW

Question below

Every day is unqiue, and wind direction is not unique, SO now we are trying to get the COUNT of the most COMMON wind direction

My query was

weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;

e = FOREACH weather_data {
unique_winds = DISTINCT weather_data.w_direction;
GENERATE unique_winds, COUNT(unique_winds);


dump e;

The logic is to find the DISTINCT WindDirections, there are like 7 AND then
group by WindDirection and apply count

Right now i think get the total number or count of directions of winds.


You will have to GROUP BY wind direction and get the counts.Order the counts by desc order and get the top most row.

wd = FOREACH Weather GENERATE $9 AS w_direction;
gwd = GROUP wd BY w_direction;
cwd = FOREACH gwd GENERATE group as wd,COUNT(wd.$0);
owd = ORDER cwd BY $1 DESC;
mwd  = LIMIT owd 1;
DUMP mwd;