Peter Chen - 3 years ago 144
R Question

# count the levels by groups

I have a question about check which level is the most.

Here is my data:

``````Year  Area    V1    V2    V3
2014  USA    100    25    50
2014  USA    200    50    60
2014  USA    200    50    50
2014  USA    200    50    50
2014  USA    300    75    40
2014  ASIA   100    25    60
2014  ASIA   100    25    70
2014  ASIA   300    75    60
2014  ASIA   400    100   60
2014  ASIA   500    125   70
2015  USA    100    25    80
2015  USA    300    75    80
2015  USA    300    75    70
2015  USA    300    75    90
2015  USA    500    125   40
2015  ASIA   400    100   90
2015  ASIA   400    100   80
2015  ASIA   300    75    80
2016  USA    500    125   60
2016  USA    500    125   60
2016  ASIA   100    25    50
``````

What I want is:

``````Year  Area   V1    V2    V3   Count
2014  USA    200   50    50   5
2015  USA    300   75    80   5
2016  USA    500   125   60   2
2014  ASIA   100   25    60   5
2015  ASIA   400   100   80   3
2016  ASIA   100   25    50   1
``````

In
`V1`
, there are 5 levels(100, 200, 300, 400, and 500).

In
`V2`
, there are also 5 levels, which are 0.25*
`V1`
.

In
`V3`
, there are 6 levels.

The result I want is group by
`Year`
and
`Area`
. Moreover,
`V1`
is the max count of levels. For example, in
`Year == 2014`
and
`Area == USA`
,
`V1`
contains 1 level 100, 3 level 200, and 1 level 300. So, the result should be 200 because it is the most one.
`V2`
and
`V3`
are the same.

Any idea?

DATA

``````dt <- fread("Year  Area    V1    V2    V3
2014  USA    100    25    50
2014  USA    200    50    60
2014  USA    200    50    50
2014  USA    200    50    50
2014  USA    300    75    40
2014  ASIA   100    25    60
2014  ASIA   100    25    70
2014  ASIA   300    75    60
2014  ASIA   400    100   60
2014  ASIA   500    125   70
2015  USA    100    25    80
2015  USA    300    75    80
2015  USA    300    75    70
2015  USA    300    75    90
2015  USA    500    125   40
2015  ASIA   400    100   90
2015  ASIA   400    100   80
2015  ASIA   300    75    80
2016  USA    500    125   60
2016  USA    500    125   60
2016  ASIA   100    25    50")
``````

We can get the `Mode` function from here.

``````library(data.table)
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
``````

Grouped by 'Year', 'Area' (using `data.table` syntax as it is already a `data.table` from reading with `fread`), loop through the Subset of data.table (`.SD`), get the element that has the highest frequency using `Mode`, similarly get the frequency of columns using `table`, find the `sum`, collapse the multiple columns to a single one by taking the maximum of each row (`pmax`) and concatenate (`c`) with the `list` of columns of 'Mode'

``````dt[, c(lapply(.SD, Mode),
Count = do.call(pmax, lapply(.SD, function(x) sum(table(x))))), by = .(Year, Area)]
#    Year Area  V1  V2 V3 Count
#1: 2014  USA 200  50 50     5
#2: 2014 ASIA 100  25 60     5
#3: 2015  USA 300  75 80     5
#4: 2015 ASIA 400 100 80     3
#5: 2016  USA 500 125 60     2
#6: 2016 ASIA 100  25 50     1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download