Peter Chen Peter Chen - 3 years ago 144
R Question

count the levels by groups

I have a question about check which level is the most.

Here is my data:

Year Area V1 V2 V3
2014 USA 100 25 50
2014 USA 200 50 60
2014 USA 200 50 50
2014 USA 200 50 50
2014 USA 300 75 40
2014 ASIA 100 25 60
2014 ASIA 100 25 70
2014 ASIA 300 75 60
2014 ASIA 400 100 60
2014 ASIA 500 125 70
2015 USA 100 25 80
2015 USA 300 75 80
2015 USA 300 75 70
2015 USA 300 75 90
2015 USA 500 125 40
2015 ASIA 400 100 90
2015 ASIA 400 100 80
2015 ASIA 300 75 80
2016 USA 500 125 60
2016 USA 500 125 60
2016 ASIA 100 25 50


What I want is:

Year Area V1 V2 V3 Count
2014 USA 200 50 50 5
2015 USA 300 75 80 5
2016 USA 500 125 60 2
2014 ASIA 100 25 60 5
2015 ASIA 400 100 80 3
2016 ASIA 100 25 50 1


In
V1
, there are 5 levels(100, 200, 300, 400, and 500).

In
V2
, there are also 5 levels, which are 0.25*
V1
.

In
V3
, there are 6 levels.

The result I want is group by
Year
and
Area
. Moreover,
V1
is the max count of levels. For example, in
Year == 2014
and
Area == USA
,
V1
contains 1 level 100, 3 level 200, and 1 level 300. So, the result should be 200 because it is the most one.
V2
and
V3
are the same.

Any idea?

DATA

dt <- fread("Year Area V1 V2 V3
2014 USA 100 25 50
2014 USA 200 50 60
2014 USA 200 50 50
2014 USA 200 50 50
2014 USA 300 75 40
2014 ASIA 100 25 60
2014 ASIA 100 25 70
2014 ASIA 300 75 60
2014 ASIA 400 100 60
2014 ASIA 500 125 70
2015 USA 100 25 80
2015 USA 300 75 80
2015 USA 300 75 70
2015 USA 300 75 90
2015 USA 500 125 40
2015 ASIA 400 100 90
2015 ASIA 400 100 80
2015 ASIA 300 75 80
2016 USA 500 125 60
2016 USA 500 125 60
2016 ASIA 100 25 50")

Answer Source

We can get the Mode function from here.

library(data.table)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
 }

Grouped by 'Year', 'Area' (using data.table syntax as it is already a data.table from reading with fread), loop through the Subset of data.table (.SD), get the element that has the highest frequency using Mode, similarly get the frequency of columns using table, find the sum, collapse the multiple columns to a single one by taking the maximum of each row (pmax) and concatenate (c) with the list of columns of 'Mode'

dt[, c(lapply(.SD, Mode), 
     Count = do.call(pmax, lapply(.SD, function(x) sum(table(x))))), by = .(Year, Area)]
#    Year Area  V1  V2 V3 Count
#1: 2014  USA 200  50 50     5
#2: 2014 ASIA 100  25 60     5
#3: 2015  USA 300  75 80     5
#4: 2015 ASIA 400 100 80     3
#5: 2016  USA 500 125 60     2
#6: 2016 ASIA 100  25 50     1
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download