Mamba Mamba - 1 month ago 6
R Question

How to find most frequent levels of factors?

I am still learning R, so apologies for my lack of knowledge.

My data has 192 countries and looks similar to this:

# Given some data which resemble the original data
cars_produced <- data.frame(countries = c("US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"France",
"France",
"France",
"France",
"France",
"France",
"France",
"France",
"Norway",
"Norway",
"Norway",
"Norway",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany"
),
manufacturer = c( "Mercedes",
"Mercedes",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"BMW",
"General motors",
"General motors",
"General motors",
"General motors",
"General motors",
"Ford",
"Ford",
"Ford",
"Toyota",
"Toyota",
"Toyota",
"Mercedes",
"Mercedes",
"Mercedes",
"Mercedes",
"BMW",
"BMW",
"BMW",
"Toyota",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"BMW",
"BMW",
"BMW",
"BMW",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Mercedes",
"Mercedes",
"Mercedes",
"Mercedes"

),

model=c("GLK",
"M",
"Passat",
"Golf",
"Caddy",
"M4",
"Hammer",
"Pontiac",
"Chevrolet",
"Corvette",
"Cadillac",
"KA",
"Fiesta",
"Taurus",
"Yaris",
"Carina",
"Briska",
"GLK",
"M",
"GL",
"C",
"M4",
"X5",
"i8",
"Carina",
"Passat",
"Golf",
"Caddy",
"Sharan",
"Polo",
"M4",
"X5",
"i8",
"E9",
"Passat",
"Golf",
"Caddy",
"Sharan",
"GLK",
"M",
"GL",
"C")
)




> cars_produced
countries manufacturer model
#1 US Mercedes GLK
#2 US Mercedes M
#3 US Volkswagen Passat
#4 US Volkswagen Golf
#5 US Volkswagen Caddy
#6 US BMW M4
#7 US General motors Hammer
#8 US General motors Pontiac
#9 US General motors Chevrolet
#10 US General motors Corvette
#11 US General motors Cadillac
#12 US Ford KA
#13 US Ford Fiesta
#14 US Ford Taurus
#15 US Toyota Yaris
#16 France Toyota Carina
#17 France Toyota Briska
#18 France Mercedes GLK
#19 France Mercedes M
#20 France Mercedes GL
#21 France Mercedes C
#22 France BMW M4
#23 France BMW X5
#24 Norway BMW i8
#25 Norway Toyota Carina
#26 Norway Volkswagen Passat
#27 Norway Volkswagen Golf
#28 Germany Volkswagen Caddy
#29 Germany Volkswagen Sharan
#30 Germany Volkswagen Polo
#31 Germany BMW M4
#32 Germany BMW X5
#33 Germany BMW i8
#34 Germany BMW E9
#35 Germany Volkswagen Passat
#36 Germany Volkswagen Golf
#37 Germany Volkswagen Caddy
#38 Germany Volkswagen Sharan
#39 Germany Mercedes GLK
#40 Germany Mercedes M
#41 Germany Mercedes GL
#42 Germany Mercedes C


My questions are:


  1. How many car models does each country produce (and from which manufacturers)?


  2. How can I select the most and least popular car models worldwide (with their corresponding manufacturers)?




To that end, I have tried to use dplyr:

library(dplyr)


For question one I have tried the following:

count_by_manufacturer<- cars_produced[,-1] %>% group_by(manufacturer) %>% summarise(count = n())


For the most popular models I tried the following, but I don't know how to get the corresponding manufacturer:

Countries_by_models<- cars_produced[,-2] %>% group_by(model) %>% summarise(count = n())

Answer

You can generate your desired results using dplyr.

For the first two results, you don't need to drop the columns that are not used for grouping. To count the models produced in each country, group_by countries and summarise:

library(dplyr)
cars_produced %>% group_by(countries) %>% summarise(count=n())
### A tibble: 4 x 2
##  countries count
##     <fctr> <int>
##1    France     8
##2   Germany    15
##3    Norway     4
##4        US    15
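As a cross-check without dplyr, the same per-level counts can be obtained in base R with table(). This is only a sketch: it rebuilds the countries column of the sample data with rep(), and the names country_counts and top_countries are made up for illustration.

```r
# Rebuild only the countries column of the sample data (42 rows)
countries <- factor(rep(c("US", "France", "Norway", "Germany"),
                        times = c(15, 8, 4, 15)))

# table() counts how often each factor level occurs
country_counts <- table(countries)

# Keep every level that reaches the maximum count, so ties are not lost
top_countries <- names(country_counts)[country_counts == max(country_counts)]
```

Comparing against max() rather than calling which.max() matters here, because which.max() returns only the first maximum and US and Germany are tied at 15 rows.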

To find the count of models by manufacturer, group_by the manufacturer:

cars_produced %>% group_by(manufacturer) %>% summarise(count=n())
### A tibble: 6 x 2
##    manufacturer count
##          <fctr> <int>
##1            BMW     8
##2           Ford     3
##3 General motors     5
##4       Mercedes    10
##5         Toyota     4
##6     Volkswagen    12
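The original question also asks which manufacturers the models in each country come from. One possible way to get that breakdown is to count by both columns at once with dplyr's count(), which is shorthand for group_by followed by summarise(n = n()). The sketch below rebuilds just the two relevant columns of the sample data with rep(); by_country_maker is a made-up name.

```r
library(dplyr)

# Rebuild the countries and manufacturer columns of the sample data (42 rows)
cars_produced <- data.frame(
  countries = rep(c("US", "France", "Norway", "Germany"),
                  times = c(15, 8, 4, 15)),
  manufacturer = rep(
    c("Mercedes", "Volkswagen", "BMW", "General motors", "Ford", "Toyota",
      "Mercedes", "BMW", "Toyota", "Volkswagen", "BMW", "Volkswagen", "Mercedes"),
    times = c(2, 3, 1, 5, 3, 3, 4, 3, 1, 5, 4, 4, 4)
  )
)

# One row per country/manufacturer pair; count() stores the count in column n
by_country_maker <- cars_produced %>% count(countries, manufacturer)
```

In the sample data this should give one row for each country/manufacturer combination that actually occurs, for example 7 rows for Volkswagen in Germany.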

To find the most popular model (and its manufacturer), first group_by the model and create a column containing the count by model. Then, ungroup and filter to keep only those rows with the max(count). Finally, group_by both manufacturer and model and summarise the count:

cars_produced %>% group_by(model) %>% mutate(count=n()) %>% 
                  ungroup %>% filter(count==max(count)) %>% 
                  group_by(manufacturer, model) %>% summarise(count=first(count))
##Source: local data frame [6 x 3]
##Groups: manufacturer [?]
##
##  manufacturer  model count
##        <fctr> <fctr> <int>
##1          BMW     M4     3
##2     Mercedes    GLK     3
##3     Mercedes      M     3
##4   Volkswagen  Caddy     3
##5   Volkswagen   Golf     3
##6   Volkswagen Passat     3

To find the least popular model, do the same except filter to keep only rows with the min(count):

cars_produced %>% group_by(model) %>% mutate(count=n()) %>% 
                  ungroup %>% filter(count==min(count)) %>% 
                  group_by(manufacturer, model) %>% summarise(count=first(count))
##Source: local data frame [12 x 3]
##Groups: manufacturer [?]
##
##     manufacturer     model count
##           <fctr>    <fctr> <int>
##1             BMW        E9     1
##2            Ford    Fiesta     1
##3            Ford        KA     1
##4            Ford    Taurus     1
##5  General motors  Cadillac     1
##6  General motors Chevrolet     1
##7  General motors  Corvette     1
##8  General motors    Hammer     1
##9  General motors   Pontiac     1
##10         Toyota    Briska     1
##11         Toyota     Yaris     1
##12     Volkswagen      Polo     1
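
A shorter variant of the two pipelines above is sketched here. It counts once per manufacturer/model pair and then filters that small summary table for the maximum and the minimum. It assumes that each model name belongs to exactly one manufacturer, which holds for the sample data but may not hold for the real data; model_counts, most_popular and least_popular are made-up names.

```r
library(dplyr)

# Rebuild the manufacturer and model columns of the sample data (42 rows)
cars_produced <- data.frame(
  manufacturer = rep(
    c("Mercedes", "Volkswagen", "BMW", "General motors", "Ford", "Toyota",
      "Mercedes", "BMW", "Toyota", "Volkswagen", "BMW", "Volkswagen", "Mercedes"),
    times = c(2, 3, 1, 5, 3, 3, 4, 3, 1, 5, 4, 4, 4)
  ),
  model = c("GLK", "M", "Passat", "Golf", "Caddy", "M4", "Hammer", "Pontiac",
            "Chevrolet", "Corvette", "Cadillac", "KA", "Fiesta", "Taurus",
            "Yaris", "Carina", "Briska", "GLK", "M", "GL", "C", "M4", "X5",
            "i8", "Carina", "Passat", "Golf", "Caddy", "Sharan", "Polo", "M4",
            "X5", "i8", "E9", "Passat", "Golf", "Caddy", "Sharan", "GLK", "M",
            "GL", "C")
)

# Count rows per manufacturer/model pair; count() stores the count in column n
model_counts <- cars_produced %>% count(manufacturer, model)

# Keep all pairs tied for the highest and for the lowest count
most_popular  <- model_counts %>% filter(n == max(n))
least_popular <- model_counts %>% filter(n == min(n))
```

If a model name could appear under several manufacturers, counting by the pair would split its total, so the group_by(model) version above would be the safer choice in that case.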