ftatarli ftatarli - 1 month ago 20
R Question

The logic behind subset in R

I'm trying to get a subset of a data.frame, but I don't understand the logic behind the result. I'm used to working with SQL, and I thought I could isolate one value of the matrix to make it easier to work with.

# create an example dataset with arbitrary values, then combine the columns
Country <- c('Argentina','Brazil','Chile')
Quantity <- c(1, seq(5))
M <- cbind(Country,Quantity)
M <- as.data.frame(M)

# the result
    Country Quantity
1 Argentina        1
2    Brazil        1
3     Chile        2
4 Argentina        3
5    Brazil        4
6     Chile        5

# now I try to isolate Brazil
test <- M[M$Country=="Brazil",]

# and the result still looks good
  Country Quantity
2  Brazil        1
5  Brazil        4


When I use the command "table", which for me is the closest thing to count(*), the counts are correct, but it returns all the countries. I don't understand this result, because I had already filtered for Brazil only.

table(test)

           Quantity
Country     1 2 3 4 5
  Argentina 0 0 0 0 0
  Brazil    1 0 0 1 0
  Chile     0 0 0 0 0


Thanks,

Filipe

Answer

Your columns are factors. A factor keeps all of its levels even after subsetting, so table() still reports the unused levels with zero counts. You can avoid the whole thing ahead of time with stringsAsFactors = FALSE:

M <- data.frame(Country, Quantity, stringsAsFactors=FALSE)
test <- M[M$Country=="Brazil",]   
table(test)
#         Quantity
# Country  1 4
#   Brazil 1 1
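
To see why the extra rows appear, here is a minimal sketch that inspects the levels directly. It assumes factor columns, which is what R versions before 4.0.0 produce by default; stringsAsFactors = TRUE is set explicitly so the behaviour is the same on newer versions. The names M_factor and brazil are only illustrative and are not from the question.

```r
# Build the example with a factor Country column, regardless of R version.
Country  <- c('Argentina', 'Brazil', 'Chile')
Quantity <- c(1, seq(5))
M_factor <- data.frame(Country, Quantity, stringsAsFactors = TRUE)

# Keep only the Brazil rows.
brazil <- M_factor[M_factor$Country == "Brazil", ]

# Only Brazil is present in the data ...
unique(as.character(brazil$Country))

# ... but the factor still carries all three levels.
levels(brazil$Country)

# table() creates one cell per level, so unused levels show up as zeros.
table(brazil$Country)
```

The subset removes rows, not levels, which is why table still lists Argentina and Chile.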

If you need factors (that is, you keep the original M with factor columns), you can also use droplevels to remove the unused levels:

test <- M[M$Country=="Brazil",]   
test2 <- droplevels(test)
table(test2)
#         Quantity
# Country  1 4
#   Brazil 1 1
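
Since the question compares table to SQL's count(*), here is a rough sketch of how those counts could be written in base R. The SQL statements in the comments are only an analogy, and the data frame is rebuilt with character columns so the block runs on its own.

```r
Country  <- c('Argentina', 'Brazil', 'Chile')
Quantity <- c(1, seq(5))
M <- data.frame(Country, Quantity, stringsAsFactors = FALSE)

# Roughly: SELECT COUNT(*) FROM M WHERE Country = 'Brazil'
n_brazil <- sum(M$Country == "Brazil")
n_brazil

# The same count via the subset itself
nrow(M[M$Country == "Brazil", ])

# Roughly: SELECT Country, COUNT(*) FROM M GROUP BY Country
counts <- table(M$Country)
counts
```

Counting rows with sum or nrow sidesteps the levels issue entirely, because neither depends on whether Country is a factor.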