John_Rodgers - 22 days ago
R Question

Convert Polynominal to Binominal - Thousands of columns

I have a dataset with 100 columns (named Col_1, Col_2, ..., Col_100) whose values look like "A", "B", "C", ... I don't know how many different characters occur across the whole dataset. I'm trying to turn each distinct value into its own column, to get a matrix like:

A B C D
0 1 0 1
1 1 0 1


I'm trying with this:

library(reshape2)
train <- read.csv("train.csv",head=TRUE,sep=",")
train

recast(train, id ~ value, id.var = 1, fun.aggregate = function(x) (length(x) > 0) + 0L)


But I'm getting the following errors:

Error in eval(substitute(expr), envir, enclos) :
n must be a positive integer
In addition: Warning messages:
1: attributes are not identical across measure variables; they will be dropped
2: In split_indices(.group, .n) :
NAs introduced by coercion to integer range


What can I do to get the table I want?

lmo
Answer

Perhaps this is what you are looking for. The first step collects all the possible values. The second step turns each variable into a factor that knows every possible value. This allows table to report a count of 0 when a variable never takes a particular value, so every variable yields a vector of the same length and rbind can stack them into one matrix with one row per variable.

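# collect all possible values (works for both character and factor columns;
# levels() alone returns NULL when the columns are plain character vectors)
allLevels <- sort(unique(unlist(lapply(df, as.character))))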
# provide all levels to each variable in the data.frame
dfNew <- data.frame(lapply(df, function(i) factor(i, levels=allLevels)))

# produce the count for each variable
do.call(rbind, lapply(dfNew, table))
  a b c d e g i j
x 3 2 8 2 0 0 0 0
y 0 0 2 4 4 1 3 1
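
The result above holds counts. If only presence or absence matters, as the 0/1 matrix in the question suggests, the counts can be collapsed afterwards. A minimal sketch, assuming a small made-up data frame (the names toy, counts and binary are illustrative and not part of the original answer):

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(x = c("a", "b", "a", "c"),
                  y = c("c", "d", "d", "e"),
                  stringsAsFactors = FALSE)

# union of all values that occur in any column
allLevels <- sort(unique(unlist(lapply(toy, as.character))))

# one row of counts per variable, with zeros for values it never takes
counts <- do.call(rbind, lapply(toy, function(col) {
  table(factor(col, levels = allLevels))
}))

# collapse counts to presence (1) / absence (0)
binary <- (counts > 0) + 0L
binary
```

Comparing with 0 yields a logical matrix, and adding 0L converts it to integers while keeping the row and column names.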

data

set.seed(1234)
df <- data.frame(x=sample(letters[1:4], 15, replace=TRUE),
                 y=sample(letters[3:10], 15, replace=TRUE))
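
If what is wanted is instead one row of 0/1 flags per observation (one possible reading of the example matrix in the question), the same "collect all values first" idea can be reused. This is only a sketch under that assumption; the data frame wide and the matrix indicator are hypothetical names, and missing values are simply skipped:

```r
# Hypothetical toy data shaped like the question's Col_1 ... Col_100
wide <- data.frame(Col_1 = c("A", "B", "A"),
                   Col_2 = c("B", "D", "C"),
                   Col_3 = c("D", "B", "A"),
                   stringsAsFactors = FALSE)

# union of all values that occur in any column
allLevels <- sort(unique(unlist(lapply(wide, as.character))))

# start from an all-zero matrix: one row per observation, one column per value
indicator <- matrix(0L, nrow = nrow(wide), ncol = length(allLevels),
                    dimnames = list(NULL, allLevels))

# for every column, mark the (row, value) cells that occur
for (col in wide) {
  j <- match(as.character(col), allLevels)
  keep <- !is.na(j)                      # skip missing values
  indicator[cbind(which(keep), j[keep])] <- 1L
}
indicator
```

Filling a preallocated matrix column by column avoids reshaping the data, which may help when there are many columns.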