John_Rodgers - 6 months ago
R Question

Convert Polynominal to Binominal - Thousands of columns

I have a dataset with 100 columns (named Col_1, Col_2, ..., Col_100) whose values look like "A", "B", "C"... I don't know how many different values there are across the whole dataset. I'm trying to turn each distinct value into its own column, to get a matrix like:

0 1 0 1
1 1 0 1

I'm trying with this:

train <- read.csv("train.csv", header=TRUE, sep=",")

recast(train, id ~ value, id.var = 1, fun.aggregate = function(x) (length(x) > 0) + 0L)

But I'm getting the following errors:

Error in eval(substitute(expr), envir, enclos) :
n must be a positive integer
In addition: Warning messages:
1: attributes are not identical across measure variables; they will be dropped
2: In split_indices(.group, .n) :
NAs introduced by coercion to integer range

What can I do to get the table that I want?

lmo

Perhaps this is what you are looking for. The first step collects the possible values. The second step makes each variable aware of the potential values. This allows table to produce 0 counts when a particular value is missing so that rbind will construct the proper output.
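To see why the shared levels matter, here is a minimal illustration on a made-up vector (`v` and its values are hypothetical, not taken from the question's data). Without explicit levels, table only reports the values that actually occur; with them, absent values show up as 0.

```r
# Hypothetical toy vector, not from the question's data
v <- c("a", "a", "c")

# Without shared levels, table() only reports values that occur
table(v)

# With an explicit level set, absent values are counted as 0
counts <- table(factor(v, levels = c("a", "b", "c")))
counts
```

Because every table then has the same names in the same order, the per-variable results line up column by column when they are bound together.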

# collect all possible values (as.character so this works whether the
# columns are factors or character vectors)
allLevels <- sort(unique(unlist(lapply(df, as.character))))
# provide all levels to each variable in the data.frame
dfNew <- data.frame(lapply(df, function(i) factor(i, levels=allLevels)))

# produce the count for each variable
do.call(rbind, lapply(dfNew, table))
  a b c d e g i j
x 3 2 8 2 0 0 0 0
y 0 0 2 4 4 1 3 1


# example data used above (random, so the counts differ on each run)
df <- data.frame(x=sample(letters[1:4], 15, replace=TRUE),
                 y=sample(letters[3:10], 15, replace=TRUE))
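The table in this answer counts values per column. If what you want is the per-row 0/1 matrix sketched in the question (one row per observation, one column per distinct value), something along these lines might work. This is only a sketch under assumptions: `train` below is a small made-up stand-in for the real file, and `vals` and `indicator` are names I picked for illustration.

```r
# Hypothetical stand-in for the question's train data (Col_1, Col_2, ...)
train <- data.frame(Col_1 = c("A", "B", "C"),
                    Col_2 = c("B", "B", "A"),
                    Col_3 = c("D", "A", "C"),
                    stringsAsFactors = FALSE)

# every distinct value that occurs anywhere in the data
vals <- sort(unique(unlist(lapply(train, as.character))))

# character matrix so that each value can be compared against all cells
m <- as.matrix(train)

# one column per distinct value: 1 if the value appears anywhere in
# that row, 0 otherwise (NA cells are ignored)
indicator <- vapply(vals,
                    function(v) as.integer(rowSums(m == v, na.rm = TRUE) > 0),
                    integer(nrow(m)))
indicator
```

This loops over the distinct values rather than over the rows, which should stay manageable as long as the number of distinct values is much smaller than the number of rows. Note that with a single-row data frame vapply returns a plain vector instead of a matrix.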