Christoph P. - 1 year ago 93
R Question

# Loop to create dummy variables

I'm currently working on a large data set (ca. 30k rows), And I'm working on creating a

`hedonic`
regression. The next step would be to create weekly dummy variables.

Now my data has assigned weekly number depending on the day the data was measured. There are 50 different weeks (1-52, 2 missing unaccounted). These weekly numbers are repeated until the change after about
`10 rows,`
however they are also recurring, as new product categories are measures. There are 132 are available in the dataset and one category contains between
`100 - 300 rows`
.

This is an example of the dataset

``````UPC         Weeks
1111112016  1
1111112016  1
1111112016  2
1111112016  2
1111112016  3
1111112016  3
1111112440  1
1111112440  1
1111112440  2
1111112440  2
1111112440  3
1111112440  3
``````

Now to create dummy variables, I created 50 columns, each having about 30k rows to represent the dataset. I would like to assign 1 to the row of the dummy week whenever dummy week (hence the column name) and real week (row of the orig. dataset) are equal.

Example Dummy (DW = Dummy Week):

``````DW1 DW2
NA  NA
NA  NA
NA  NA
``````

I tried the following:

``````for (i in 1:seq(Soap\$WEEK)){
if Soap\$WEEK[i] == seq(from=1, by=1, to=52){
for (j in names(x)){
x\$DW[[j]] = 1
else {
x\$DW[[j]] = 0
}}}}
``````

I know it is wrong, however I'm unable to resolve my problem. I would appreciate any help in this matter.

We can use `model.matrix()` from the `stats` package to dummify your data. First, we'll need to convert `Weeks` to a `factor` column.

``````df\$Weeks <- as.factor(df\$Weeks)
``````

Now we can run `model.matrix()`:

``````model.matrix(~ Weeks + UPC + 0, data = df)
#   Weeks1 Weeks2 Weeks3        UPC
#1       1      0      0 1111112016
#2       1      0      0 1111112016
#3       0      1      0 1111112016
#4       0      1      0 1111112016
#5       0      0      1 1111112016
#6       0      0      1 1111112016
#7       1      0      0 1111112440
#8       1      0      0 1111112440
#9       0      1      0 1111112440
#10      0      1      0 1111112440
#11      0      0      1 1111112440
#12      0      0      1 1111112440
``````

You can also just use `model.matrix(~ . + 0 , data = df)`, as numeric columns will be automatically passed over. The `+ 0` in the formula avoids replacing the first level by the `Intercept`. To see the difference try to run it without `0`.

Alternatively, you can also use `dummyVars` from the `caret` package. Here, no `Intercept` is the default behaviour:

``````library(caret)

dm <- dummyVars(" ~ .", data = df)
data.frame(predict(dm, newdata = df))
#          UPC Weeks.1 Weeks.2 Weeks.3
#1  1111112016       1       0       0
#2  1111112016       1       0       0
#3  1111112016       0       1       0
#4  1111112016       0       1       0
#5  1111112016       0       0       1
#6  1111112016       0       0       1
#7  1111112440       1       0       0
#8  1111112440       1       0       0
#9  1111112440       0       1       0
#10 1111112440       0       1       0
#11 1111112440       0       0       1
#12 1111112440       0       0       1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download