Doug Fir - 2 months ago 11

R Question

I have a dataset where the target variable is right-skewed (most values are bunched on the left, with a long right tail). When I plot a histogram of the log of this variable, it's a nice, normal-looking distribution. So I believe I should log transform it?

I tried that in `my_model_log` below. But when I evaluated it by looking at Mean Absolute Error, I found that it underperformed the non-log-transformed version.

`my_model <- lm(target ~ var1 + var2 + var3, data = ptrain)`

`my_model_log <- lm(log(target) ~ var1 + var2 + var3, data = ptrain)`

`my_predictions <- predict(my_model_log, interval = "prediction", newdata = test_submission)`

my_predictions showed lower performance when using the log model.

Is this expected? Is there a parameter I should add to `predict()`?

Answer

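If you call `predict()` on the log model, it returns an estimate of `log(target)`. If you want an estimate of `target`, you need to apply the inverse transformation, `exp()`, to the predictions before computing the error. Note that the back-transformed prediction interval is no longer symmetric around the point estimate, and, if the errors on the log scale are roughly normal, the back-transformed fit estimates the conditional median of `target` rather than its mean.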
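Here is a minimal sketch of that back-transformation. Since `ptrain` and `test_submission` aren't shown in the question, it uses synthetic data; the simulated coefficients, the `holdout` data frame, and the `mae_*` names are assumptions made up for illustration, not part of the original code.

```r
# Synthetic stand-in for the asker's data (hypothetical values).
set.seed(1)
n <- 500
dat <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
dat$target <- exp(1 + 0.5 * dat$var1 - 0.3 * dat$var2 + 0.2 * dat$var3 +
                  rnorm(n, sd = 0.5))
ptrain  <- dat[1:400, ]
holdout <- dat[401:500, ]   # hypothetical held-out set with known target

my_model_log <- lm(log(target) ~ var1 + var2 + var3, data = ptrain)

# With interval = "prediction", predict() returns a matrix with columns
# fit, lwr and upr, all on the log scale.
log_pred <- predict(my_model_log, newdata = holdout, interval = "prediction")

# Back-transform every column to the original scale of target.
pred <- exp(log_pred)

# MAE computed against log-scale predictions mixes two scales.
mae_mixed_scales <- mean(abs(holdout$target - log_pred[, "fit"]))

# MAE computed after back-transforming is comparable to the untransformed model.
mae_original_scale <- mean(abs(holdout$target - pred[, "fit"]))

# The interval is symmetric on the log scale but not after exp().
head(cbind(below = pred[, "fit"] - pred[, "lwr"],
           above = pred[, "upr"] - pred[, "fit"]))
```

If the MAE in the question was computed against the raw output of `predict(my_model_log, ...)`, that would be the mixed-scale version, which might explain the apparent drop in performance.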