I have an autocorrelation problem in my panel data.
So I decided to use the first difference method to deal with this problem.
Most of my independent variables are binary.
So if I apply the first difference method to them,
I get -1, 0, and 1 instead of 0 or 1 as before.
Is this ok?
Besides, my data set is ordered in time as shown below, and I am not sure how to apply the first difference method when multiple different incidents happen on the same day:
Date X Y Z L M A B C D E
01/01/2017 0 1 0 0 0 0 1 0 0 7.8
01/01/2017 0 1 0 0 0 1 0 0 1 6.5
01/01/2017 0 0 0 0 1 1 0 0 1 6.5
01/03/2017 0 1 0 0 0 0 0 0 0 7.8
01/04/2017 0 0 1 0 0 1 0 0 0 6.5
01/04/2017 0 0 0 0 0 0 1 0 0 7.3
Differencing is used when the variable you want to forecast (i.e. the dependent variable) is not stationary. Before forecasting, transform a non-stationary dependent variable into a stationary one; if it is already stationary, you do not need differencing. If you difference the dependent variable, you also need to difference all the independent variables (including dummy variables).
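To make the last point concrete, here is a minimal sketch in plain Python (used only for illustration, it is not part of the forecast package). The helper name first_difference and the four-row series are made up, and the sketch assumes one row per period for a single unit. It shows that differencing a 0/1 dummy produces -1, 0 and 1, which can be read as "switched off", "no change" and "switched on".

```python
def first_difference(rows):
    """Return row-to-row changes: rows[t][k] - rows[t-1][k] for every column k."""
    return [
        [curr - prev for prev, curr in zip(rows[i - 1], rows[i])]
        for i in range(1, len(rows))
    ]


# Hypothetical series for one unit, one row per period: (dummy, outcome).
series = [
    (0, 7.8),
    (1, 6.5),
    (1, 6.5),
    (0, 7.3),
]

# The same transformation is applied to the dummy and to the outcome.
diffed = first_difference(series)

dummy_changes = [row[0] for row in diffed]
outcome_changes = [round(row[1], 2) for row in diffed]

print(dummy_changes)    # [1, 0, -1]
print(outcome_changes)  # [-1.3, 0.0, 0.8]
```

Note that one observation is lost: four periods give three differences.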
In order to deal with autocorrelation you could use the auto.arima function from the forecast package. It tries a number of different models and chooses the best one according to the AIC (by default the corrected version, AICc), applying differencing if needed. Be aware, though, that autocorrelation cannot always be eliminated. In some cases it will persist for various reasons (e.g. the variables you have do not explain all the variance). This does not mean, however, that you cannot use a model with autocorrelation for forecasting.
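If you want to check by hand how much autocorrelation is left after differencing, a rough sketch is below. It is plain Python rather than anything from the forecast package, the function name lag1_autocorrelation and the series are made up for illustration, and it only computes the usual sample autocorrelation at lag 1, so treat it as a quick diagnostic and not a formal test.

```python
def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1 (standard ACF estimator)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n))
    return num / denom


# Hypothetical trending series (levels) and its first differences.
levels = [0, 2, 3, 4, 6, 8, 9, 10, 12]
changes = [b - a for a, b in zip(levels, levels[1:])]

r_levels = lag1_autocorrelation(levels)    # about 0.65
r_changes = lag1_autocorrelation(changes)  # -0.125

print(round(r_levels, 3), round(r_changes, 3))
```

In this made-up series the levels are clearly autocorrelated, while the differences are much less so, though the value is still not exactly zero.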
I suggest you read the following book about forecasting, or its chapter on ARIMA, which explains in detail how to deal with autocorrelation and when differencing is needed.