Konstantino Sparakis - 7 months ago 68

Python Question

I have a data frame that is grouped by two columns, "AvailabilityZone" and "InstanceType", as seen below.

I create this using the following code:

```
# Create full time series and fill data
dfSorted = df.groupby(['AvailabilityZone', 'InstanceType'])
dfSorted = dfSorted.resample('H')
dfSorted = dfSorted.fillna("ffill")
dfSorted = dfSorted.dropna()
```

Every grouping represents a time series. I have already resampled the data so that all the time series are running on an hourly basis. How do I run a correlation to find out how similar each time series is with one another?

When I use:

`dfSorted.corr()`

it just returns SpotPrice = 1. So I assume I will have to use something like a loop and compare each time series with every other one? I'm lost, so any help is much appreciated!

Here Is my dataframe as a csv file:

https://www.dropbox.com/s/xgv8xm5n5o856mx/out.csv?dl=0

I simply used `df.to_csv()`.

Answer

When you create groups, I am assuming you use `groupby`. You can first create your groups:

```
groups = df.groupby(['whatever','grouping'])
```

Then you can get a list of lists for the value you want to correlate; I believe in your case this is `SpotPrice`. So:

```
grouped_prices = [g['SpotPrice'].tolist() for i,g in groups]
```

`numpy.corrcoef` takes a list of lists as input (the lists must all have the same length), calculates the correlation between each pair of lists, and returns a correlation coefficient matrix. See: https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html

```
import numpy
numpy.corrcoef(grouped_prices)
```

This is your correlation coefficient matrix.
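To make the steps above concrete, here is a minimal sketch on a toy frame. The column names come from the question, but the zone names, instance type, and prices are made up for illustration and are not the asker's data. It also assumes every group has the same number of observations, which the hourly resampling is meant to ensure.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three groups with four observations each.
df = pd.DataFrame({
    "AvailabilityZone": ["us-east-1a"] * 4 + ["us-east-1b"] * 4 + ["us-east-1c"] * 4,
    "InstanceType": ["m3.large"] * 12,
    "SpotPrice": [1.0, 2.0, 3.0, 4.0,
                  2.0, 4.0, 6.0, 8.0,
                  4.0, 3.0, 2.0, 1.0],
})

groups = df.groupby(["AvailabilityZone", "InstanceType"])

# One readable label and one list of prices per group, in the same order.
labels = ["/".join(key) for key, _ in groups]
grouped_prices = [g["SpotPrice"].tolist() for _, g in groups]

# Each row of the input is treated as one variable (one time series).
corr = np.corrcoef(grouped_prices)

# Wrap the matrix in a DataFrame so rows and columns are identifiable.
corr_df = pd.DataFrame(corr, index=labels, columns=labels)
print(corr_df)
```

Labelling the matrix is optional, but without it you have to remember the order in which `groupby` yielded the groups.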

`numpy.corrcoef` and `pandas.DataFrame.corr` should give you the same results because by default they both calculate the Pearson correlation coefficient. I chose to use `numpy.corrcoef` because I think it's easier in this case.
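If you would rather stay in pandas, one possible approach is to reshape the data so that each group becomes its own column and then call `DataFrame.corr`. The sketch below assumes a hypothetical `Timestamp` column and made-up prices; it also checks that the result matches `numpy.corrcoef`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups observed at the same four hours.
times = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00",
                        "2017-01-01 02:00", "2017-01-01 03:00"])
df = pd.DataFrame({
    "Timestamp": list(times) * 2,
    "AvailabilityZone": ["us-east-1a"] * 4 + ["us-east-1b"] * 4,
    "InstanceType": ["m3.large"] * 8,
    "SpotPrice": [1.0, 3.0, 2.0, 5.0, 2.0, 1.0, 4.0, 3.0],
})

# Wide layout: one row per timestamp, one column per (zone, type) group.
wide = df.pivot_table(index="Timestamp",
                      columns=["AvailabilityZone", "InstanceType"],
                      values="SpotPrice")

# Pearson correlation between every pair of columns (the default method).
pandas_corr = wide.corr()

# Same computation with numpy: transpose so each row is one time series.
numpy_corr = np.corrcoef(wide.to_numpy().T)

print(pandas_corr)
print(np.allclose(pandas_corr.to_numpy(), numpy_corr))
```

The wide layout has the side benefit of aligning the series on their timestamps, so values are only compared when they belong to the same hour.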

Also, before you use Pearson correlation, you should know that it only measures linear relationships between variables, and that your data must meet certain assumptions for it to be appropriate. See here for example.
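As a small illustration of the "linear only" caveat, the sketch below uses synthetic values (not spot prices) to compare a perfectly linear relationship with a perfectly deterministic but non-linear one. The variable names are made up for the example.

```python
import numpy as np

# Synthetic input, symmetric around zero.
x = np.linspace(-1.0, 1.0, 201)

y_linear = 3.0 * x + 1.0   # exact linear function of x
y_quadratic = x ** 2       # exact, but non-linear, function of x

# corrcoef with two 1-D arrays returns a 2x2 matrix; [0, 1] is the pair's coefficient.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(r_linear)      # close to 1
print(r_quadratic)   # close to 0, even though y is fully determined by x
```

A coefficient near zero therefore does not mean two series are unrelated, only that they are not linearly related.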

Source (Stackoverflow)