# Create full time series and fill data
dfSorted = df.groupby(['AvailabilityZone', 'InstanceType'])
dfSorted = dfSorted.resample('H')
dfSorted = dfSorted.ffill()  # forward-fill the gaps left by resampling
dfSorted = dfSorted.dropna()
I am assuming you create your groups with groupby. You can first create the groups:
groups = df.groupby(['whatever','grouping'])
Then you can build a list of lists for the value you want to correlate, which I believe in your case is SpotPrice:
grouped_prices = [g['SpotPrice'].tolist() for i,g in groups]
numpy.corrcoef takes a list of lists as input, calculates the correlation between each pair of lists, and returns a correlation coefficient matrix. See: https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html
The array it returns is your correlation coefficient matrix.
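Putting the steps together, here is a minimal sketch. The sample data is made up for illustration (the zone names, instance type, and prices are assumptions, not your data). Note that numpy.corrcoef needs every inner list to have the same length, which is why resampling the groups onto a common time index first matters.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two groups with the same number of observations.
df = pd.DataFrame({
    "AvailabilityZone": ["us-east-1a"] * 4 + ["us-east-1b"] * 4,
    "InstanceType": ["m3.large"] * 8,
    "SpotPrice": [0.10, 0.12, 0.11, 0.15, 0.20, 0.24, 0.22, 0.30],
})

groups = df.groupby(["AvailabilityZone", "InstanceType"])

# One list of prices per group; all lists must be equally long.
grouped_prices = [g["SpotPrice"].tolist() for _, g in groups]

# Each inner list is treated as one variable (one row of the input).
corr_matrix = np.corrcoef(grouped_prices)
print(corr_matrix)
```

Entry [i, j] of corr_matrix is the correlation between group i and group j, in the order the groups are iterated.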
pandas.DataFrame.corr should give you the same results, because by default both calculate the Pearson correlation coefficient. I chose numpy.corrcoef because I think it's easier in this case.
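If you want to check that the two agree, something like the following should work. The price lists are invented for the example. The one thing to watch is orientation: numpy.corrcoef treats each row as a variable, while DataFrame.corr treats each column as a variable, so the lists are transposed before calling corr.

```python
import numpy as np
import pandas as pd

# Hypothetical price series, one equal-length list per group.
grouped_prices = [
    [0.10, 0.12, 0.11, 0.15],
    [0.20, 0.19, 0.23, 0.22],
    [0.31, 0.35, 0.30, 0.40],
]

# NumPy: each row is one variable.
numpy_corr = np.corrcoef(grouped_prices)

# pandas: each column is one variable, so transpose the rows into columns.
price_frame = pd.DataFrame(grouped_prices).T
pandas_corr = price_frame.corr()  # method="pearson" is the default

# Compare the two matrices up to floating-point tolerance.
same = np.allclose(numpy_corr, pandas_corr.to_numpy())
print(same)
```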
Also, before you use Pearson correlation, you should know that it only measures linear relationships between variables, and that your data must meet certain assumptions for it to be appropriate. See here for example.
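To illustrate the "linear only" point, here is a small synthetic sketch (the data is made up and has nothing to do with spot prices). A perfectly linear relationship gives a coefficient of 1, while a perfectly deterministic but quadratic relationship over a symmetric range gives a coefficient of roughly 0.

```python
import numpy as np

# Synthetic inputs, symmetric around zero.
x = np.arange(-5, 6)

y_linear = 2 * x + 1   # exact linear dependence on x
y_quadratic = x ** 2   # exact, but non-linear, dependence on x

# np.corrcoef(a, b) returns a 2x2 matrix; [0, 1] is the a-vs-b coefficient.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(r_linear, r_quadratic)
```

So a coefficient near zero does not by itself mean the two series are unrelated.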