I have a large data set and I want to do a convolution calculation using multiple rows that match a criterion. I need to calculate a vector for each row first, and I thought it would be more efficient to store each vector in a DataFrame column so I could avoid a for loop when I do the convolution. Trouble is, the vectors are variable length and I can't figure out how to do it.
Here's a summary of my data:
Date State Alloc P
2012-01-01 AK 3 0.5
2012-01-01 AL 4 0.3
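For each row I want the vector [P, np.zeros(Alloc), 1-P], i.e. P, followed by Alloc zeros, followed by 1 - P. Assigning it column-wise like this doesn't work:
df['Test'] = [df['P'], np.zeros(df['Alloc']), 1 - df['P']]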
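To spell out what the vector looks like: written literally, [P, np.zeros(Alloc), 1-P] is a three-item list with an array in the middle, not one flat vector. A small sketch using the AK row (P = 0.5, Alloc = 3) and np.concatenate to flatten it:

```python
import numpy as np

p, alloc = 0.5, 3  # first sample row (AK)

# Literal list: 3 items, the middle one is itself an array
nested = [p, np.zeros(alloc), 1 - p]

# Flat vector: P, then alloc zeros, then 1 - P (length alloc + 2)
flat = np.concatenate(([p], np.zeros(alloc), [1 - p]))

print(len(nested), len(flat))  # 3 5
print(flat.tolist())           # [0.5, 0.0, 0.0, 0.0, 0.5]
```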
So here's the answer. piRSquared was almost right, but not quite. There are several parts here.
The apply method (DataFrame.apply with axis=1) partially works. It passes each row to the function, and you can do the calculation shown above. The problem is that you get a "ValueError: Shape of passed values is..." error: the number of columns returned doesn't match the number of columns in the dataframe. My guess is that because the function returns a list, pandas tries to spread the list across columns instead of storing it in a single cell.
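For what it's worth, newer pandas (0.23 and later) added a result_type parameter to DataFrame.apply, and result_type="reduce" asks for a Series rather than expanding list-like results into columns. A minimal sketch on the two sample rows, assuming such a pandas version; build_vector is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# The two sample rows from the question
df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-01"],
    "State": ["AK", "AL"],
    "Alloc": [3, 4],
    "P": [0.5, 0.3],
})

def build_vector(row):
    # row is one DataFrame row; returns P, then Alloc zeros, then 1 - P
    return np.concatenate(([row["P"]], np.zeros(int(row["Alloc"])), [1 - row["P"]]))

# result_type="reduce" returns one object per row (a Series of arrays)
# instead of trying to expand each returned array into columns.
df["Array"] = df.apply(build_vector, axis=1, result_type="reduce")
print(df["Array"].apply(len).tolist())  # [5, 6]
```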
The workaround is to do the apply on a single column that holds both the P and Alloc values for each row. Here are the steps:
Create the merged column:
df['temp'] = df[['P','Alloc']].values.tolist()
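A quick check of what the merged column holds, using the sample values. Because P is float and Alloc is int, .values produces a single float array, which is why Alloc shows up as 3.0 and needs int() later:

```python
import pandas as pd

df = pd.DataFrame({"Alloc": [3, 4], "P": [0.5, 0.3]})

# Mixed float/int columns are upcast to one float64 array,
# so each [P, Alloc] pair comes back with Alloc as a float.
df["temp"] = df[["P", "Alloc"]].values.tolist()
print(df["temp"].tolist())  # [[0.5, 3.0], [0.3, 4.0]]
```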
Write a function:
def array_p(x):  # x is one [P, Alloc] pair from the 'temp' column
    return [x[0]] + [0] * int(x[1]) + [1 - x[0]]
(int is needed because .values upcasts the mixed int and float columns to floats, so Alloc comes back as 3.0 rather than 3. I didn't need np.zeros.)
Apply the function:
df['Array'] = df['temp'].apply(array_p)
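Putting the three steps together on the two sample rows from the question (the DataFrame construction is only there to make the sketch self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-01"],
    "State": ["AK", "AL"],
    "Alloc": [3, 4],
    "P": [0.5, 0.3],
})

# Step 1: merge P and Alloc into one column of [P, Alloc] pairs (floats)
df["temp"] = df[["P", "Alloc"]].values.tolist()

# Step 2: P, then Alloc zeros, then 1 - P, as a plain Python list
def array_p(x):
    return [x[0]] + [0] * int(x[1]) + [1 - x[0]]

# Step 3: apply on the single column; each cell ends up holding one list
df["Array"] = df["temp"].apply(array_p)
print(df.loc[0, "Array"])  # [0.5, 0, 0, 0, 0.5]
```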
This works, but obviously involves more steps than it should. If anyone can provide a better answer, I'd love to hear it.
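One possibly simpler option, sketched under assumptions: skip the merged column and apply entirely, and build one NumPy array per row with a list comprehension over zip(df['P'], df['Alloc']). The convolution step assumes the criterion is "same Date" and uses np.convolve folded with functools.reduce; both that criterion and the conv_by_date name are hypothetical, so swap in whatever grouping you actually need:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-01"],
    "State": ["AK", "AL"],
    "Alloc": [3, 4],
    "P": [0.5, 0.3],
})

# One NumPy array per row: P, then Alloc zeros, then 1 - P.
# Zipping the two columns avoids the temporary merged column and apply().
df["Array"] = [
    np.concatenate(([p], np.zeros(int(a)), [1 - p]))
    for p, a in zip(df["P"], df["Alloc"])
]

# Hypothetical criterion: convolve together the vectors of all rows sharing a Date.
conv_by_date = {
    date: reduce(np.convolve, group["Array"])
    for date, group in df.groupby("Date")
}
print(conv_by_date["2012-01-01"])
```

For the two sample rows the result is a length-10 array (5 + 6 - 1), and it still sums to 1 because each row's vector sums to P + (1 - P) = 1.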