Mike Woodward - 1 year ago

Python Question

I have a large data set and I want to do a convolution calculation using multiple rows that match a criterion. I need to calculate a vector for each row first, and I thought it would be more efficient to store each vector in a dataframe column so I could avoid a for loop when I do the convolution. The trouble is, the vectors are variable length and I can't figure out how to do it.

Here's a summary of my data:

`Date State Alloc P`

2012-01-01 AK 3 0.5

2012-01-01 AL 4 0.3

…

Each state has a different Alloc and P value. There’s a row for every date and state and my dataframe is over 15,000 rows long.

For each entry, I want a vector that looks like this:

`[P, np.zeros(Alloc), 1-P]`

I can't figure out how to set a new column like this. I've tried statements like:

`df['Test'] = [df['P'], np.zeros(df['Alloc']), 1 - df['P']]`

but they don't work.

Does anyone have any ideas?

Thanks ☺

Answer Source

So here's the answer. piRSquared was almost right, but not quite. There are several parts here.

The apply method partially works. It passes a row to the function and you can do a calculation as shown above. The problem is, you get a "ValueError: Shape of passed values is..." error message. The number of columns returned doesn't match the number of columns in the dataframe. My guess is this is because the return value is a list and Pandas isn't interpreting the result correctly.

The workaround is to do the apply on a single column. This single column should contain the P value and Alloc value. Here are the steps:

Create the merged column:

```
df['temp'] = df[['P','Alloc']].values.tolist()
```

Write a function:

```
def array_p(x):
    return [x[0]] + [0]*int(x[1]) + [1 - x[0]]
```

(`int` is needed because `.values.tolist()` in the previous step converts the integer Alloc values to floats alongside P. I didn't need `np.zeros`.)

Apply the function:

```
df['Array'] = df['temp'].apply(array_p)
```
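Put together, the three steps might look like the sketch below. The sample dataframe is made up from the two rows shown in the question, and dropping the helper column at the end is my own addition, not something the steps above require.

```python
import pandas as pd

# Sample data copied from the two rows shown in the question (illustrative only).
df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-01"],
    "State": ["AK", "AL"],
    "Alloc": [3, 4],
    "P": [0.5, 0.3],
})

def array_p(x):
    # x is a [P, Alloc] pair; Alloc arrives as a float, hence int()
    return [x[0]] + [0] * int(x[1]) + [1 - x[0]]

# Step 1: merge P and Alloc into one column of [P, Alloc] lists
df["temp"] = df[["P", "Alloc"]].values.tolist()

# Step 2: build the variable-length vector for every row
df["Array"] = df["temp"].apply(array_p)

# Optional cleanup (an assumption): the helper column is no longer needed
df = df.drop(columns="temp")
```

Each entry of `df['Array']` is an ordinary Python list, so the column has object dtype and rows can hold vectors of different lengths.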

This works, but obviously involves more steps than it should. If anyone can provide a better answer, I'd love to hear it.
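One possibly shorter route, offered as a sketch rather than a tested replacement: build the vectors with a list comprehension over the two columns, which skips the temporary column altogether. The two-row dataframe and the `np.convolve` call at the end are assumptions for illustration; the question doesn't say exactly how the convolution is set up.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names follow the question.
df = pd.DataFrame({"Alloc": [3, 4], "P": [0.5, 0.3]})

# Pair up P and Alloc row by row and build each vector directly.
# int() guards against NumPy integer types coming out of the column.
df["Array"] = [
    [p] + [0] * int(alloc) + [1 - p]
    for p, alloc in zip(df["P"], df["Alloc"])
]

# Assumed usage: convolve the vectors of two rows with NumPy.
result = np.convolve(df["Array"][0], df["Array"][1])
```

This is still a Python-level loop under the hood, just like `apply`, so I wouldn't expect a big speed difference at 15,000 rows. It's mainly less code.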