I've got data I'm reading in as a dataframe from a CSV using Pandas (in Python). The CSV looks basically like the following:
date Thursday, May 5
subject 'Unique subject line 1'
date Tuesday, May 17
subject 'Unique subject line 2'
date Monday, May 9
subject 'Unique subject line 3'
I'd like to reshape it so that it looks like this:
image     date             link         subject
img1.jpg  Thursday, May 5  bit.ly/asdf  'Unique subject line 1'
img2.jpg  Tuesday, May 17  bit.ly/zxcv  'Unique subject line 2'
img3.jpg  Monday, May 9    bit.ly/sdfg  'Unique subject line 3'
The issue is that, as the data is currently formatted, there is no unique way to group the rows by image during a pivot. Any date could be grouped with img1.jpg, because nothing in the data says which date corresponds to which image.
To fix this, we just need to add a column with the grouping information. Judging by your output, the groups follow row order: the first 4 rows go together, the next 4 rows go together, and so on. numpy.repeat is useful for enumerating repeats like this; you just need to know the number of images and the number of attributes per image. Some basic math gets both in general:
    import numpy as np

    # Add a grouping column.
    nbr_images = (df['col1'] == 'image').sum()
    # Integer division: np.repeat needs an integer repeat count.
    nbr_attributes = len(df) // nbr_images
    df['image_group'] = np.repeat(range(nbr_images), nbr_attributes)
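To see what the grouping column looks like, here is a minimal sketch of numpy.repeat on its own. The counts (3 images, 4 rows per image) are assumptions taken from the example data, not values computed from a real DataFrame.

```python
import numpy as np

# Assumed counts for illustration: 3 images, 4 attribute rows per image.
nbr_images = 3
nbr_attributes = 4

# Each group label is repeated nbr_attributes times, in row order.
image_group = np.repeat(np.arange(nbr_images), nbr_attributes)
print(image_group.tolist())
# → [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Note that this only works if every image has the same number of rows and the rows for each image are contiguous.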
Now, it's straightforward to pivot:
    # Pivot the DataFrame.
    pivoted_df = df.pivot(columns='col1', index='image_group', values='col2')

    # Clear the index and column name.
    pivoted_df.index.name = None
    pivoted_df.columns.name = None
The resulting output:
                  date     image         link                subject
    0  Thursday, May 5  img1.jpg  bit.ly/asdf  Unique subject line 1
    1  Tuesday, May 17  img2.jpg  bit.ly/zxcv  Unique subject line 2
    2    Monday, May 9  img3.jpg  bit.ly/sdfg  Unique subject line 3
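For reference, the whole approach can be sketched end to end as a self-contained script. The input here is a hypothetical reconstruction: it assumes the CSV loads as two columns named col1 (attribute name) and col2 (value), with four contiguous rows per image. Adjust the names to match your actual DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format input: four rows (image, date, link, subject) per image.
rows = [
    ("image", "img1.jpg"), ("date", "Thursday, May 5"),
    ("link", "bit.ly/asdf"), ("subject", "Unique subject line 1"),
    ("image", "img2.jpg"), ("date", "Tuesday, May 17"),
    ("link", "bit.ly/zxcv"), ("subject", "Unique subject line 2"),
    ("image", "img3.jpg"), ("date", "Monday, May 9"),
    ("link", "bit.ly/sdfg"), ("subject", "Unique subject line 3"),
]
df = pd.DataFrame(rows, columns=["col1", "col2"])

# Count the images, then derive how many rows belong to each one.
nbr_images = int((df["col1"] == "image").sum())
nbr_attributes = len(df) // nbr_images

# Label each block of nbr_attributes rows with its image number.
df["image_group"] = np.repeat(np.arange(nbr_images), nbr_attributes)

# Pivot so each image group becomes one row and each attribute one column.
pivoted_df = df.pivot(index="image_group", columns="col1", values="col2")
pivoted_df.index.name = None
pivoted_df.columns.name = None

print(pivoted_df)
```

The pivoted columns come out in sorted order (date, image, link, subject); reindex the columns if you need a different order.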