Tobias Wood - 6 months ago
Python Question

Getting Pandas Dataframe in correct format for Seaborn swarmplot

I have data in a CSV file that I would like to plot as a swarmplot like the 4th example here: https://stanford.edu/~mwaskom/software/seaborn/tutorial/categorical.html (i.e. with colours denoting groups).

In the CSV file the data is arranged in rows like this:

Group,1,1,2,2
Value1,0.5,0.3,0.2,0.1
Value2,1.7,1.3,1.1,1.0
...


I want the colours on the swarmplot to be determined by the Group, and there to be a separate plot for each Value.

I have got this far:

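import pandas as pd
import seaborn as sns

# Use the first CSV column (Group, Value1, ...) as the row labels
data_in = pd.read_csv('file.csv', header=None, index_col=0)
# Transpose so each sample is a row and each variable is a column
data_t = data_in.transpose()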


At this point my data frame looks like:

0 Group Value1 Value2 ...
1 1 0.5 1.7
2 1 0.3 1.3
3 2 0.2 1.1
4 2 0.1 1.0
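A self-contained sketch of the loading step above, assuming the three sample rows are the whole file (io.StringIO stands in for file.csv). In this sample read_csv parses every column as float, so Group arrives as 1.0 and 2.0; the sketch casts it to int and also clears the columns name (the 0 in the top-left corner of the printout above), which pd.melt would otherwise reuse as the name of its variable column.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for 'file.csv'.
CSV_TEXT = """Group,1,1,2,2
Value1,0.5,0.3,0.2,0.1
Value2,1.7,1.3,1.1,1.0
"""

# The first CSV column (Group, Value1, Value2) becomes the row index.
data_in = pd.read_csv(io.StringIO(CSV_TEXT), header=None, index_col=0)

# Transpose so each observation is a row and each variable a column.
data_t = data_in.transpose()

# Every column was parsed as float, so Group holds 1.0/2.0;
# cast it to int so a hue legend would show 1 and 2.
data_t = data_t.astype({'Group': int})

# Drop the leftover columns name (the "0" shown in the printout).
data_t.columns.name = None

print(data_t)
```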


If I then do

k = data_t.keys()[[2,3]]
sns.swarmplot(data=data_t[k])


I can pull out the columns I want and get a plot that looks good except for the colours. My problem is that to set the colours with the hue keyword argument, I also have to specify the x or y argument, and I can't figure out how to reshape my DataFrame into a format where I can specify either of those variables. I think I want to get something like this:

0 Group Name Value
1 1 Value1 0.5
2 2 Value1 0.3
3 1 Value2 0.2
4 2 Value2 0.1


...

But I can't work out whether I want stack(), pivot_table() or something else entirely.
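On the stack() option: it can produce the same long format, as in the minimal sketch below, which rebuilds data_t inline from the printout above (Group as integers) and uses the Name and Value labels from the sketched layout. Group is moved into the index first so it is kept as an identifier instead of being stacked. pivot_table() goes the other direction (long to wide), so it would not help here. Note that stack() orders the rows sample by sample, whereas melt orders them variable by variable.

```python
import pandas as pd

# data_t as printed above (Group as int), rebuilt inline for the sketch.
data_t = pd.DataFrame(
    {'Group': [1, 1, 2, 2],
     'Value1': [0.5, 0.3, 0.2, 0.1],
     'Value2': [1.7, 1.3, 1.1, 1.0]},
    index=[1, 2, 3, 4],
)

# Keep Group as an identifier by appending it to the index, then stack the
# remaining columns: each (row, column) cell becomes one entry of a Series.
stacked = data_t.set_index('Group', append=True).stack()

# Name the three index levels, turn them back into columns and drop the
# original row number, which is not needed for plotting.
long_df = (
    stacked.rename_axis(['row', 'Group', 'Name'])
    .reset_index(name='Value')
    .drop(columns='row')
)
print(long_df)
```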

Thanks in advance.

Answer

To reshape the table from wide to long format, you can use melt:

pd.melt(df, id_vars='Group', value_vars=['Value1', 'Value2'])

   Group variable  value
0      1   Value1    0.5
1      1   Value1    0.3
2      2   Value1    0.2
3      2   Value1    0.1
4      1   Value2    1.7
5      1   Value2    1.3
6      2   Value2    1.1
7      2   Value2    1.0
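Applied to the question's data_t, it may help to pass var_name and value_name, which gives exactly the Name and Value columns asked for. This matters here because when var_name is omitted, melt names the variable column after data_t.columns.name, which is 0 after read_csv(header=None, index_col=0) and transpose(), not 'variable'. A minimal sketch, rebuilding data_t inline and assuming Group has been cast to int; the seaborn calls at the end are left as comments so the snippet runs with pandas alone.

```python
import pandas as pd

# Wide frame from the question (data_t after transposing), rebuilt inline.
data_t = pd.DataFrame(
    {'Group': [1, 1, 2, 2],
     'Value1': [0.5, 0.3, 0.2, 0.1],
     'Value2': [1.7, 1.3, 1.1, 1.0]},
    index=[1, 2, 3, 4],
)
# Leftover name from read_csv(..., header=None, index_col=0) + transpose().
data_t.columns.name = 0

# Name the output columns explicitly so melt does not reuse columns.name.
long_df = pd.melt(
    data_t,
    id_vars='Group',
    value_vars=['Value1', 'Value2'],
    var_name='Name',
    value_name='Value',
)
print(long_df)

# Plotting (needs seaborn; not executed here):
# sns.swarmplot(x='Name', y='Value', hue='Group', data=long_df)
# One panel per variable (seaborn >= 0.9; older releases used factorplot):
# sns.catplot(data=long_df, x='Group', y='Value', hue='Group',
#             col='Name', kind='swarm', sharey=False)
```

Here x='Name' puts each variable on its own position along one axis with the points coloured by Group; col='Name' in catplot instead gives a separate subplot per variable, which matches the "separate plot for each Value" requirement.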