kinto-d - 4 months ago 25

Python Question

I have this kind of data:

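| ID | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|----|----|----|----|----|----|----|----|----|----|-----|
| 1 | -0.18 | 5 | -0.40 | -0.26 | 0.53 | -0.66 | 0.10 | 2 | -0.20 | 1 |
| 2 | -0.58 | 5 | -0.52 | -1.66 | 0.65 | -0.15 | 0.08 | 3 | 3.03 | -2 |
| 3 | -0.62 | 5 | -0.09 | -0.38 | 0.65 | 0.22 | 0.44 | 4 | 1.49 | 1 |
| 4 | -0.22 | -3 | 1.64 | -1.38 | 0.08 | 0.42 | 1.24 | 5 | -0.34 | 0 |
| 5 | 0.00 | 5 | 1.76 | -1.16 | 0.78 | 0.46 | 0.32 | 5 | -0.51 | -2 |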

What's the best method for visualizing this data? I'm using matplotlib to visualize it, and I read it from a CSV file using pandas.

Thanks

Answer

Visualising data in a high-dimensional space is always a difficult problem. One commonly used solution (now available in `pandas`) is to inspect all of the 1D and 2D projections of the data. This doesn't give you all of the information about the data, but that would be impossible to visualise unless you could see in 10D! Here's an example of how to do this with pandas (`scatter_matrix` has been available since version 0.7.3):

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# pandas >= 0.20; older versions used `from pandas.tools.plotting import scatter_matrix`
from pandas.plotting import scatter_matrix

# First make some fake data with the same layout as yours
data = pd.DataFrame(np.random.randn(100, 10),
                    columns=['x1', 'x2', 'x3', 'x4', 'x5',
                             'x6', 'x7', 'x8', 'x9', 'x10'])

# Now plot using pandas (diagonal='kde' requires SciPy)
scatter_matrix(data, alpha=0.2, figsize=(6, 6), diagonal='kde')
plt.show()
```

This generates a grid of plots: every 2D projection appears as a scatter plot, and the diagonal shows a kernel density estimate (KDE) of each 1D projection.
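As a rough sketch of applying the same call to the data in the question: the snippet below assumes the file is whitespace-separated as shown (drop the `sep` argument for a truly comma-separated file), treats `ID` as the index so it is not plotted as a variable, and embeds the five sample rows in a string as a stand-in for a real file path. It uses `diagonal='hist'` rather than `'kde'`, since a density estimate from five points is not very informative and the histogram does not need SciPy. The output file name is arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import pandas as pd
from pandas.plotting import scatter_matrix

# The five sample rows from the question, assumed to be whitespace-separated.
raw = """\
ID x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 -0.18 5 -0.40 -0.26 0.53 -0.66 0.10 2 -0.20 1
2 -0.58 5 -0.52 -1.66 0.65 -0.15 0.08 3 3.03 -2
3 -0.62 5 -0.09 -0.38 0.65 0.22 0.44 4 1.49 1
4 -0.22 -3 1.64 -1.38 0.08 0.42 1.24 5 -0.34 0
5 0.00 5 1.76 -1.16 0.78 0.46 0.32 5 -0.51 -2
"""

# For a real file, pass its path in place of the StringIO object.
data = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="ID")

# One panel per pair of columns; histograms on the diagonal.
axes = scatter_matrix(data, alpha=0.8, figsize=(8, 8), diagonal="hist")

# scatter_matrix returns a 2D array of Axes; save the figure they belong to.
axes[0, 0].figure.savefig("scatter_matrix.png")
```

With only five rows the panels are sparse, but the same call scales to the full file unchanged.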

I also have a pure matplotlib approach to this on my GitHub page, which produces a very similar type of plot (it is designed for MCMC output, but is also appropriate here). Here's how you'd use it on this data:

```
import corner_plot as cp

# data.values works across pandas versions (as_matrix() was removed in pandas 1.0)
cp.corner_plot(data.values, axis_labels=data.columns, nbins=10,
               figsize=(7, 7), scatter=True, fontsize=10, tickfontsize=7)
```
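If that module is not to hand, the general idea of a corner plot can be sketched in plain matplotlib. The helper below, `simple_corner`, is a hypothetical name and is not the implementation in `corner_plot`; it only assumes a 2D array of values and a list of labels. It draws histograms on the diagonal and scatter plots in the lower triangle, and hides the upper triangle because it would mirror the lower one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np


def simple_corner(values, labels, nbins=10, figsize=(7, 7)):
    """Hypothetical helper: lower-triangle scatter plots, histograms on the diagonal."""
    values = np.asarray(values, dtype=float)
    n = values.shape[1]
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    for row in range(n):
        for col in range(n):
            ax = axes[row, col]
            if col > row:
                ax.set_visible(False)  # upper triangle mirrors the lower one
                continue
            if row == col:
                ax.hist(values[:, col], bins=nbins, color="gray")
            else:
                ax.scatter(values[:, col], values[:, row], s=4, alpha=0.5)
            # Hide ticks to keep a 10x10 grid readable.
            ax.set_xticks([])
            ax.set_yticks([])
            if row == n - 1:
                ax.set_xlabel(labels[col])
            if col == 0 and row > 0:
                ax.set_ylabel(labels[row])
    return fig, axes


# Fake data with the same layout as the question (100 rows, 10 columns).
rng = np.random.default_rng(0)
fake = rng.standard_normal((100, 10))
names = ["x%d" % i for i in range(1, 11)]

fig, axes = simple_corner(fake, names)
fig.savefig("corner_sketch.png")
```

To use it with a DataFrame, something like `simple_corner(data.values, list(data.columns))` should work.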