Sitz Blogz - 7 months ago 48
Python Question

Seaborn distplot goes unresponsive

I am trying to plot a simple distplot using pandas and seaborn to understand the density of the dataset.

Input

#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89


The dataset has more than 10K rows and no headers, and I am trying to use the second column, col[1].


Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


df = pd.read_csv('keyword.csv', delimiter=',', header=None, usecols=[1])
#print df
sns.distplot(df)

plt.show()


There is no error, and I can print the input column, but distplot takes ages to compute and freezes my screen. Any suggestions to speed up the process?
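The question does not establish why the plot freezes. One possibility that may be worth ruling out, offered here purely as an assumption: when bins is left unset, distplot is documented to choose the histogram bins with the Freedman-Diaconis rule, and on heavy-tailed count data that rule can ask for a very large number of bins. The sketch below estimates that bin count with NumPy alone. The helper name freedman_diaconis_bin_count and the synthetic counts array are hypothetical and not part of the original post.

```python
import numpy as np


def freedman_diaconis_bin_count(values):
    """Estimate how many bins the Freedman-Diaconis rule would request."""
    values = np.asarray(values, dtype=float)
    q75, q25 = np.percentile(values, [75, 25])
    iqr = q75 - q25
    if iqr == 0:
        # Fall back to a square-root rule when the IQR collapses to zero.
        return int(np.sqrt(values.size))
    bin_width = 2 * iqr / values.size ** (1 / 3)
    return int(np.ceil((values.max() - values.min()) / bin_width))


# Hypothetical stand-in for the frequency column: mostly small counts
# plus a long tail of larger ones (10,000 values in total).
counts = np.concatenate([np.ones(6000), np.full(3000, 2), np.arange(3, 1003)])

print(freedman_diaconis_bin_count(counts))
```

If the number printed for the real column is large, passing an explicit bins value (for example sns.distplot(df[1], bins=50)) would be one thing to try. Whether that resolves the freeze described above is untested here.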

Edit 1: As suggested in the comments below, I tried changing from pandas.read_csv to np.loadtxt, and now I get an error.

Code:

import numpy as np
from numpy import log as log
import matplotlib.pyplot as plt
import seaborn as sns
import pandas

df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
sns.kdeplot(df)
sns.distplot(df)

plt.show()


Error:

Traceback (most recent call last):
  File "0_distplot_csv.py", line 7, in <module>
    df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 726, in loadtxt
    usecols = list(usecols)
TypeError: 'int' object is not iterable
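The traceback points at usecols=(1): in Python, (1) is just the integer 1, while a one-element tuple needs a trailing comma, (1,). A minimal sketch of the corrected call follows, assuming the sample rows from the question. io.StringIO stands in for keyword.csv, and comments=None is added because np.loadtxt treats '#' as a comment marker by default.

```python
import io

import numpy as np

# Sample rows from the question; io.StringIO stands in for 'keyword.csv'.
sample = io.StringIO(
    "#Car,45\n#photo,4\n#movie,6\n#life,1\n"
    "#Horse,14\n#Pets,20\n#run,67\n#picture,89\n"
)

# (1) is the int 1; (1,) is a one-element tuple, which is iterable.
# comments=None stops loadtxt from discarding lines that start with '#'.
values = np.loadtxt(sample, delimiter=',', usecols=(1,), comments=None)

print(values)
```

Newer NumPy releases also accept a bare integer for usecols, so the error is specific to older installations like the Python 2.7 one in the traceback.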


Edit 2: I tried the suggestions from the comment section:

sns.distplot(df[1])


This behaves the same as the initial attempt: the screen freezes for ages.

sns.distplot(df[1].values)


I see strange behavior in this case.

When the input is

Car,45
photo,4
movie,6
life,1
Horse,14
Pets,20
run,67
picture,89


It does plot, but when the input is as below

#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89


It again freezes the entire screen and does nothing.

I did try passing comments=None, thinking the rows might be read as comments, but it looks like comments isn't used in pandas.
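For reference, pandas.read_csv spells this option comment (singular), and it defaults to None, so rows beginning with '#' are not dropped unless a comment character is passed explicitly. A small sketch under the assumption that the file looks like the sample above, with io.StringIO standing in for keyword.csv:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for 'keyword.csv'.
sample = io.StringIO(
    "#Car,45\n#photo,4\n#movie,6\n#life,1\n"
    "#Horse,14\n#Pets,20\n#run,67\n#picture,89\n"
)

# No comment= argument is given, so the leading '#' is ordinary text.
df = pd.read_csv(sample, header=None, usecols=[1])

# With header=None the column keeps its positional label, 1.
freq = df[1]
print(freq.tolist())
```

This suggests the '#' prefix by itself does not stop pandas from reading the numeric column, at least for this sample.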

Thank you

Answer

After several trials and a lot of online searching, I finally got what I was looking for. The code below loads the data by column number when there are no headers, and it also reads the rows that start with #.

Code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# comments=None keeps rows that start with '#'; the text column is read as NaN
data = np.genfromtxt('keyword.csv', delimiter=',', comments=None)

# The second column (index 1) holds the frequencies
d0 = data[:, 1]

# Plot a kernel density estimate of the frequencies
# (bw is the bandwidth argument in older seaborn; newer releases use bw_method)
sns.kdeplot(d0, color='b', bw=0.5, marker='o', label='keyword')

plt.legend(loc='upper right')
plt.xlabel('Freq(x)')
plt.ylabel('pdf(x)')
#plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()
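To see what np.genfromtxt returns for this kind of file, here is a short sketch that assumes three of the sample rows from the question, with io.StringIO standing in for keyword.csv. With the default float dtype, the keyword column cannot be converted and comes back as NaN, while the frequency column is read as numbers.

```python
import io

import numpy as np

# Three sample rows from the question; io.StringIO stands in for 'keyword.csv'.
sample = io.StringIO("#Car,45\n#photo,4\n#movie,6\n")

# comments=None keeps the rows that begin with '#'.
data = np.genfromtxt(sample, delimiter=',', comments=None)

print(data.shape)                   # rows x columns
print(np.isnan(data[:, 0]).all())   # keyword column is not numeric
print(data[:, 1])                   # frequency column used for the plot
```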