Phani.lav - 1 year ago 81

Python Question

I need to read long file with timestamp in seconds, and plot of CDF using numpy or scipy. I did try with numpy but seems the output is NOT what it is supposed to be. The code below: Any suggestions appreciated.

`import numpy as np`

import matplotlib.pyplot as plt

data = np.loadtxt('Filename.txt')

sorted_data = np.sort(data)

cumulative = np.cumsum(sorted_data)

plt.plot(cumulative)

plt.show()

Answer Source

You have two options:

1: you can bin the data first. This can be done easily with the `numpy.histogram`

function:

import numpy as np import matplotlib.pyplot as plt data = np.loadtxt('Filename.txt') # Choose how many bins you want here num_bins = 20 # Use the histogram function to bin the data counts, bin_edges = np.histogram(data, bins=num_bins, normed=True) # Now find the cdf cdf = np.cumsum(counts) # And finally plot the cdf plt.plot(bin_edges[1:], cdf) plt.show()

2: rather than use `numpy.cumsum`

, just plot the `sorted_data`

array against the number of items smaller than each element in the array (see this answer for more details http://stackoverflow.com/a/11692365/588071):

import numpy as np import matplotlib.pyplot as plt data = np.loadtxt('Filename.txt') sorted_data = np.sort(data) yvals=np.arange(len(sorted_data))/float(len(sorted_data)-1) plt.plot(sorted_data,yvals) plt.show()