Hangon Hangon - 5 months ago 13

How to read and plot arrays from pandas df fast

I have the following DataFrame in pandas. It contains arrays, and I read it directly from a SQLite database using pd.read_sql():

   ArrayID  Value
0        0      0
1        0      1
2        0      2
3        0      3
4        0      4
5        0      5
6        1      0
7        1      1
8        1      2
9        1      3
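To reproduce the example frame without the original database, here is a minimal sketch that loads it from an in-memory SQLite database with pd.read_sql(). The table name `arrays` is a hypothetical stand-in for the real table.

```python
import sqlite3

import pandas as pd

# Hypothetical table "arrays" holding the sample rows shown above
rows = [(0, v) for v in range(6)] + [(1, v) for v in range(4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrays (ArrayID INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO arrays VALUES (?, ?)", rows)
conn.commit()

# ORDER BY rowid keeps insertion order, so each array's values stay in sequence
df = pd.read_sql("SELECT ArrayID, Value FROM arrays ORDER BY rowid", conn)
conn.close()

print(df)
```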

I would like to know a fast way to get the arrays so I can plot them:

Array0 [0,1,2,3,4,5]

Array1 [0,1,2,3]

The only way I could think of is the following, which is really slow when the table has 1000 arrays of varying length (maximum length 500):

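import pandas as pd
import matplotlib.pyplot as plt

# df was loaded with pd.read_sql()
# loop over each distinct ArrayID and select its rows with a boolean mask
for array_id in df["ArrayID"].unique():
    array = df.loc[df["ArrayID"] == array_id, "Value"].values  # .values is an attribute, not a method
    plt.plot(array)
plt.show()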


Or is matplotlib the issue?


Use groupby to obtain all the groups in one call, instead of one df.loc lookup with a fresh boolean mask per array:

for aid, grp in df.groupby('ArrayID'):
    plt.plot(grp['Value'].values)
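Filled out, the groupby approach might look like the sketch below. It rebuilds the sample frame inline, collects each group's values into a dict keyed by ArrayID, and draws them with the non-interactive Agg backend. Rendering into an in-memory PNG buffer instead of a window is an assumption made so the sketch runs headless.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "ArrayID": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Value":   [0, 1, 2, 3, 4, 5, 0, 1, 2, 3],
})

# One groupby pass splits the frame; each group is a Series of that array's values
arrays = {aid: values.to_numpy() for aid, values in df.groupby("ArrayID")["Value"]}

fig, ax = plt.subplots()
for aid, values in arrays.items():
    ax.plot(values, label=f"Array{aid}")
ax.legend()

# Render to an in-memory PNG instead of opening a window
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Grouping by the scalar 'ArrayID' rather than the list ['ArrayID'] keeps the keys as plain values; recent pandas versions yield 1-tuples when grouping by a one-element list.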

Note also that plt.plot is not very fast: calling it 1000 times can feel quite slow. Moreover, a plot with 1000 lines is unlikely to be readable, so you may need to rethink what quantity you wish to visualize (perhaps through clustering or aggregation).
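If many lines really must be drawn, one commonly used alternative to many separate plt.plot calls is matplotlib's LineCollection, which adds every array to the axes as a single artist. This is a swapped-in technique, not part of the answer above. The sketch assumes the per-array values come from the groupby loop above and rebuilds the sample frame so it runs on its own. Whether it is fast enough for 1000 arrays of up to 500 points has not been measured here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

df = pd.DataFrame({
    "ArrayID": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Value":   [0, 1, 2, 3, 4, 5, 0, 1, 2, 3],
})

# Each segment is an (N, 2) array of (position, value) points for one ArrayID;
# segments may have different lengths.
segments = [
    np.column_stack([np.arange(len(values)), values.to_numpy()])
    for _, values in df.groupby("ArrayID")["Value"]
]

fig, ax = plt.subplots()
lines = LineCollection(segments, linewidths=0.5)
ax.add_collection(lines)  # one artist instead of one Line2D per array
ax.autoscale_view()       # make sure the view limits cover the new segments

buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

A single collection cannot carry a legend entry per array. That matches the point above that a plot with 1000 individually labelled lines is hard to read anyway.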