I am writing a script to add a "column" to a Python list of lists at 500 Hz. Here is the code that generates test data and passes it through a separate thread:
import random, time, threading
data = [ for _ in range(4)] # list with 4 empty lists (4 rows)
column = [random.random() for _ in data] # synthetic column of data
for x,y in zip(data,column):
time.sleep(0.002) # equivalent to 500 Hz
t1 = threading.Thread(target=synthesize_data).start()
# example of data
# [[0.61523098235, 0.61523098235, 0.61523098235, ... ],
# [0.15090349809, 0.15090349809, 0.15090349809, ... ],
# [0.92149878571, 0.92149878571, 0.92149878571, ... ],
# [0.41340918409, 0.41340918409, 0.41340918409, ... ]]
# fileB (in Jupyter Notebook)
 import fileA, copy
 # get a copy of the data at this instant.
data = copy.deepcopy(fileA.data)
for row in data:
# Setup code
import pandas as pd
channels = ['C3','C4','C5','C2']
a = [ for _ in channels]
b = [random.random() for _ in a]
df = pd.DataFrame(index=channels)
b_pandas = pd.Series(b, index=df.index)
%timeit for x,y in zip(a,b): x.append(y) # 1000000 loops, best of 3: 1.32 µs per loop
%timeit map(add_col, zip(a,b)) # 1000000 loops, best of 3: 1.96 µs per loop
%timeit df = b # 10000 loops, best of 3: 82.8 µs per loop
%timeit df = b_pandas # 10000 loops, best of 3: 58.4 µs per loop
deepcopy() operation is copying lists as they are modified by another thread, and each copy operation takes a small amount of time (longer as the lists grow larger). So between copying the first of the 4 lists and copying the second, the other thread added 2 elements, indicating that copying a list of 8784 elements takes between 0.002 and 0.004 seconds.
That's because there is nothing preventing threading to switch between executing
synthesize_data() and the
deepcopy.copy() call. In other words, your code is simply not thread-safe.
You'd have to coordinate between your two threads; using a lock for example:
# ... datalock = threading.RLock() # ... def synthesize_data(): while True: with datalock: for x,y in zip(data,column): x.append(y) time.sleep(0.002) # equivalent to 500 Hz
with fileA.datalock: data = copy.deepcopy(fileA.data) for row in data: print len(row)
This ensures that copying only takes place when the thread in
fileA is not trying to add more to the lists.
Using locking will slow down your operations; I suspect the pandas assignment operations are already subject to locks to keep them thread-safe.