Jakub Jakub - 2 months ago 9
Python Question

How can I update a list of lists very quickly in a thread-safe manner? - python

I am writing a script to add a "column" to a Python list of lists at 500 Hz. Here is the code that generates test data and passes it through a separate thread:

# fileA
import random, time, threading
data = [[] for _ in range(4)] # list with 4 empty lists (4 rows)
column = [random.random() for _ in data] # synthetic column of data
def synthesize_data():
while True:
for x,y in zip(data,column):
x.append(y)
time.sleep(0.002) # equivalent to 500 Hz
t1 = threading.Thread(target=synthesize_data).start()
# example of data
# [[0.61523098235, 0.61523098235, 0.61523098235, ... ],
# [0.15090349809, 0.15090349809, 0.15090349809, ... ],
# [0.92149878571, 0.92149878571, 0.92149878571, ... ],
# [0.41340918409, 0.41340918409, 0.41340918409, ... ]]

# fileB (in Jupyter Notebook)
[1] import fileA, copy

[2] # get a copy of the data at this instant.
data = copy.deepcopy(fileA.data)
for row in data:
print len(row)


If you run cell [2] in fileB, you should see that the lengths of the "rows" in
data
are not equal. Here is example output when I run the script:

8784
8786
8787
8787


I thought I might be grabbing the data in the middle of the
for
loop, but that would suggest that the lengths would be off by 1 at the most. The differences get more severe over time. My question: why is quickly adding columns to a list of lists unstable? Is it possible to make this process for stable?

You might suggest I use something like Pandas, but I want to use Python lists because of their speed advantage (the code needs to be as fast as possible). I tested the
for
loop,
map()
function, and Pandas data frame. Here is my test code (in Jupyter Notebook):

# Setup code
import pandas as pd
import random
channels = ['C3','C4','C5','C2']
a = [[] for _ in channels]
b = [random.random() for _ in a]
def add_col((x,y)):
x.append(y);
df = pd.DataFrame(index=channels)
b_pandas = pd.Series(b, index=df.index)

%timeit for x,y in zip(a,b): x.append(y) # 1000000 loops, best of 3: 1.32 µs per loop
%timeit map(add_col, zip(a,b)) # 1000000 loops, best of 3: 1.96 µs per loop
%timeit df[0] = b # 10000 loops, best of 3: 82.8 µs per loop
%timeit df[0] = b_pandas # 10000 loops, best of 3: 58.4 µs per loop


You might also suggest that I append the samples to
data
as rows and then transpose when it's time to analyze. I would rather not do that also in the interest of speed. This code will be used in a brain-computer interface, where analysis happens in a loop. Transposing would also have to happen in the loop, and this would get slow as the data grows.

edit: changed title from "Why is quickly adding columns to a list of lists unstable - python" to "How can I update a list of lists very quickly in a thread-safe manner? - python" to make the title more descriptive of the post and the answer.

Answer

The deepcopy() operation is copying lists as they are modified by another thread, and each copy operation takes a small amount of time (longer as the lists grow larger). So between copying the first of the 4 lists and copying the second, the other thread added 2 elements, indicating that copying a list of 8784 elements takes between 0.002 and 0.004 seconds.

That's because there is nothing preventing threading to switch between executing synthesize_data() and the deepcopy.copy() call. In other words, your code is simply not thread-safe.

You'd have to coordinate between your two threads; using a lock for example:

In fileA:

# ...
datalock = threading.RLock()
# ...

def synthesize_data():
    while True:
        with datalock:
            for x,y in zip(data,column):
                x.append(y)
            time.sleep(0.002)  # equivalent to 500 Hz

and in fileB:

with fileA.datalock:
    data = copy.deepcopy(fileA.data)
    for row in data:
        print len(row)

This ensures that copying only takes place when the thread in fileA is not trying to add more to the lists.

Using locking will slow down your operations; I suspect the pandas assignment operations are already subject to locks to keep them thread-safe.