Jakub Jakub - 1 year ago 54
Python Question

How can I update a list of lists very quickly in a thread-safe manner? - python

I am writing a script to add a "column" to a Python list of lists at 500 Hz. Here is the code that generates test data and passes it through a separate thread:

# fileA
import random, time, threading
data = [[] for _ in range(4)] # list with 4 empty lists (4 rows)
column = [random.random() for _ in data] # synthetic column of data
def synthesize_data():
while True:
for x,y in zip(data,column):
x.append(y)
time.sleep(0.002) # equivalent to 500 Hz
t1 = threading.Thread(target=synthesize_data).start()
# example of data
# [[0.61523098235, 0.61523098235, 0.61523098235, ... ],
# [0.15090349809, 0.15090349809, 0.15090349809, ... ],
# [0.92149878571, 0.92149878571, 0.92149878571, ... ],
# [0.41340918409, 0.41340918409, 0.41340918409, ... ]]

# fileB (in Jupyter Notebook)
[1] import fileA, copy

[2] # get a copy of the data at this instant.
data = copy.deepcopy(fileA.data)
for row in data:
print len(row)


If you run cell [2] in fileB, you should see that the lengths of the "rows" in
data
are not equal. Here is example output when I run the script:

8784
8786
8787
8787


I thought I might be grabbing the data in the middle of the
for
loop, but that would suggest that the lengths would be off by 1 at the most. The differences get more severe over time. My question: why is quickly adding columns to a list of lists unstable? Is it possible to make this process for stable?

You might suggest I use something like Pandas, but I want to use Python lists because of their speed advantage (the code needs to be as fast as possible). I tested the
for
loop,
map()
function, and Pandas data frame. Here is my test code (in Jupyter Notebook):

# Setup code
import pandas as pd
import random
channels = ['C3','C4','C5','C2']
a = [[] for _ in channels]
b = [random.random() for _ in a]
def add_col((x,y)):
x.append(y);
df = pd.DataFrame(index=channels)
b_pandas = pd.Series(b, index=df.index)

%timeit for x,y in zip(a,b): x.append(y) # 1000000 loops, best of 3: 1.32 µs per loop
%timeit map(add_col, zip(a,b)) # 1000000 loops, best of 3: 1.96 µs per loop
%timeit df[0] = b # 10000 loops, best of 3: 82.8 µs per loop
%timeit df[0] = b_pandas # 10000 loops, best of 3: 58.4 µs per loop


You might also suggest that I append the samples to
data
as rows and then transpose when it's time to analyze. I would rather not do that also in the interest of speed. This code will be used in a brain-computer interface, where analysis happens in a loop. Transposing would also have to happen in the loop, and this would get slow as the data grows.

edit: changed title from "Why is quickly adding columns to a list of lists unstable - python" to "How can I update a list of lists very quickly in a thread-safe manner? - python" to make the title more descriptive of the post and the answer.

Answer Source

The deepcopy() operation is copying lists as they are modified by another thread, and each copy operation takes a small amount of time (longer as the lists grow larger). So between copying the first of the 4 lists and copying the second, the other thread added 2 elements, indicating that copying a list of 8784 elements takes between 0.002 and 0.004 seconds.

That's because there is nothing preventing threading to switch between executing synthesize_data() and the deepcopy.copy() call. In other words, your code is simply not thread-safe.

You'd have to coordinate between your two threads; using a lock for example:

In fileA:

# ...
datalock = threading.RLock()
# ...

def synthesize_data():
    while True:
        with datalock:
            for x,y in zip(data,column):
                x.append(y)
            time.sleep(0.002)  # equivalent to 500 Hz

and in fileB:

with fileA.datalock:
    data = copy.deepcopy(fileA.data)
    for row in data:
        print len(row)

This ensures that copying only takes place when the thread in fileA is not trying to add more to the lists.

Using locking will slow down your operations; I suspect the pandas assignment operations are already subject to locks to keep them thread-safe.