sheh - 2 months ago
Linux Question

Handle out of range in select

Many (>1000) worker processes do some work and want to save their results in a database. The result of the work is a JSON object, and the workers produce 1-5 JSON objects per second. The database saver is a separate process. The unidirectional connections that transfer the JSON objects from the workers to the saver are multiprocessing.Pipe objects, one pipe per worker.
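A minimal sketch of the setup described above, in case someone wants to reproduce it. It assumes Linux and the fork start method; worker, start_workers and drain are hypothetical names, the payload dict is invented for illustration, and drain simply reads each pipe until EOF instead of polling periodically like the real saver.

```python
import json
import multiprocessing as mp


def worker(worker_id, conn, n_results):
    """Hypothetical worker: sends n_results JSON results, then closes its end."""
    for seq in range(n_results):
        conn.send(json.dumps({"worker": worker_id, "seq": seq}))  # invented payload
    conn.close()


def start_workers(n_workers, n_results):
    """Create one unidirectional Pipe per worker and start the workers.

    Returns the read ends (the saver's data_pipe_pool) and the processes.
    """
    ctx = mp.get_context("fork")  # assumption: Linux with the fork start method
    data_pipe_pool, procs = [], []
    for wid in range(n_workers):
        # duplex=False: reader can only recv(), writer can only send()
        reader, writer = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=worker, args=(wid, writer, n_results))
        proc.start()
        writer.close()  # the saver keeps only the read end
        data_pipe_pool.append(reader)
        procs.append(proc)
    return data_pipe_pool, procs


def drain(data_pipe_pool):
    """Read every message until each worker closes its end (EOFError)."""
    results = []
    for reader in data_pipe_pool:
        while True:
            try:
                results.append(json.loads(reader.recv()))
            except EOFError:
                break
        reader.close()
    return results
```

For example, start_workers(4, 3) followed by drain() on the returned pool should give 12 decoded results; join the processes afterwards. Each worker adds one more descriptor to the saver process, which is what matters for the error below.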

In the saver process I periodically call:

def recv_data(self):
    data = []
    for pipe in self.data_pipe_pool:
        # poll() without a timeout returns immediately
        if pipe.poll():
            data.append(pipe.recv())
    return data

self.data_pipe_pool is the list of pipes from the workers.

Everything works fine if I run ~100 workers. If I run >1000 workers, I get this exception:

Traceback (most recent call last):
  File "", line 44, in run
    profile = self.poll_data()
  File "", line 116, in poll_data
    ret = self.recv_data()
  File "", line 127, in recv_data
    if pipe.poll():
IOError: handle out of range in select()

I know that this is due to the select() call and that:

FD_SETSIZE is usually defined to 1024 in GNU/Linux systems
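A note on the quoted limit: it applies to the descriptor number, not to how many descriptors are watched. select() takes an fd_set, a fixed-size bitmap indexed by fd number, so even a single pipe whose fd is 1024 or higher cannot be passed to it, and with more than 1000 workers the saver's pipe descriptors climb past that number. In CPython 2.7 the equivalent check is made in the C code behind Connection.poll() under Modules/_multiprocessing, which is where the IOError above comes from. The sketch below is Python 3 on Linux; the function name and fd 1500 are arbitrary choices, and it may raise the process's RLIMIT_NOFILE soft limit (failing if the hard limit is too low). It duplicates one readable pipe onto a high fd and compares select() with poll().

```python
import os
import resource
import select


def select_vs_poll_on_high_fd(target_fd=1500):
    """Watch ONE readable pipe whose descriptor number is target_fd.

    Returns (select_error_message_or_None, fds_reported_ready_by_poll).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= target_fd:
        # Allow the process to own descriptor number target_fd.
        # Raises ValueError if the hard limit is too low.
        resource.setrlimit(resource.RLIMIT_NOFILE, (target_fd + 1, hard))
    r, w = os.pipe()
    os.write(w, b"x")      # make the read end readable
    os.dup2(r, target_fd)  # same pipe, now also reachable as fd target_fd
    try:
        try:
            # fd_set is a bitmap indexed by fd number, so one fd >= FD_SETSIZE
            # cannot be stored in it, no matter how few fds are watched.
            select.select([target_fd], [], [], 0)
            select_error = None
        except ValueError as exc:
            select_error = str(exc)
        # poll() takes an array of descriptors, so the fd number does not matter.
        poller = select.poll()
        poller.register(target_fd, select.POLLIN)
        poll_ready = [fd for fd, _ in poller.poll(0)]
    finally:
        for fd in (r, w, target_fd):
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))  # restore limit
    return select_error, poll_ready
```

On a typical Linux box select() should fail with a "filedescriptor out of range" ValueError while poll() reports fd 1500 as readable.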

But where is select() called? If it is in pipe.poll(), why do I exceed the FD_SETSIZE limit when I call poll() for each pipe individually? Where can I look at the Python sources that contain this call?

What workaround is there to avoid exceeding the FD_SETSIZE limit, or to avoid using select() altogether?


I solved this issue using epoll. The solution is very simple:

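import select

def set_data_pipe_pool(self, data_pipe_pool):
    # epoll has no FD_SETSIZE limit, unlike select()
    self.epoll = select.epoll()
    self.pipes_by_fd = {}
    for p in data_pipe_pool:
        self.epoll.register(p, select.EPOLLIN)
        # Remember which pipe owns each file descriptor
        self.pipes_by_fd[p.fileno()] = p
    self.data_pipe_pool = data_pipe_pool

def recv_data(self):
    data = []
    # timeout=0: return at once with the descriptors that are ready
    events = self.epoll.poll(timeout=0)
    for fileno, _ in events:
        p = self.pipes_by_fd[fileno]
        data.append(p.recv())
    return data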

When I call epoll.poll(), select() is not called, so the FD_SETSIZE limit does not apply.
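On Python 3 there is another workaround that needs neither select() nor an explicit epoll object: multiprocessing.connection.wait() returns the connections that are ready to read (or have reached EOF). In CPython 3 on Unix it is built on selectors.PollSelector when poll(2) is available, so FD_SETSIZE should not apply. A minimal sketch, written as a plain function rather than the saver's method; decoding with json.loads and dropping closed pipes are assumptions about how the saver should behave.

```python
import json
from multiprocessing.connection import wait


def recv_data(data_pipe_pool, timeout=0):
    """Collect every message that is ready now from any pipe in the pool."""
    data = []
    # wait() returns only the connections that are readable or at EOF.
    for conn in wait(data_pipe_pool, timeout):
        try:
            data.append(json.loads(conn.recv()))
        except EOFError:
            # The worker closed its end: stop watching this pipe.
            data_pipe_pool.remove(conn)
            conn.close()
    return data
```

Dropping a pipe on EOFError matters because a connection at EOF is reported as ready on every call, so an exited worker would otherwise keep showing up.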