Thibaut D. - 1 year ago
Linux Question

Why does select() in a parent process make accept() unusable in a child process?

I have a parent process which creates 2 server sockets and calls select() on them to wait for new connections. When a connection arrives, a message is sent to a child process (created with fork() after the server sockets are created, so the sockets are shared).

In this child, calling accept() on the server socket doesn't work: I get an EAGAIN error (the socket is non-blocking), whereas calling accept() in the main process works perfectly.

Of course, I don't normally call accept() in the main process at all; I only tried it to check whether it worked, and it does.

Why can't I call accept() in a child process after a select() in the parent?

EDIT: The goal here is to create a fixed number of workers (let's say 8) to handle client connections, as in the prefork model. These connections will be long-lived, unlike HTTP. The goal is to load-balance connections between workers.

To do this, I use a shared memory variable which holds, for each worker, the number of currently connected clients. I want to "ask" the worker with the lowest number of clients to handle a new connection.
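The selection step described above could look roughly like the sketch below. It is only an illustration: the names `client_counts`, `pick_least_loaded` and `client_connected` are made up for this example, and it assumes Python's `multiprocessing.Array` as the shared memory, created before the workers are forked so that all processes see the same counters.

```python
from multiprocessing import Array

NUM_WORKERS = 8  # the number of workers used as an example in the question

# One integer slot per worker, zero-initialised and protected by a lock.
# Hypothetical name; it must be created before fork() to be shared.
client_counts = Array("i", NUM_WORKERS)


def pick_least_loaded(counts):
    """Return the index of the worker with the fewest connected clients."""
    with counts.get_lock():
        snapshot = counts[:]  # copy the shared values into a plain list
    # On a tie, min() returns the lowest index.
    return min(range(len(snapshot)), key=snapshot.__getitem__)


def client_connected(counts, worker_id):
    """Record that a worker has taken one more client."""
    with counts.get_lock():
        counts[worker_id] += 1
```

The parent would call `pick_least_loaded(client_counts)` when a connection arrives, and the chosen worker would call `client_connected(client_counts, worker_id)` once it has accepted it. A matching decrement on disconnect is left out here.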

That's why I do the select() in the parent and then send a message to a child process: I want to "choose" which process will handle the new connection.

The server listens on more than one socket (one for SSL, one without). That's why I use select() in the parent and not accept() directly in the child processes: I can't accept() on multiple sockets at once in my child workers.

Answer

In fact, the problem was not what I first thought. Here is a recap of what I did to have some basic load-balancing of connections between my worker processes.

  • A main process (the parent) creates 2 server sockets and calls bind() and listen() on them (one with SSL and one without, for example).
  • I create 8 child processes with fork(), so they inherit the parent's sockets.
  • The main process runs select() in an infinite loop.
  • When one of its two sockets becomes readable, it sends a message to a child over a pipe. The child is chosen using a shared memory value which holds the current number of clients in each child process: the process currently handling the fewest clients is picked.
  • This child process then calls accept() on the server socket (which of the two sockets to use is passed over the pipe, so the child knows which one to call accept() on).

The problem was that my parent process told a child to accept the connection and then immediately re-entered the loop, running select() again. If the child had not yet accepted the connection, select() returned again for the same connection, and the parent dispatched it a second time. That's why I got an EAGAIN error: accept() was in fact called twice (or more, depending on speed and inter-process race conditions) for a single connection!
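The behaviour behind this can be reproduced in a single process. The sketch below is not the original code: the function name `show_repeated_readiness` is invented for the illustration and it uses a plain TCP socket on localhost. It calls select() twice without an accept() in between, then calls accept() twice on the non-blocking listener.

```python
import errno
import select
import socket


def show_repeated_readiness():
    """Show that a pending connection keeps the listener readable until accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(8)
    listener.setblocking(False)           # non-blocking, as in the question

    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    try:
        # Two select() calls in a row with no accept() in between:
        # both report the listener as readable for the SAME connection.
        first, _, _ = select.select([listener], [], [], 2.0)
        second, _, _ = select.select([listener], [], [], 2.0)

        conn, _ = listener.accept()       # consumes the only pending connection
        conn.close()

        try:
            extra, _ = listener.accept()  # nothing is left in the queue
            extra.close()
            second_accept_errno = None
        except BlockingIOError as exc:
            second_accept_errno = exc.errno
    finally:
        client.close()
        listener.close()
    return bool(first), bool(second), second_accept_errno
```

Both select() calls report the listener as readable, and the second accept() fails with EAGAIN (EWOULDBLOCK has the same value on Linux). This is the error a child sees when it is dispatched twice for one connection.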

The solution is to wait for the child to answer on the pipe with something like "Hey, I accepted the connection, it's ok!", and only then return to the select() loop.
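A minimal sketch of that handshake, under some simplifying assumptions: a single worker, two unidirectional pipes (one for commands, one for acknowledgements), a one-byte message carrying the listener index, and a worker that closes the connection straight away instead of keeping it open. The helper names (`make_listener`, `worker_loop`, `spawn_worker`, `dispatch_once`) are hypothetical and are not taken from the original implementation.

```python
import os
import select
import socket


def make_listener():
    """Create a non-blocking listening socket on a free local port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(16)
    sock.setblocking(False)
    return sock


def worker_loop(listeners, cmd_r, ack_w):
    """Child side: read a listener index, accept() on it, then acknowledge."""
    while True:
        msg = os.read(cmd_r, 1)
        if not msg:                       # parent closed the pipe: stop
            return
        try:
            conn, _ = listeners[msg[0]].accept()
        except BlockingIOError:
            os.write(ack_w, b"E")         # nothing to accept, but still answer
            continue
        os.write(ack_w, b"K")             # "I accepted the connection, it's ok!"
        conn.sendall(b"handled by pid %d\n" % os.getpid())
        conn.close()                      # a real worker would keep it open


def spawn_worker(listeners):
    """Fork one worker; return (pid, command write end, ack read end)."""
    cmd_r, cmd_w = os.pipe()
    ack_r, ack_w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: inherits the listening sockets
        try:
            os.close(cmd_w)
            os.close(ack_r)
            worker_loop(listeners, cmd_r, ack_w)
        finally:
            os._exit(0)                   # never fall back into the parent's code
    os.close(cmd_r)
    os.close(ack_w)
    return pid, cmd_w, ack_r


def dispatch_once(listeners, cmd_w, ack_r, timeout=None):
    """Parent side: one pass of the select() loop, including the handshake."""
    readable, _, _ = select.select(listeners, [], [], timeout)
    acks = []
    for sock in readable:
        os.write(cmd_w, bytes([listeners.index(sock)]))
        # Block until the worker has answered. Only then is the pending
        # connection off the queue, so the next select() will not report it again.
        acks.append(os.read(ack_r, 1))
    return acks
```

The parent's main loop is then just `while True: dispatch_once(listeners, cmd_w, ack_r)`. With several workers, the parent would keep one pipe pair per worker and write to the pair of the worker it picked.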

This works perfectly fine. The implementation in Python is available here for the curious.