max max - 1 month ago 18

Sharing psycopg2 / libpq connections across processes

According to the psycopg2 docs:


libpq connections shouldn’t be used by a forked processes, so when using a module such as multiprocessing or a forking web deploy method such as FastCGI make sure to create the connections after the fork.


Following the link from that document leads to:


On Unix, forking a process with open libpq connections can lead to unpredictable results because the parent and child processes share the same sockets and operating system resources. For this reason, such usage is not recommended, though doing an exec from the child process to load a new executable is safe.


But it seems there's no inherent problem with forking processes with open sockets. So what's the reason for psycopg2's warning against forking when connections are open?

The reason for my question is that I saw a (presumably successful) multiprocessing approach that opened a connection right before forking.

Perhaps it is safe to fork open connections under some restrictions (e.g., only one process actually ever uses the connection, etc.)?

Answer

Your surmise is basically correct: there is no issue with a connection being opened before a fork as long as you don't attempt to use it in more than one process.
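To make the "more than one process" part concrete, here is a minimal sketch of what sharing a socket across a fork looks like. It is not psycopg2 code: a socket.socketpair() stands in for the client/server connection, and shared_socket_demo is a name made up for this illustration. It is Unix-only because it calls os.fork().

```python
import os
import socket


def shared_socket_demo():
    # A connected pair of stream sockets stands in for the
    # client <-> database server connection (an assumption for the demo).
    server_end, client_end = socket.socketpair()

    pid = os.fork()
    if pid == 0:
        # Child: client_end here refers to the very same socket as in
        # the parent, not to a private copy of the connection.
        client_end.sendall(b"child\n")
        os._exit(0)

    # Parent: wait for the child so the ordering is deterministic,
    # then write to the same socket.
    os.waitpid(pid, 0)
    client_end.sendall(b"parent\n")
    client_end.close()

    # The "server" reads until EOF and sees a single byte stream.
    data = b""
    while True:
        chunk = server_end.recv(1024)
        if not chunk:
            break
        data += chunk
    server_end.close()
    return data.splitlines()


print(shared_socket_demo())  # [b'child', b'parent']
```

The receiving end gets both messages on one stream and has no way to tell which process sent which. Forking with the socket open is harmless in itself; the trouble starts only if both processes talk on it, because a real server would then see their protocol messages mixed together and either process could read the other's reply.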

That being said, I think you misunderstood the "multiprocessing approach" link you provided. It actually demonstrates a separate connection being opened in each child. (There is a connection opened by the parent before forking, but it's not being used in any child.)

The improvement given by the answer there (versus the code in the question) was to refactor so that -- rather than opening a new connection for each task in the queue -- each child process opened a single connection and then shared it across multiple tasks executed within the same child (i.e. the connection is passed as an argument to the Task processor).

Edit:
As a general practice, one should prefer creating a connection within the process that is using it. In the answer cited, a connection was being created in the parent before forking, then used in the child. This does work fine, but leaves each "child connection" open in the parent as well, which is at best a waste of resources and also a potential cause of bugs.
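A rough sketch of that practice, with each child opening its own connection after the fork and reusing it for every task it handles. To keep it runnable without a database server, sqlite3 is used as a stand-in DB-API driver; with psycopg2, open_connection would call psycopg2.connect(...) with your own connection parameters instead. The names open_connection, worker and run_tasks are hypothetical, and the "fork" start method is assumed to be available (i.e. Unix).

```python
import multiprocessing as mp
import sqlite3


def open_connection():
    # Stand-in so the sketch runs anywhere; with psycopg2 this would be
    # a psycopg2.connect(...) call using your own connection parameters.
    return sqlite3.connect(":memory:")


def worker(tasks, results):
    # Runs in the child: the connection is created after the fork, so
    # it is never shared with the parent or with sibling processes.
    conn = open_connection()
    try:
        cur = conn.cursor()
        # One connection is reused for every task this child handles;
        # None is the sentinel that tells the worker to stop.
        for sql in iter(tasks.get, None):
            cur.execute(sql)
            results.put((sql, cur.fetchone()[0]))
    finally:
        conn.close()


def run_tasks(statements, n_workers=2):
    ctx = mp.get_context("fork")
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for sql in statements:
        tasks.put(sql)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    # Collect one result per statement before joining the workers.
    out = dict(results.get() for _ in statements)
    for p in procs:
        p.join()
    return out
```

For example, run_tasks(["SELECT 1 + 1", "SELECT 2 * 3"]) returns a dict mapping each statement to its single result value. The parent never holds a connection at all, so nothing is left open there once the children exit.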