Serge Mosin - 2 months ago

Python socket stress concurrency

I need a Python TCP server that can handle at least tens of thousands of concurrent socket connections. I tried testing the capabilities of Python's SocketServer module in both multiprocess and multithreaded modes, but neither came anywhere near the desired performance.

First, the client, since it's the same for both cases:

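    import socket
    import sys
    import threading
    import time

    HOST, PORT = "localhost", 9999
    SOCKET_AMOUNT = 10000  # number of concurrent connections to open (example value)
    message = " ".join(sys.argv[1:]) or "test message"

    def client(ip, port, message):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((ip, port))
        try:
            while 1:
                # keep the connection busy: send, wait for the echo, pause
                sock.sendall(message)
                sock.recv(1024)
                time.sleep(1)
        finally:
            sock.close()

    for i in range(SOCKET_AMOUNT):
        client_thread = threading.Thread(target=client, args=(HOST, PORT, message))
        client_thread.start()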

Multiprocess (forking) server:

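    import os
    import SocketServer

    class ForkedTCPRequestHandler(SocketServer.BaseRequestHandler):

        def handle(self):
            cur_process = os.getpid()
            print "launching a new socket handler, pid = {}".format(cur_process)
            while 1:
                # echo each message back until the client disconnects
                data = self.request.recv(1024)
                if not data:
                    break
                self.request.sendall(data)

    class ForkedTCPServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
        # ForkingMixIn.max_children (default 40) caps concurrent child processes
        pass

    if __name__ == "__main__":
        HOST, PORT = "localhost", 9999

        server = ForkedTCPServer((HOST, PORT), ForkedTCPRequestHandler)
        print "Starting Forked Server"
        server.serve_forever()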

Multithreaded server:

    import threading
    import SocketServer

    class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

        def handle(self):
            cur_thread = threading.current_thread()
            print "launching a new socket handler, thread = {}".format(cur_thread)
            while 1:
                # echo each message back until the client disconnects
                data = self.request.recv(1024)
                if not data:
                    break
                self.request.sendall(data)

    class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
        pass

    if __name__ == "__main__":
        HOST, PORT = "localhost", 9999

        server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
        print "Starting Threaded Server"
        server.serve_forever()

In the first case, only 40 processes are created, and after a while approximately 20 of them start failing on the client side with the following error:

    error: [Errno 104] Connection reset by peer

The threaded version is much more robust and holds more than 4000 connections, but eventually starts showing:

    gaierror: [Errno -5] No address associated with hostname

I ran the tests on my local machine (Kubuntu 14.04 x64, kernel v3.13.0-32). These are the steps I took to improve the system's general performance:

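  1. Raise the kernel limit on open file handles:
    sysctl -w fs.file-max=10000000

  2. Increase the input packet backlog (packets queued when the interface receives faster than the kernel can process them):
    sysctl -w net.core.netdev_max_backlog=2500

  3. Raise the maximum listen() backlog (pending, not-yet-accepted connections):
    sysctl -w net.core.somaxconn=250000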
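To confirm that settings like these actually took effect, the values can be read back from `/proc/sys`. The sketch below is Linux-only and assumes Python 3; `read_sysctl` and `fd_limits` are hypothetical helper names. Note that `fs.file-max` is only the system-wide ceiling: each process also has its own open-file limit (`RLIMIT_NOFILE`, which is what `ulimit -n` reports), and a single process holding tens of thousands of sockets needs that one raised too.

```python
import resource
from pathlib import Path

def read_sysctl(name):
    """Read a sysctl such as "fs.file-max" through /proc/sys (Linux only)."""
    path = Path("/proc/sys") / name.replace(".", "/")
    return path.read_text().strip()

def fd_limits():
    """Return (system-wide file-max, per-process soft limit, per-process hard limit)."""
    system_max = int(read_sysctl("fs.file-max"))
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return system_max, soft, hard

if __name__ == "__main__":
    for key in ("fs.file-max", "net.core.netdev_max_backlog", "net.core.somaxconn"):
        try:
            print(key, "=", read_sysctl(key))
        except OSError as exc:  # e.g. not visible inside some containers
            print(key, "unavailable:", exc)
    print("file-max, RLIMIT_NOFILE soft, hard:", fd_limits())
```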

So, the questions are:

  1. Were the tests correct? Can I rely on those results? I'm new to all this network/socket stuff, so please correct me if my conclusions are wrong.

  2. Is the multiprocess/multithreaded approach really not viable in heavily loaded systems?

  3. If yes, what options do we have left? Asynchronous approach? Tornado/Twisted/Gevent frameworks?


socketserver is not going to handle anywhere near 10k connections. No threaded or forked server will, on current hardware and OSes. With thousands of threads, you spend more time context-switching and scheduling than actually working. Modern Linux is getting very good at scheduling threads and processes, and Windows is pretty good with threads (but horrible with processes), but there's a limit to what they can do.

And socketserver doesn't even try to be high-performance.

And of course CPython's GIL makes things worse. If you're not using 3.2+, any thread doing even a trivial amount of CPU-bound work is going to choke all of the other threads and block your I/O. With the new GIL, if you avoid non-trivial CPU work you don't add much to the problem, but it still makes context switches more expensive than raw pthreads or Windows threads.
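In CPython 3.2+, the "new GIL" asks a thread holding the lock to release it after a fixed switch interval, which the `sys` module exposes. A small sketch (Python 3) for inspecting and temporarily tuning it:

```python
import sys

# CPython 3.2+: a thread holding the GIL is asked to release it after this
# many seconds, so other runnable threads get a turn. The default is 0.005.
default_interval = sys.getswitchinterval()
print("GIL switch interval: %.3f s" % default_interval)

# A longer interval means fewer forced switches (less overhead) but longer
# waits for threads that need the GIL, e.g. to service a socket that just
# became readable. A shorter one trades throughput for responsiveness.
sys.setswitchinterval(0.01)
print("tuned interval: %.3f s" % sys.getswitchinterval())
sys.setswitchinterval(default_interval)  # restore the previous value
```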

So, what do you want?

You want a single-threaded "reactor" that services events in a loop and kicks off handlers. (On Windows and Solaris, there are advantages to instead using a "proactor", a pool of threads that all service the same event queue, but since you're on Linux, let's not worry about that.) Modern OSes have very good multiplexing APIs to build on (kqueue on BSD/Mac, epoll on Linux, /dev/poll on Solaris, IOCP on Windows) that can easily handle 10K connections even on hardware from years ago.
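A minimal sketch of such a reactor, using the stdlib `selectors` module (Python 3.4+; `socket.create_server` needs 3.8+). `selectors.DefaultSelector` picks the best mechanism available (epoll on Linux, kqueue on BSD/Mac, /dev/poll on Solaris, otherwise poll or select; it does not use IOCP). Each registered socket carries its handler as `key.data`, and the loop only dispatches. The names `make_listener`, `serve`, `_accept`, and `_echo` are illustrative, and the echo behavior is assumed to match the question's request/response pattern.

```python
import selectors
import socket

def make_listener(host="localhost", port=9999, backlog=1024):
    # Python 3.8+: create, bind, and listen in one call
    return socket.create_server((host, port), backlog=backlog)

def _accept(sel, listener):
    # The listening socket is readable: a new client is waiting in the backlog.
    try:
        conn, _addr = listener.accept()
    except BlockingIOError:
        return  # the client went away before we got to it
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, _echo)

def _echo(sel, conn):
    # A client socket is readable: data arrived, or recv() returns b"" on EOF.
    try:
        data = conn.recv(4096)
    except BlockingIOError:
        return        # spurious wake-up: nothing to read yet
    except ConnectionError:
        data = b""    # treat a reset like EOF
    if data:
        try:
            # Sketch only: assumes the small reply fits in the send buffer.
            # A real server would queue it and wait for EVENT_WRITE instead.
            conn.sendall(data)
            return
        except (BlockingIOError, ConnectionError):
            pass      # sketch: drop slow or broken clients
    sel.unregister(conn)
    conn.close()

def serve(listener, stop):
    """Single-threaded reactor: wait for readiness events, dispatch handlers."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, _accept)
    try:
        while not stop.is_set():
            # The timeout only exists so that `stop` (a threading.Event) is noticed.
            for key, _events in sel.select(timeout=0.1):
                handler = key.data
                handler(sel, key.fileobj)
    finally:
        for key in list(sel.get_map().values()):
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()
```

To run it standalone: `serve(make_listener(), threading.Event())`. All connections are serviced by one thread, so there is no per-connection thread or process to schedule.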

socketserver isn't a terrible reactor; it's just that it doesn't provide any good way to dispatch asynchronous work, only threads or processes. In theory, you could build a GreenletMixIn (with the greenlet extension module) or a CoroutineMixIn (assuming you either have or know how to write a trampoline and scheduler) on top of socketserver without too much work, and that might not be too heavyweight. But I'm not sure how much benefit you'd be getting out of socketserver at that point.
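For a sense of what "a trampoline and scheduler" involves, here is a toy version built on plain Python 3 generators (`run`, `add`, and `task` are hypothetical names). A task yields either nothing ("let someone else run") or another generator ("call this sub-coroutine"); the trampoline keeps a call stack per task and feeds each sub-coroutine's return value back to its caller. A real scheduler would also park tasks on a selector until their sockets are ready and propagate exceptions, which this sketch omits.

```python
import types
from collections import deque

def run(*tasks):
    """Toy scheduler: round-robin over generator tasks, with a trampoline for calls."""
    # Each entry is (call stack of generators, value to send into the top one).
    ready = deque(([t], None) for t in tasks)
    while ready:
        stack, value = ready.popleft()
        try:
            result = stack[-1].send(value)
        except StopIteration as finished:
            stack.pop()
            if stack:  # a sub-coroutine returned: resume its caller with the value
                ready.append((stack, finished.value))
            continue   # otherwise the whole task is done
        if isinstance(result, types.GeneratorType):
            stack.append(result)     # a "call": run the sub-coroutine next time
        ready.append((stack, None))  # either way, go to the back of the queue

def add(a, b):
    yield          # pretend to wait for I/O, letting other tasks run
    return a + b

def task(name, log):
    for i in range(2):
        total = yield add(i, 10)  # call a sub-coroutine through the trampoline
        log.append((name, total))

if __name__ == "__main__":
    log = []
    run(task("A", log), task("B", log))
    print(log)  # [('A', 10), ('B', 10), ('A', 11), ('B', 11)]
```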

Parallelism can help, but only to dispatch any slow jobs off the main work thread. First get your 10K connections up, doing minimal work. Then, if the real work you want to add is I/O-bound (e.g., reading files, or making requests to other services), add a pool of threads to dispatch to; if you need to add a lot of CPU-bound work, add a pool of processes instead (or, in some cases, even one of each).
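With asyncio, this kind of dispatching is done with `loop.run_in_executor`. A sketch assuming Python 3.7+; `blocking_lookup` is a hypothetical stand-in for slow, blocking I/O:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(key):
    """Hypothetical slow, blocking job (e.g. a file read or a call to another service)."""
    time.sleep(0.2)
    return key.upper()

async def handle_request(key, pool):
    # The event loop thread never blocks: the slow call runs on a pool thread,
    # and this coroutine is suspended until its result is ready.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_lookup, key)

async def main(keys, pool):
    # Many requests in flight at once; gather() returns results in input order.
    return await asyncio.gather(*(handle_request(k, pool) for k in keys))

if __name__ == "__main__":
    # I/O-bound jobs: a thread pool. For CPU-bound jobs, pass a
    # concurrent.futures.ProcessPoolExecutor instead (its jobs must be picklable).
    with ThreadPoolExecutor(max_workers=10) as pool:
        start = time.perf_counter()
        print(asyncio.run(main(["a", "b", "c", "d"], pool)))  # ['A', 'B', 'C', 'D']
        # The four 0.2 s calls overlap, so this is far less than 0.8 s.
        print("elapsed: %.2f s" % (time.perf_counter() - start))
```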

If you can use Python 3.4, the stdlib has an answer in asyncio (and there's a backport on PyPI for 3.3, but it's inherently impossible to backport to earlier versions).
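A minimal asyncio echo server, as a sketch. It assumes Python 3.8+ (`async`/`await`, `asyncio.run`, `serve_forever`, and the `:=` operator); on 3.4 itself the same idea would be written with `@asyncio.coroutine` and `yield from`. Each connection is a cheap coroutine on one thread rather than an OS thread or process. `handle_client` and `main` are illustrative names.

```python
import asyncio

async def handle_client(reader, writer):
    # One lightweight coroutine per connection; all of them share one thread.
    try:
        while data := await reader.read(4096):  # b"" means the client closed
            writer.write(data)
            await writer.drain()                # back-pressure for slow readers
    except ConnectionError:
        pass                                    # the client reset the connection
    finally:
        writer.close()

async def main(host="localhost", port=9999):
    server = await asyncio.start_server(handle_client, host, port, backlog=1024)
    async with server:
        await server.serve_forever()
```

Run it with `asyncio.run(main())`; the question's client can then connect on port 9999 as before.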

If not… well, you can build something yourself on top of selectors in 3.4+ if you don't care about Windows, or select in 2.6+ if you only care about Linux, *BSD, and Mac and are willing to write two versions of your code, but it's going to be a lot of work. Or you can write your core event loop in C (or just use an existing one like libev or libuv or libevent) and wrap it in an extension module.
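To see why the raw `select` route means writing two versions, here is a sketch of just one operation, "wait until this socket is readable", against the `select` module's epoll (Linux) and kqueue (BSD/Mac) APIs. It is written for Python 3, and `wait_readable` is a hypothetical helper; creating a fresh epoll/kqueue object per call is for brevity only, since a real event loop keeps one and registers every socket with it.

```python
import select
import socket

def wait_readable(sock, timeout):
    """Return True if `sock` becomes readable within `timeout` seconds."""
    if hasattr(select, "epoll"):                 # Linux
        ep = select.epoll()
        try:
            ep.register(sock.fileno(), select.EPOLLIN)
            return bool(ep.poll(timeout))
        finally:
            ep.close()
    if hasattr(select, "kqueue"):                # BSD / macOS
        kq = select.kqueue()
        try:
            change = select.kevent(sock.fileno(),
                                   filter=select.KQ_FILTER_READ,
                                   flags=select.KQ_EV_ADD)
            return bool(kq.control([change], 1, timeout))
        finally:
            kq.close()
    raise RuntimeError("neither epoll nor kqueue is available here")

if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(b, 0.1))  # False: nothing has been sent yet
    a.sendall(b"x")
    print(wait_readable(b, 0.1))  # True: one byte is waiting
    a.close()
    b.close()
```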

But really, you probably want to turn to third-party libraries. There are many of them, with very different APIs, from gevent (which tries to make your code look like preemptively threaded code but actually runs in greenlets on a single-threaded event loop) to Twisted (which is based around explicit callbacks and futures, similar to many modern JavaScript frameworks).
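Twisted itself is third-party, but asyncio's lower-level transport/protocol API uses the same callback-driven style, so it gives a rough feel for it without extra installs. This is a sketch for Python 3.7+, and `EchoProtocol` and `main` are illustrative names. Instead of writing a loop that reads, you write methods that the event loop calls when something happens:

```python
import asyncio

class EchoProtocol(asyncio.Protocol):
    """Callback style: the event loop invokes these methods as events occur."""

    def connection_made(self, transport):
        # A client connected; keep the transport so we can write to it later.
        self.transport = transport

    def data_received(self, data):
        # Some bytes arrived. No read loop and no blocking: just react.
        self.transport.write(data)

    def connection_lost(self, exc):
        # The client went away (exc is None on a clean close).
        self.transport = None

async def main(host="localhost", port=9999):
    loop = asyncio.get_running_loop()
    # The loop creates one EchoProtocol instance per accepted connection.
    server = await loop.create_server(EchoProtocol, host, port, backlog=1024)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. gevent's style, by contrast, looks like the blocking `recv`/`sendall` loops in the question's `handle` methods.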

Stack Overflow isn't a good place to get recommendations for specific libraries, but I can give you a general recommendation: look them over, pick the one whose API sounds best for your application, test whether it's good enough, and only fall back to another one if the one you like can't cut it (or if you turned out to be wrong about liking the API). Fans of some of these libraries (especially gevent and Tornado) will tell you that their favorite is "fastest", but who cares about that? What matters is whether they're fast enough, and usable enough to write your app in.
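For the "test whether it's good enough" step, one option is an asyncio-based load generator instead of one OS thread per client connection. This is a sketch assuming Python 3.8+ and an echo server like the ones in the question; `open_one`, `echo_once`, and `load_test` are hypothetical names. It first opens every connection and holds them all open at once, then does one echo round-trip on each, and reports the counts.

```python
import asyncio

async def open_one(host, port, timeout):
    """Try to open one connection; return (reader, writer) or None on failure."""
    try:
        return await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return None

async def echo_once(reader, writer, message, timeout):
    """Send `message` and check that the server echoes it back."""
    try:
        writer.write(message)
        await writer.drain()
        reply = await asyncio.wait_for(reader.readexactly(len(message)), timeout)
        return reply == message
    except (OSError, asyncio.IncompleteReadError, asyncio.TimeoutError):
        return False

async def load_test(host, port, connections, message=b"test message", timeout=10):
    # Phase 1: open every connection and keep them all open at the same time.
    pairs = await asyncio.gather(*(open_one(host, port, timeout) for _ in range(connections)))
    opened = [p for p in pairs if p is not None]
    # Phase 2: with all of them open, do one echo round-trip on each.
    results = await asyncio.gather(*(echo_once(r, w, message, timeout) for r, w in opened))
    for _reader, writer in opened:
        writer.close()
    return {"opened": len(opened), "echoed": sum(results),
            "failed_to_open": connections - len(opened)}
```

For example, `asyncio.run(load_test("localhost", 9999, 10000))` against a candidate server; the client process itself then needs an open-file limit above the connection count.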

Off the top of my head, I'd search for gevent, eventlet, concurrence, cogen, twisted, tornado, monocle, diesel, and circuits. That probably isn't a great list, but if you google all those terms together, I'll bet you'll find an up-to-date comparison, or an appropriate forum to ask on.