RobertK - 3 months ago
C Question

select on socket slow in Linux

I have a weird problem with select taking unexpectedly long on a socket in Linux.


  • The server is receiving data all the time; its receive socket buffer size is 65536.

  • The client is sending data all the time; its send socket buffer size is 4096.



In general the data transfer is really fast. But a select in the client, used to test whether a write would block, takes really long: sending the data without calling select first takes 0.5 s, while sending the same data with select called before actually sending it takes 5 s. The problem is specific to these buffer sizes. If I increase the send buffer in the client to, say, 4*4096, the problem goes away.

Now I want to know why the select takes so long with the specific buffer sizes. Example code is here: http://pastebin.com/PqisLnLU

The same code runs on Windows, and even on Windows Subsystem for Linux, without this weird behavior.

Thanks!

Answer

You are seeing the effect of Nagle's algorithm, which is used to improve TCP throughput at the cost of latency.

The writes are relatively small and are being delayed in case more data is written in the near future, which could then be bundled together in a single IP packet. When you use select before sending, you are not writing more data (because the send buffer is still full), so there is a significant delay before the packet is sent and the buffer is emptied. When you do not use select, the buffer is full and so it is shunted through the network stack immediately.
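For reference, the "select before sending" variant discussed here can be sketched roughly as below. This is not the code from the question's pastebin link; the helper names wait_writable and send_all_with_select and the 5000 ms timeout are hypothetical, and EINTR handling is left out for brevity.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: block until fd is reported writable or timeout_ms
 * elapses. Returns 1 if writable, 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
    fd_set wfds;
    struct timeval tv;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Hypothetical helper: send len bytes, calling select() before every send().
 * Returns the number of bytes sent, or -1 on error or timeout. */
static ssize_t send_all_with_select(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        if (wait_writable(fd, 5000) != 1)
            return -1;

        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0)
            return -1;
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```

The variant without select would simply call send() in the same loop and let it block when the send buffer is full.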

When you increase the buffer size sufficiently, a suitable IP packet size is reached at some point during filling of the buffer, and the data is pushed through the network immediately (and cleared from the send buffer when receipt is acknowledged) - so there is no delay.
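When experimenting with the send buffer size, it helps to read the value back after setting it. The sketch below uses a hypothetical helper named set_send_buffer. Note that Linux doubles the value passed to SO_SNDBUF to leave room for bookkeeping overhead, and getsockopt reports the doubled value, so the effective size may differ from what was requested.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` and return the size
 * the kernel reports afterwards, or -1 on error. */
static int set_send_buffer(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) != 0)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;

    return actual;
}
```

Calling set_send_buffer(sockfd, 4 * 4096) before connecting and printing the result shows which size the client is really running with.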

Try disabling Nagle's algorithm (in the client):

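#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

...

/* A non-zero value enables TCP_NODELAY, i.e. disables Nagle's algorithm. */
int value = 1;
if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &value, sizeof(value)) != 0)
{
    perror("setsockopt(TCP_NODELAY) failed");
}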

You will see that the variant using select is then just as fast as the variant with no select operation.
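To make sure the option really took effect before comparing timings, the flag can be read back with getsockopt. This is a minimal sketch; the helper name nodelay_enabled is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if TCP_NODELAY is set on fd,
 * 0 if it is not, and -1 if getsockopt fails. */
static int nodelay_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;

    return value != 0;
}
```

A freshly created TCP socket should report 0 here, and 1 after the setsockopt call has succeeded.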
