nickdu - 4 months ago
Linux Question

Linux: why does connect() block when the accept() call fails?

I'm doing a blocking connect() call on a client UNIX socket. Below is an example of the code:

// Create socket.

fds[i] = socket(AF_UNIX, SOCK_STREAM, 0);
if (fds[i] == -1)
{
    result = -1;
    goto done;
}
printf("generate_load thread, fds[%d]: %d\n", i, fds[i]);
// int flags = fcntl(fds[i], F_GETFL);
// fcntl(fds[i], F_SETFL, flags | O_NONBLOCK);

// If we have a timeout value (in microseconds) we're only going to
// use that as a connect timeout. From looking at some source code, it
// appears the only way to timeout (correctly) a unix domain
// socket connect() call is to set the send timeout.

struct timeval existing_timeout;
if (timeout != 0)
{
    // Save the current send timeout so it can be restored later.
    socklen_t len = sizeof(existing_timeout);
    getsockopt(fds[i], SOL_SOCKET, SO_SNDTIMEO, &existing_timeout,
        &len);

    struct timeval tv;
    tv.tv_sec = timeout / 1000000;
    tv.tv_usec = timeout % 1000000;
    setsockopt(fds[i], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

// Set socket name.

memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);

// @ indicates abstract name and abstract names begin with a NULL
// byte.

if (socket_name[0] == '@')
    addr.sun_path[0] = '\0';

// Connect.

result = connect(fds[i], (struct sockaddr*) &addr, sizeof(addr));
if (result == -1)
{
    printf("generate_load thread, failed connecting: %d\n", errno);
    if (errno == EAGAIN)
        errno = ETIMEDOUT;
    goto done;
}

printf("generate_load thread, connected fds[%d]: %d\n", i, fds[i]);

// If we set a timeout then set it back to what it was.

if (timeout != 0)
{
    setsockopt(fds[i], SOL_SOCKET, SO_SNDTIMEO, &existing_timeout,
        sizeof(existing_timeout));
}


This code all works fine until the accepting side, which for now is in the same process, fails due to the file descriptor limit. The accept() call fails with errno = 24 (EMFILE). I'm fine with getting the error, but why is the client not seeing an error? Instead the client is blocked and never returns. As you can see, I commented out the lines that put the socket in non-blocking mode. I believe in non-blocking mode I encounter some EAGAIN errors.

Also, when I hit the file descriptor limit, the accepting side appears to keep trying to accept that socket. I'm using select() and waiting for the listening socket to become readable; when it is, I call accept(). I can understand getting the first EMFILE error, but I expected that error to be transmitted back to the connect() call. That would cause the client code to break out of its loop, so no more connect() calls would be made, and I would then expect the accepting side to block in select().

Below is a snippet of the listening side. The code below is within a while(1) loop which first calls select():

if (FD_ISSET(ti->listen_fd, &read_set) != 0)
{
    printf("select thread, accepting socket\n");
    int sock = accept(ti->listen_fd, NULL, NULL);
    printf("select thread, accepted socket\n");
    if (sock == -1)
    {
        printf("select thread, failed accepting socket: %d\n", errno);
        if (error_threshold_met(&eti) == 0)
        {
            log_event(LOG_LEVEL_ERROR, "select thread, accept() "
                "failed: %s", get_error_string(errno, error_string,
                sizeof(error_string)));
        }
    }


The code appears to work fine until I hit the 1024 file descriptor limit. Any ideas why it's behaving this way? Is this the expected behavior, and am I just misunderstanding how it should work?

Thanks,
Nick

EJP
Answer

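connect() and accept() are not interlocked. You can call connect() and have it return without the server ever calling accept() at all. The server-side part of connection establishment (the TCP handshake or, for a UNIX domain socket, the queuing of the connection on the listening socket) happens in the kernel independently of accept(). All that accept() does is pick an incoming connection off a queue and create a socket around it, blocking while the queue is empty. The socket-creation part is failing due to FD exhaustion, but the actual connection is already established. Because the failed accept() never removes that connection from the queue, the listening socket stays readable and select() keeps returning immediately; once the queue is full, a further blocking connect() on a Linux UNIX domain socket waits for room in it.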
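
To check this on a Linux box, here is a minimal sketch that is not part of the original post. The helper name connects_without_accept, the MAX_CLIENTS cap, and the socket path passed in are all made up for illustration. The helper listens on a UNIX domain socket, never calls accept(), and counts how many nonblocking connect() calls succeed before one fails. On Linux I would expect the first few to succeed and the failure to be EAGAIN once the pending-connection queue is full. A blocking socket would wait at that point instead, which seems to match the hang described in the question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64  /* illustrative cap, not from the original post */

/* Hypothetical helper: listen on `path` with the given backlog, then make up
 * to `attempts` nonblocking connect() calls WITHOUT ever calling accept().
 * Returns how many connects succeeded, or -1 if the listener setup failed.
 * *first_errno receives the errno of the first failure (0 if none failed). */
int connects_without_accept(const char *path, int backlog, int attempts,
                            int *first_errno)
{
    struct sockaddr_un addr;
    int clients[MAX_CLIENTS];
    int listen_fd, ok = 0;

    *first_errno = 0;
    if (attempts > MAX_CLIENTS)
        attempts = MAX_CLIENTS;
    assert(strlen(path) < sizeof(addr.sun_path));

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listen_fd == -1)
        return -1;
    unlink(path);  /* remove a stale socket file, if any */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(listen_fd, backlog) == -1) {
        close(listen_fd);
        return -1;
    }

    while (ok < attempts) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd == -1) {
            *first_errno = errno;
            break;
        }
        /* Nonblocking, so a full queue yields an error instead of a hang. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
            *first_errno = errno;
            close(fd);
            break;
        }
        clients[ok++] = fd;  /* keep it open so it stays queued */
    }

    /* Clean up: close every connected client, the listener, and the file. */
    for (int i = 0; i < ok; i++)
        close(clients[i]);
    close(listen_fd);
    unlink(path);
    return ok;
}
```

Calling connects_without_accept("/tmp/demo.sock", 2, 16, &err) from a small main() and printing the return value and err shows how many connections the kernel completed on its own. The exact count depends on the backlog and the kernel, so treat it as an observation rather than a guarantee.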
