Clay Weston - 3 months ago
C Question

C Socket select error on Solaris

I am getting a return value of -1 from select() on a socket. This only happens when we use a new install of a Sybase database. With the old database, the same code produces no select errors and everything works fine.

In the example below, how_many = 2 and timeout_secs = 60.

Note that when the code below works, file_limits.rlim_cur is 256. With the new database, file_limits.rlim_cur is over 65,000 and select returns -1. I've tried hard-coding the first parameter of select to 256, but it still returns -1.

int socket_activity( int how_many, int *fd, int timeout_secs )
{
    int i;
    int select_fd;
    fd_set read_fds;
    fd_set except_fds;
    struct timeval timeout;
    struct rlimit file_limits;

    /*
    ** Determine the current limits.
    */

    if ( getrlimit( RLIMIT_NOFILE, &file_limits ) != 0 )
        return( -1 );

    /*
    ** Set up the select structures. Initialize the timeout to the specified
    ** seconds. Only non-negative file descriptors are initialized.
    */

    FD_ZERO( &read_fds );
    FD_ZERO( &except_fds );
    for ( i = 0; i < how_many; i++ )

        if ( fd[i] >= 0 ) {

            FD_SET( fd[i], &read_fds );
            FD_SET( fd[i], &except_fds );

        } /* of if */

    timeout.tv_sec = timeout_secs;
    timeout.tv_usec = 0;

    /*
    ** Perform the select and check on the results.
    */

    select_fd = select( file_limits.rlim_cur,
                        &read_fds,
                        NULL,
                        &except_fds,
                        &timeout );

    if ( select_fd > 0 ) {

        /*
        ** Scan the list of file descriptors and return the first file
        ** descriptor that shows activity. Only check non-negative file
        ** descriptors.
        */

        for ( i = 0; i < how_many; i++ )
            if ( ( fd[i] >= 0 ) &&
                 ( FD_ISSET( fd[i], &read_fds ) ) )
                return( fd[i] );

        /*
        ** No file descriptor showed activity so return zero to indicate
        ** that a timeout occurred.
        */

        return( 0 );

    } /* of if */

    else

        /*
        ** Simply return the return value from select (the function will
        ** return a 0 on timeout or a -1 on error).
        */

        return( select_fd );

} /* of function */

Answer

You really need to post an MCVE to get real help. Without one, this is informed speculation rather than an actual answer.

First, assuming fd is an int * that points to an array of already-open file descriptors, this is useless:

/*
** Determine the current limits.
*/

if ( getrlimit( RLIMIT_NOFILE, &file_limits ) != 0 )
    return( -1 );

If you're being passed descriptors that are already open, the resource limit on the number of open descriptors is irrelevant: the descriptors are already open. Worse, because you pass that limit to select() as its first argument, if something lowered the limit below the value of one of your descriptors, select() would never examine that descriptor.

Second, this is problematic if the limit on open files is greater than FD_SETSIZE. The fd_set objects you pass to select - read_fds and except_fds - can each hold at most FD_SETSIZE descriptors:

select_fd = select( file_limits.rlim_cur,
                    &read_fds,
                    NULL,
                    &except_fds,
                    &timeout );

I don't know what FD_SETSIZE is on your Solaris installation, but given your statement in the question that "the code below when it works file_limits.rlim_cur is 256", I strongly suspect that's what's happening. The Solaris select(3C) man page says:

Errors

The select() and pselect() functions will fail if:

...

EINVAL The nfds argument is less than 0 or greater than FD_SETSIZE.

So you need to fix your code.

And this will lead to starvation of descriptors later in the array:

for ( i = 0; i < how_many; i++ )
    if ( ( fd[i] >= 0 ) &&
         ( FD_ISSET( fd[i], &read_fds ) ) )
        return( fd[i] );

You always return the first descriptor with activity that you find. If the first one in the array is always busy, it's the only one that will get serviced.
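A possible way around that, sketched under assumptions: remember where the last scan stopped and start the next scan just after it, so a busy descriptor early in the array can't shut out the others. The helper next_ready_fd and the caller-owned cursor next_start are hypothetical names; the cursor is assumed to stay in the range 0 to how_many - 1.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/*
** Hypothetical helper: scan fd[] starting at slot *next_start and return
** the first descriptor set in read_fds, or -1 if none is set (0 is a
** valid descriptor, so it is not used as the "nothing ready" value).
** *next_start is moved to the slot after the one returned, so the next
** call begins its scan there.
*/
int next_ready_fd( int how_many, const int *fd, fd_set *read_fds,
                   int *next_start )
{
    int n;

    for ( n = 0; n < how_many; n++ ) {

        /* Wrap around so every slot is visited exactly once. */
        int i = ( *next_start + n ) % how_many;

        if ( ( fd[i] >= 0 ) && ( FD_ISSET( fd[i], read_fds ) ) ) {
            *next_start = ( i + 1 ) % how_many;
            return( fd[i] );
        }
    }

    return( -1 );
}
```

The caller would keep next_start between calls (initialized to 0) and pass the read_fds that select() just filled in.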

You're also ignoring except_fds. If a descriptor "goes bad" there in such a way that no other descriptor ever has readable data, your code will do nothing but loop in select().
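As a minimal sketch of checking that set, assuming the same fd[] layout as in your function: after select() returns a positive value, look for a descriptor flagged in except_fds before scanning read_fds. The helper name first_exception_index is hypothetical.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/*
** Hypothetical helper: return the index of the first slot in fd[] whose
** descriptor is set in except_fds, or -1 if no descriptor is flagged.
** Negative (unused) slots are skipped.
*/
int first_exception_index( int how_many, const int *fd, fd_set *except_fds )
{
    int i;

    for ( i = 0; i < how_many; i++ )
        if ( ( fd[i] >= 0 ) && ( FD_ISSET( fd[i], except_fds ) ) )
            return( i );

    return( -1 );
}
```

What to do with the flagged descriptor depends on your application. One option, and it is only an assumption about what fits here, is to deal with the condition or close the descriptor and set its slot to -1 so later calls skip it.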
