Anton Dyachenko - 1 year ago
C++ Question

Winsock select reports error 10022 (for a reason that is not obvious; need TCP/Winsock guru help)

Given the following (simplified) code:

const timeval timeout = {100, 0};
fd_set sockets = {2, {service, controlSocket}};
// on failure select returns SOCKET_ERROR, and WSAGetLastError() gives 10022 (WSAEINVAL)
const auto result = select(0, &sockets, nullptr, nullptr, &timeout);
if (result > 0 && FD_ISSET(service, &sockets))
{
    auto workerConnection = accept(service, nullptr, nullptr);
    WSARecv(workerConnection, ...);
}

where service is a socket in the listening state, and controlSocket is the very first incoming connection accepted on service; it is used for communication with the manager.

Error 10022 occurs only when a previously established worker connection is gracefully closed by the remote side. To emphasize: this code works as intended almost all the time, except when a worker application closes a connection. However, I don't claim that closing the connection on the remote side causes this behavior; I just noticed that the error appears exactly after the remote side closes a connection, while the service side reports CLOSE_WAIT for one of the established connections. To repeat: the code otherwise works correctly. After the error, on the next cycle of the loop and until another socket is closed, select works with no error at all. The number of errors equals the number of established worker connections.

I don't know whether it matters, but I test this code by running the manager, service, and worker applications on the same PC. Instead of localhost, I use the PC's network name to establish connections. Once a connection with a worker is established, the service application sends and receives data over workerConnection only asynchronously.

Question: since the error doesn't affect the service's functionality at all and I get no other WSA errors, is it OK to ignore it and avoid spamming the logs, or am I missing something important that I have to fix? I suspect some kind of issue in WSA, because the error happens only under a very specific set of circumstances. Am I correct?

Answer

Either I can't read the MSDN documentation properly or it is unclear, but there was a significant issue in my code. MSDN says:

Note When issuing a blocking Winsock call such as select with the timeout parameter set to NULL, Winsock may need to wait for a network event before the call can complete. Winsock performs an alertable wait in this situation, which can be interrupted by an asynchronous procedure call (APC) scheduled on the same thread. Issuing another blocking Winsock call inside an APC that interrupted an ongoing blocking Winsock call on the same thread will lead to undefined behavior, and must never be attempted by Winsock clients.


Note The shutdown function does not block regardless of the SO_LINGER setting on the socket.

From my point of view it is highly unclear what they are trying to say, but the problem is this: when I call select with a timeout, it is still a blocking call (even though the MSDN note mentions only a NULL timeout), and Windows may run the completion routine I passed to WSARecv on the same thread while select is waiting. Inside that callback I made another synchronous Winsock call, shutdown, even though MSDN states that shutdown does not block. So my original code matches exactly the case described in the MSDN note, which leads to undefined behaviour. The original issue appeared when I tested the debug configuration, and there it showed up as a single symptom. After I started testing the release configuration, I saw many more issues.

Although I can't back this with an MSDN reference, I arrived at the following rule, which is much clearer than the MSDN notes quoted above:

You must not call any Winsock function from inside the completion routine that Windows calls after an overlapped operation has finished, except Winsock functions that take an LPOVERLAPPED parameter, and then only with a non-NULL value for it. On the other hand, you may call other blocking Windows functions as long as they don't use Winsock.