Math is Hard Math is Hard - 1 month ago 13
C++ Question

C++ TCP recv unknown buffer size

I want to use the function

recv(socket, buf, len, flags)
to receive an incoming packet. However I do not know the length of this packet prior to runtime so the first 8 bytes are supposed to tell me the length of this packet. I don't want to just allocate an arbitrarily large
len
to accomplish this so is it possible to set
len = 8
have
buf
be a type of
uint64_t
. Then afterwards

memcpy(dest, &buf, buf)
?

Answer

Since TCP is stream-based, I'm not sure what type of packages you mean. I will assume that you are referring to application level packages. I mean packages which are defined by your application and not by underlying protocols like TCP. I will call them messages instead to avoid confusion.

I will show two possibilities. First I will show, how you could read a message without knowing the length before you have finished reading. The second example will do two calls. First it reads the size of the message. Then it read the whole message at once.


Read data until the message is complete

Since TCP is stream-based, you will not loss any data when your buffer is not big enough. So you can read a fixed amount of bytes. If something is missing, you can call recv again. Here is a extensive example. I just wrote it without testing. I hope everything would work.

std::size_t offset = 0;
std::vector<char> buf(512);

std::vector<char> readMessage() {
    while (true) {
        ssize_t ret = recv(fd, buf.data() + offset, buf.size() - offset, 0);
        if (ret < 0) {
            if (errno == EINTR) {
                // Interrupted, just try again ...
                continue;
            } else {
                // Error occured. Throw exception.
                throw IOException(strerror(errno));
            }
        } else if (ret == 0) {
            // No data available anymore.
            if (offset == 0) {
                // Client did just close the connection
                return std::vector<char>(); // return empty vector
            } else {
                // Client did close connection while sending package?
                // It is not a clean shutdown. Throw exception.
                throw ProtocolException("Unexpected end of stream");
            }
        } else if (isMessageComplete(buf)) {
            // Message is complete.
            buf.resize(offset + ret); // Truncate buffer
            std::vector<char> msg = std::move(buf);
            std::size_t msgLen = getSizeOfMessage(msg);
            if (msg.size() > msgLen) {
                // msg already contains the beginning of the next message.
                // write it back to buf
                buf.resize(msg.size() - msgLen)
                std::memcpy(buf.data(), msg.data() + msgLen, msg.size() - msgLen);
                msg.resize(msgLen);
            }
            buf.resize(std::max(2*buf.size(), 512)) // prepare buffer for next message
            return msg;
        } else {
            // Message is not complete right now. Read more...
            offset += ret;
            buf.resize(std::max(buf.size(), 2 * offset)); // double available memory
        }
    }
}

You have to define bool isMessageComplete(std::vector<char>) and std::size_t getSizeOfMessage(std::vector<char>) by yourself.

Read the header and check the length of the package

The second possibility is to read the header first. Just the 8 bytes which contains the size of the package in your case. After that, you know the size of the package. This mean you can allocate enough storage and read the whole message at once:

/// Reads n bytes from fd.
bool readNBytes(int fd, void *buf, std::size_t n) {
    std::size_t offset = 0;
    char *cbuf = reinterpret_cast<char*>(buf);
    while (true) {
        ssize_t ret = recv(fd, cbuf + offset, n - offset, MSG_WAITALL);
        if (ret < 0 && errno != EINTR) {
            // Error occurred
            throw IOException(strerror(errno));
        } else if (ret == 0) {
            // No data available anymore
            if (offset == 0) return false;
            else             throw ProtocolException("Unexpected end of stream");
        } else if (offset + ret == n) {
            // All n bytes read
            return true;
        } else {
            offset += ret;
        }
    }
}

/// Reads message from fd
std::vector<char> readMessage(int fd) {
    std::uint64_t size;
    if (readNBytes(fd, &size, sizeof(size))) {
        std::vector buf(size);
        if (readNBytes(fd, buf.data(), size)) {
            return buf;
        } else {
            throw ProtocolException("Unexpected end of stream");
        }
    } else {
        // connection was closed
        return std::vector<char>();
    }
}

The flag MSG_WAITALL requests that the function blocks until the full amount of data is available. However, you cannot rely on that. You have to check it and read again if something is missing. Just like I did it above.

readNBytes(fd, buf, n) reads n bytes. As far as the connection was not closed from the other side, the function will not return without reading n bytes. If the connection was closed by the other side, the function returns false. If the connection was closed in the middle of a message, an exception is thrown. If an i/o-error occurred, another exception is thrown.

readMessage reads 8 bytes [sizeof(std::unit64_t)] und use them as size for the next message. Then it reads the message.

If you want to have platform independency, you should convert size to a defined byte order. Computers (with x86 architecture) are using little endian. It is common to use big endian in network traffic.

Note: With MSG_PEEK it is possible to implement this functionality for UDP. You can request the header while using this flag. Then you can allocate enough space for the whole package.