Read/write exactly N bytes from/to file descriptor with C on Unix

I know that the C functions read() and write() from <unistd.h> are not guaranteed to read/write exactly the N bytes requested by the size_t nbyte argument (especially with sockets).

How can I read/write a full buffer from/to a file (or socket) descriptor?

On success, read() and write() both return an ssize_t containing the number of bytes read/written. You can use it to build a loop:

A reliable read():

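#include <errno.h>
#include <unistd.h>

ssize_t readall(int fd, void *buff, size_t nbyte) {
    size_t nread = 0;
    while (nread < nbyte) {
        /* ssize_t, not size_t, so that -1 is a real error value;
           cast to char * because arithmetic on void * is not standard C */
        ssize_t res = read(fd, (char *)buff + nread, nbyte - nread);
        if (res == 0) break;                 /* end of file */
        if (res == -1) {
            if (errno == EINTR) continue;    /* interrupted by a signal; retry */
            return -1;
        }
        nread += (size_t)res;
    }
    return (ssize_t)nread;
}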

A reliable write() (much the same):

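ssize_t writeall(int fd, const void *buff, size_t nbyte) {
    size_t nwrote = 0;
    while (nwrote < nbyte) {
        ssize_t res = write(fd, (const char *)buff + nwrote, nbyte - nwrote);
        if (res == 0) break;                 /* no progress; avoid spinning forever */
        if (res == -1) {
            if (errno == EINTR) continue;    /* interrupted by a signal; retry */
            return -1;
        }
        nwrote += (size_t)res;
    }
    return (ssize_t)nwrote;
}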

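Basically, it keeps reading/writing until the total number of bytes transferred equals nbyte, or until end of file or an error occurs.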
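To see why the loop is needed, here is a minimal sketch, assuming a POSIX system, showing that a single read() on a pipe returns only the bytes currently available, even when more were requested. The function name single_read_from_pipe is hypothetical.

```c
#include <unistd.h>

/* Writes 4 bytes into a pipe, then asks read() for 16.
 * Returns what that single read() call returned, or -1 on setup error. */
ssize_t single_read_from_pipe(void)
{
    int fds[2];
    char buf[16];

    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], "abcd", 4) != 4) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Only 4 bytes are available, so read() returns 4, not 16. */
    ssize_t n = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

A readall() style loop would instead call read() again for the remaining 12 bytes, blocking until more data arrives or the writer closes the pipe.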


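Note that this answer uses only <unistd.h> functions, assuming there is a reason to do so. If you can also use <stdio.h>, see the answer that uses fdopen() and setvbuf() and then fread()/fwrite(). Also, take a look at the answer with the read_range() function for a version with many more features.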

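The fact that read() and write() are not guaranteed to transfer the full number of bytes requested is a feature, not a bug. If that feature gets in your way in a particular application, then it is probably better to use the standard library's existing facilities to deal with it than to roll your own (though I certainly do roll my own from time to time).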

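Specifically, if you have a file descriptor on which you always want to transfer exact numbers of bytes, then you should consider using fdopen() to wrap it in a stream, and then performing I/O with fread() and fwrite(). You can also use setvbuf() to avoid an intermediate buffer. As a possible bonus, you can then also use other stream functions, such as fgets() and fprintf().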

Example:

int my_fd = open_some_resource();
// if (my_fd < 0) ...
FILE *my_file = fdopen(my_fd, "r+b");
// if (my_file == NULL) ...
int rval = setvbuf(my_file, NULL, _IONBF, 0);
// if (rval != 0) ...

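Note that from then on it is best to use only the stream, not the underlying file descriptor; this is the main drawback of this approach. On the other hand, you can let the fd itself go out of scope, because closing the stream also closes the underlying fd.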

Making fread() and fwrite() transfer whole buffer-sized units (or fail) requires nothing special:

char buffer[BUF_SIZE];
size_t blocks = fread(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...

// ...

blocks = fwrite(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...

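Do note, however, that you must get the order of the second and third arguments right. The second is the transfer unit size, and the third is the number of units to transfer. Partial units are not transferred unless an error or end of file occurs. Specifying the transfer unit as the full number of bytes you want to transfer, and therefore asking for exactly one unit, is what gives you the semantics you asked about.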
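A complete, runnable version of the stream approach might look like the following minimal sketch. It assumes a POSIX system and uses a pipe as a stand-in for open_some_resource(); the function name stream_roundtrip and the BUF_SIZE value of 64 are hypothetical choices for illustration. It moves one BUF_SIZE-byte unit with fwrite()/fread() (size = BUF_SIZE, count = 1) and relies on fclose() closing the underlying descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 64  /* hypothetical unit size for this sketch */

/* Wraps both ends of a pipe in unbuffered streams and moves one
 * BUF_SIZE-byte unit through them. Returns 0 on success, -1 otherwise. */
int stream_roundtrip(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    FILE *in = fdopen(fds[0], "rb");
    if (in == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    FILE *out = fdopen(fds[1], "wb");
    if (out == NULL) {
        fclose(in);            /* also closes fds[0] */
        close(fds[1]);
        return -1;
    }

    /* _IONBF: no intermediate stdio buffer */
    if (setvbuf(in, NULL, _IONBF, 0) != 0 || setvbuf(out, NULL, _IONBF, 0) != 0) {
        fclose(in);
        fclose(out);
        return -1;
    }

    char src[BUF_SIZE], dst[BUF_SIZE];
    memset(src, 'x', sizeof src);

    /* size = BUF_SIZE, count = 1: reports 1 only if the whole unit moved */
    int ok = fwrite(src, BUF_SIZE, 1, out) == 1;
    fclose(out);               /* also closes fds[1], so the reader sees EOF */

    ok = ok && fread(dst, BUF_SIZE, 1, in) == 1
            && memcmp(src, dst, BUF_SIZE) == 0;

    /* A second read finds end of file and returns 0 units. */
    ok = ok && fread(dst, BUF_SIZE, 1, in) == 0 && feof(in);

    fclose(in);                /* also closes fds[0] */
    return ok ? 0 : -1;
}
```

Once a descriptor has been successfully wrapped, the sketch never calls close() on it directly; it calls fclose() on the stream instead.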

You use a loop.

For example, with proper error checking:

#include <errno.h>
#include <unistd.h>

/** Read a specific number of bytes from a file or socket descriptor
 * @param fd        Descriptor
 * @param dst       Buffer to read data into
 * @param minbytes  Minimum number of bytes to read
 * @param maxbytes  Maximum number of bytes to read
 * @return          Exact number of bytes read.
 *                  errno is always set by this call:
 *                  it is set to zero if an acceptable number of bytes was read,
 *                  and to nonzero otherwise.
 *                  If there was not enough data to read, errno == ENODATA.
 */
size_t  read_range(const int fd, void *const dst, const size_t minbytes, const size_t maxbytes)
{
    if (fd == -1) {
        errno = EBADF;
        return 0;
    } else
    if (!dst || minbytes > maxbytes) {
        errno = EINVAL;
        return 0;
    }

    char       *buf = (char *)dst;
    char *const end = (char *)dst + minbytes;
    char *const lim = (char *)dst + maxbytes;

    while (buf < end) {
        ssize_t n = read(fd, buf, (size_t)(lim - buf));
        if (n > 0) {
            buf += n;
        } else
        if (n == 0) {
            /* Premature end of input */
            errno = ENODATA;  /* Example only; use what you deem best */
            return (size_t)(buf - (char *)dst);
        } else
        if (n != -1) {
            /* C library or kernel bug */
            errno = EIO;
            return (size_t)(buf - (char *)dst);
        } else {
            /* Error, interrupted by signal delivery, or nonblocking I/O would block. */
            return (size_t)(buf - (char *)dst);
        }
    }

    /* At least minbytes, up to maxbytes received. */
    errno = 0;
    return (size_t)(buf - (char *)dst);
}

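Some people find it odd that it clears errno to zero on a successful call, but that is perfectly acceptable in both standard C and POSIX: standard library functions never set errno to zero, but nothing forbids your own functions from doing so.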

Here, it means that the typical use case is simple and robust. For example,

    struct message  msgs[MAX_MSGS];

    size_t  bytes = read_range(fd, msgs, sizeof msgs[0], sizeof msgs);
    if (errno) {
        /* Oops, things did not go as we expected.  Deal with it.
           If bytes > 0, we do have that many bytes in msgs[].
        */
    } else {
        /* We have bytes bytes in msgs.
           bytes >= sizeof msgs[0] and bytes <= sizeof msgs.
        */
    }

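If you have a schema with fixed- or variable-size messages, and a function that consumes them one at a time, do not assume that the best option is to try to read exactly one message at a time, because it is not.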

This is also why the example above takes minbytes and maxbytes instead of a single exactly_this_many_bytes parameter.

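A better pattern is to have a larger buffer, and to memmove() the data only when you have to (because you are running out of room, or because the next message is not sufficiently aligned).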
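As a minimal sketch of "memmove() only when you have to", assume the unconsumed bytes sit in data[head..tail) of a buffer of size bytes. The helper name compact_buffer and the halfway-mark threshold are illustrative assumptions; the right policy depends on the workload and the buffer size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compacts a receive buffer whose unconsumed bytes
 * are data[*head .. *tail). Resets to offset 0 when the buffer is empty;
 * moves the data to the front only if it starts past the halfway mark
 * or there is no room left at the end. Returns 1 if memmove() ran. */
int compact_buffer(unsigned char *data, size_t size, size_t *head, size_t *tail)
{
    if (*head >= *tail) {          /* nothing left: restart at offset 0 */
        *head = 0;
        *tail = 0;
        return 0;
    }
    if (*head >= size / 2 || *tail >= size) {
        memmove(data, data + *head, *tail - *head);
        *tail -= *head;
        *head = 0;
        return 1;
    }
    return 0;                      /* enough room at the end; leave data in place */
}
```

Most calls do nothing at all; the copy happens only occasionally, and it moves at most one partial message.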

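For example, say you have a stream socket or file descriptor where each incoming message starts with a three-byte header: the first byte identifies the message type, and the next two bytes (say, least significant byte first) give the number of data payload bytes associated with the message. This means that the maximum total length of a message is 1 + 2 + 65535 = 65538 bytes.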
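Decoding that header takes one line, but the parentheses matter: in C, + binds more tightly than <<, so the shifted high byte must be parenthesized. A minimal sketch, assuming the least-significant-byte-first layout described above; message_length is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: hdr points to a 3-byte header
 * (type, length low byte, length high byte).
 * Returns the total message length, header included: 3 .. 65538. */
size_t message_length(const unsigned char *hdr)
{
    size_t payload = (size_t)hdr[1] | ((size_t)hdr[2] << 8);
    return 3 + payload;
}
```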

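To receive messages efficiently, you would use a dynamically allocated buffer. Its size is a software engineering question: other than the requirement that it be at least 65538 bytes, how large it should be (and even whether it should grow and shrink dynamically) depends on the situation. So let us assume we have unsigned char *data pointing to a buffer of allocated size size_t size.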

The loop itself might look like this:

    size_t  head = 0;  /* Offset to current message */
    size_t  tail = 0;  /* Offset to first unused byte in buffer */
    size_t  mlen = 0;  /* Total length of the current message; 0 is "unknown"*/

    while (1) {

        /* Message processing loop. */
        while (head + 3 <= tail) {

            /* Verify we know the total length of the message
               that starts at offset head. */
            if (!mlen)
                mlen = 3 + (size_t)data[head + 1]
                         + ((size_t)data[head + 2] << 8);

            /* If that message is not yet complete, we cannot process it. */
            if (head + mlen > tail)
                break;

            /*             type        datalen,  pointer to data */
            handle_message(data[head], mlen - 3, data + head + 3);

            /* Skip message in buffer. */
            head += mlen;

            /* Since we do not know the length of the next message,
               or rather, the current message starting at head,
               we do need to reset mlen to "unknown", 0. */
            mlen  = 0;
        }

        /* At this point, the buffer contains less than one full message.
           Whether it is better to always move a partial leftover message
           to the beginning of the buffer, or only do so if the buffer
           is full, depends on the workload and buffer size.
           The following one may look complex, but it is actually simple.
           If the current start of the buffer is past the halfway mark,
           or there is no more room at the end of the buffer, we do the move.
           Only if the current message starts in the initial half, and
           when there is room at the end of the buffer, we leave it be.
           But first: If we have no data in the buffer, it is always best
           to start filling it from the beginning.
        */
        if (head >= tail) {
            head = 0;
            tail = 0;
        } else
        if (head >= size/2 || tail >= size) {
            memmove(data, data + head, tail - head);
            tail -= head;
            head = 0;
        }

        /* We do not have a complete message, but there
           is room in the buffer (assuming size >= 65538),
           we need to now read more data into the buffer. */
        ssize_t  n = read(sourcefd, data + tail, size - tail);
        if (n > 0) {
            tail += n;

            /* Check if it completed one or more messages. */
            continue;

        } else
        if (n == 0) {
            /* End of input.  If buffer is empty, that's okay. */
            if (head >= tail)
                break;

            /* Ouch: We have partial message in the buffer,
                     but there will be no more incoming data! */
            ISSUE_WARNING("Discarding %zu byte partial message due to end of input.\n", tail - head);
            break;

        } else
        if (n != -1) {
            /* This should not happen.  If it does, it is a C library
               or kernel bug.  We treat it as fatal. */
            ISSUE_ERROR("read() returned %zd; dropping connection.\n", n);
            break;

        } else
        if (errno != EINTR) {
            /* Everything except EINTR indicates an error to us; we do
               assume that sourcefd is blocking (not nonblocking). */
            ISSUE_ERROR("read() failed with errno %d (%s); dropping connection.\n", errno, strerror(errno));
            break;
        }

        /* The case n == -1, errno == EINTR usually occurs when a signal
           was delivered to a handler using this thread, and that handler
           was installed without SA_RESTART.  Depending on what kind of
           a device or socket sourcefd is, there could be additional cases;
           but in general, it just means "something unrelated happened,
           but you were to be notified about it, so EINTR you get".
           Simply put, EINTR is not really an error, just like
           EWOULDBLOCK/EAGAIN is not an error for nonblocking descriptors,
           they're just easiest to treat as an "error-like situation" in C.
        */
    }

    /* close(sourcefd); */

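Note that the loop never tries to read any specific amount of data. It simply reads as much as it can, and processes complete messages as it goes.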

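Could one instead read such messages exactly, by first reading exactly the three-byte header and then exactly the data payload? Certainly, but that means making a lot of system calls: at least two per message. If messages are frequent, you probably do not want to do that, because of the system call overhead.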
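For comparison, a sketch of the "exact" approach might look like this, assuming a blocking descriptor and the same three-byte header; read_exact and read_message_exact are hypothetical names. Each message costs at least two read() calls (one loop for the header, one for the payload), which is the overhead the buffered loop avoids.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads exactly n bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or end of input mid-way. */
static int read_exact(int fd, unsigned char *buf, size_t n)
{
    while (n > 0) {
        ssize_t r = read(fd, buf, n);
        if (r > 0) {
            buf += r;
            n -= (size_t)r;
        } else if (r == -1 && errno == EINTR) {
            continue;              /* interrupted by a signal; retry */
        } else {
            return -1;             /* error, or end of input */
        }
    }
    return 0;
}

/* Hypothetical exact receiver: header first, then payload.
 * payload must have room for 65535 bytes. Stores the type byte in *type
 * and returns the payload length, or -1 on error or end of input. */
long read_message_exact(int fd, unsigned char *type, unsigned char *payload)
{
    unsigned char hdr[3];
    if (read_exact(fd, hdr, sizeof hdr) == -1)
        return -1;

    size_t len = (size_t)hdr[1] | ((size_t)hdr[2] << 8);
    if (read_exact(fd, payload, len) == -1)
        return -1;

    *type = hdr[0];
    return (long)len;
}
```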

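Could one use the available buffer more carefully, and strip the type and payload length from the next message in the buffer as soon as possible? Well, that is a question best discussed with colleagues or developers who have written such code before. There are positives (mainly, you save three bytes) and negatives (added code complexity, which always makes code harder to maintain in the long term and risks introducing bugs). On a microcontroller with only a 128-byte buffer for incoming command messages, I probably would do it; but on a desktop or server, this kind of code favors buffers of a few hundred kilobytes to a few megabytes, because the memory "waste" is usually paid back by the smaller number of system calls, especially when processing lots of messages. There is no quick answer! :)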