Can't seem to get a timeout working when connecting to a socket

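I'm trying to supply a timeout for connect(). I've searched around and found several articles on the subject. I've coded up what I believe should work, but unfortunately getsockopt() reports no error. Yet when I get to the write(), it fails with errno 107 (ENOTCONN).

A couple of points. I'm running on Fedora 23. The documentation for connect() says it should fail with errno set to EINPROGRESS for a connection that has not completed yet; however, I'm seeing EAGAIN, so I added that to my check. Currently my socket server sets the backlog to zero in its listen() call. Many of the calls succeed, but the ones that fail all fail in the write() call with the 107 (ENOTCONN) I mentioned.

I'm hoping I'm just missing something, but so far I can't figure out what.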

int domain_socket_send(const char* socket_name, unsigned char* buffer,
        unsigned int length, unsigned int timeout)
{
    struct sockaddr_un addr;
    int fd = -1;
    int result = 0;

    // Create socket.

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        {
        result = -1;
        goto done;
        }

    if (timeout != 0)
        {

        // Enable non-blocking mode.

        int flags;
        flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }

    // Set socket name.

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);

    // Connect.

    result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
    if (result == -1)
        {

        // If some error then we're done.

        if ((errno != EINPROGRESS) && (errno != EAGAIN))
            goto done;

        fd_set write_set;
        struct timeval tv;

        // Set timeout.

        tv.tv_sec = timeout / 1000000;
        tv.tv_usec = timeout % 1000000;

        unsigned int iterations = 0;
        while (1)
            {
            FD_ZERO(&write_set);
            FD_SET(fd, &write_set);

            result = select(fd + 1, NULL, &write_set, NULL, &tv);
            if (result == -1)
                goto done;
            else if (result == 0)
                {
                result = -1;
                errno = ETIMEDOUT;
                goto done;
                }
            else
                {
                if (FD_ISSET(fd, &write_set))
                    {
                    socklen_t len;
                    int socket_error;
                    len = sizeof(socket_error);

                    // Get the result of the connect() call.

                    result = getsockopt(fd, SOL_SOCKET, SO_ERROR,
                            &socket_error, &len);
                    if (result == -1)
                        goto done;

                    // I think SO_ERROR will be zero for a successful
                    // result and errno otherwise.

                    if (socket_error != 0)
                        {
                        result = -1;
                        errno = socket_error;
                        goto done;
                        }

                    // Now that the socket is writable issue another connect.

                    result = connect(fd, (struct sockaddr*) &addr,
                            sizeof(addr));
                    if (result == 0)
                        {
                        if (iterations > 1)
                            {
                            printf("connect() succeeded on iteration %u\n",
                                    iterations);
                            }
                        break;
                        }
                    else
                        {
                        if ((errno != EAGAIN) && (errno != EINPROGRESS))
                            {
                            int err = errno;
                            printf("second connect() failed, errno = %d\n",
                                    errno);
                            errno = err;
                            goto done;
                            }
                        iterations++;
                        }
                    }
                }
            }
        }

    // If we put the socket in non-blocking mode then put it back
    // to blocking mode.

    if (timeout != 0)
        {

        // Turn off non-blocking.

        int flags;
        flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
        }

    // Write buffer.

    result = write(fd, buffer, length);
    if (result == -1)
        {
        int err = errno;
        printf("write() failed, errno = %d\n", err);
        errno = err;
        goto done;
        }

done:
    if (result == -1)
        result = errno;
    else
        result = 0;
    if (fd != -1)
        {
        shutdown(fd, SHUT_RDWR);
        close(fd);
        }
    return result;
}

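Update, April 5, 2016:

It dawned on me that maybe I need to call connect() repeatedly until it succeeds; after all, this is non-blocking I/O, not asynchronous I/O. It would be just like having to call read() again once there is data to read after getting EAGAIN from a read(). In addition, I found the following SO question: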

Using select() for non-blocking sockets to connect always returns 1

in which EJP's answer says you need to issue multiple connect() calls. Also, from the book EJP references:

https://books.google.com/books?id=6H9AxyFd0v0C&pg=PT681&lpg=PT681&dq=stevens+and+wright+tcp/ip+illustrated+non-blocking+connect&source=bl&ots=b6kQar6SdM&sig=kt5xZubPZ2atVxs2VQU4mu7NGUI&hl=en&sa=X&ved=0ahUKEwjmp87rlfbLAhUN1mMKHeBxBi8Q6AEIIzAB#v=onepage&q=stevens%20and%20wright%20tcp%2Fip%20illustrated%20non-blocking%20connect&f=false

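This seems to indicate that you need to issue multiple connect() calls. I've modified the code snippet in this question to call connect() until it succeeds. I probably still need to make changes around updating the timeout value passed to select(), but that's not my immediate question.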
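On the point about updating the timeout between select() calls: Linux's select() rewrites the timeval to the time not slept, but POSIX permits either behavior, so portable code usually recomputes it from a fixed deadline. Below is a minimal sketch of that idea. The helper names `deadline_after_us` and `time_left` are hypothetical and not part of the code in this question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: compute an absolute deadline timeout_us microseconds
 * from now on the monotonic clock. Returns 0 on success, -1 on error. */
int deadline_after_us(unsigned int timeout_us, struct timespec *deadline)
{
    assert(deadline != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, deadline) == -1)
        return -1;
    deadline->tv_sec += timeout_us / 1000000;
    deadline->tv_nsec += (long)(timeout_us % 1000000) * 1000;
    if (deadline->tv_nsec >= 1000000000L) {
        deadline->tv_sec += 1;
        deadline->tv_nsec -= 1000000000L;
    }
    return 0;
}

/* Hypothetical helper: fill *tv with the time remaining between *now and
 * *deadline. Returns 1 if time remains, 0 if the deadline has passed
 * (in which case *tv is zeroed). Sub-microsecond remainders round down. */
int time_left(const struct timespec *now, const struct timespec *deadline,
              struct timeval *tv)
{
    assert(now != NULL && deadline != NULL && tv != NULL);
    time_t sec = deadline->tv_sec - now->tv_sec;
    long nsec = deadline->tv_nsec - now->tv_nsec;
    if (nsec < 0) {
        sec -= 1;
        nsec += 1000000000L;
    }
    if (sec < 0 || (sec == 0 && nsec == 0)) {
        tv->tv_sec = 0;
        tv->tv_usec = 0;
        return 0;
    }
    tv->tv_sec = sec;
    tv->tv_usec = nsec / 1000;
    return 1;
}
```

In a retry loop, one would call deadline_after_us() once before the loop, then on each iteration read CLOCK_MONOTONIC, call time_left(), and either pass the resulting timeval to select() or give up with ETIMEDOUT when it returns 0.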

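Calling connect() multiple times seems to have fixed my original problem, which was getting ENOTCONN when calling write(), I guess because the socket wasn't connected. However, you can see from the code that I'm tracking how many passes through the select loop it takes until connect() succeeds, and I've seen that number reach into the thousands. That makes me worry that I'm in a busy-wait loop. Why is the socket writable even though it's not in a state where connect() will succeed? Is calling connect() clearing the writable state, which the OS then sets again for some reason, or am I really in a busy-wait loop?

Thanks, Nick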

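Your error handling for select() could use some cleanup. You don't really need to query SO_ERROR unless except_set is set. If select() returns > 0, then write_set and/or except_set is set, and if except_set is not set then the connection succeeded.

Try something more like this: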

int domain_socket_send(const char* socket_name, unsigned char* buffer,
    unsigned int length, unsigned int timeout)
{
    struct sockaddr_un addr;
    int fd;
    int result;

    // Create socket.

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;

    if (timeout != 0)
        {

        // Enable non-blocking mode.

        int flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }

    // Set socket name.

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);

    // Connect.

    result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
    if (result == -1)
        {

        // If some error then we're done.

        if ((errno != EINPROGRESS) && (errno != EAGAIN))
            goto done;

        // Now select() to find out when connect() has finished.

        fd_set write_set;
        fd_set except_set;

        FD_ZERO(&write_set);
        FD_ZERO(&except_set);
        FD_SET(fd, &write_set);
        FD_SET(fd, &except_set);

        struct timeval tv;

        // Set timeout.

        tv.tv_sec = timeout / 1000000;
        tv.tv_usec = timeout % 1000000;

        result = select(fd + 1, NULL, &write_set, &except_set, &tv);
        if (result == -1)
            {
            goto done;
            }
        else if (result == 0)
            {
            result = -1;
            errno = ETIMEDOUT;
            goto done;
            }
        else if (FD_ISSET(fd, &except_set))
            {
            int socket_error;
            socklen_t len = sizeof(socket_error);

            // Get the result of the connect() call.

            result = getsockopt(fd, SOL_SOCKET, SO_ERROR, &socket_error, &len);
            if (result != -1)
                {
                result = -1;
                errno = socket_error;
                }

            goto done;
            }
        else
            {
            // connected
            }
        }

    // If we put the socket in non-blocking mode then put it back
    // to blocking mode.

    if (timeout != 0)
        {
        int flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
        }

    // Write buffer.

    result = write(fd, buffer, length);

done:
    if (result == -1)
        result = errno;
    else
        result = 0;

    if (fd != -1)
        {
        shutdown(fd, SHUT_RDWR);
        close(fd);
        }

    return result;
}

From http://lxr.free-electrons.com/source/net/unix/af_unix.c:

static int unix_writable(const struct sock *sk)
{
        return sk->sk_state != TCP_LISTEN &&
               (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
}

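I'm not sure what these buffers being compared are, but it's clear that the connected state of the socket is not being checked. So unless those buffers get modified when the socket becomes connected, it appears my Unix socket will always be flagged as writable, and therefore I can't use select() to determine when a non-blocking connect() has finished.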
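One way to check this reading of unix_writable() is to ask select() about an AF_UNIX stream socket that has never been connected. The following is a small sketch for that experiment; the function name `unconnected_unix_socket_is_writable` is made up for illustration, and the result is platform-dependent, so the code only reports what select() says rather than assuming an answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if select() reports a fresh, never-connected
 * AF_UNIX stream socket as writable, 0 if it does not, -1 on error. */
int unconnected_unix_socket_is_writable(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    assert(fd < FD_SETSIZE); /* FD_SET() is only defined below FD_SETSIZE */

    fd_set write_set;
    FD_ZERO(&write_set);
    FD_SET(fd, &write_set);

    /* Zero timeout: poll the current state without blocking. */
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    int n = select(fd + 1, NULL, &write_set, NULL, &tv);
    int writable;
    if (n == -1)
        writable = -1;
    else
        writable = (n > 0 && FD_ISSET(fd, &write_set)) ? 1 : 0;

    close(fd);
    return writable;
}
```

If this returns 1 on a given kernel, writability by itself says nothing about whether a pending connect() has completed there, which would be consistent with the select loop above spinning thousands of times.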

And based on this snippet from http://lxr.free-electrons.com/source/net/unix/af_unix.c:

static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
                               int addr_len, int flags)
.
.
.
        timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
.
.
.
        if (unix_recvq_full(other)) {
                err = -EAGAIN;
                if (!timeo)
                        goto out_unlock;

                timeo = unix_wait_for_peer(other, timeo);
.
.
.

It appears that setting a send timeout could be used to time out the connect. This also matches the documentation for SO_SNDTIMEO at http://man7.org/linux/man-pages/man7/socket.7.html.
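A minimal sketch of that approach, assuming Linux behaves as the kernel snippet suggests: leave the socket in blocking mode, set SO_SNDTIMEO, and let connect() itself wait for at most that long. The helper name `unix_connect_with_sndtimeo` is hypothetical. Judging from the snippet, a connect() that times out against a full listen queue would be expected to fail with EAGAIN, but that is an inference and not something verified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect a blocking AF_UNIX stream socket to `path`,
 * using SO_SNDTIMEO to bound how long connect() may wait (0 = no timeout).
 * Returns the connected fd, or -1 with errno set. */
int unix_connect_with_sndtimeo(const char *path, unsigned int timeout_us)
{
    assert(path != NULL);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    if (timeout_us != 0) {
        struct timeval tv;
        tv.tv_sec = timeout_us / 1000000;
        tv.tv_usec = timeout_us % 1000000;
        if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1) {
            int err = errno;
            close(fd);
            errno = err; /* preserve the setsockopt() error across close() */
            return -1;
        }
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int err = errno;
        close(fd);
        errno = err; /* preserve the connect() error across close() */
        return -1;
    }
    return fd;
}
```

Note that the same timeout also applies to later blocking write() calls on the returned descriptor, so a caller who only wants to bound the connect may want to clear SO_SNDTIMEO afterwards.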

Thanks, Nick