C - select() seems to block for longer than timeout

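I am writing a data acquisition program that needs to

The ADC is irrelevant for now.

At the end of the loop I call select() again, this time with a zero timeout, to poll whether data is already available. If it is, I have overrun: I expect the loop to finish before more data arrives, so that the select() at the start of the loop blocks and picks the data up as soon as it arrives.

Data should arrive every 5 ms, and my first select() timeout is calculated as (5.5 ms - loop time), which should be about 4 ms.

I get no timeouts, but many overruns.

Examining the timestamps shows that select() blocks for longer than the timeout (but still returns > 0). It looks as though select() returns late, after the data had already arrived before the timeout.

This happens perhaps 20 times in 1000 iterations. What could be the cause, and how can I fix it?

EDIT: Here is a cut-down version of the code (I do much more error checking than this!)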

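#include <stdio.h>      /* for printf(), snprintf(), perror() */
#include <stdlib.h>     /* for malloc() */
#include <stdint.h>     /* for uint8_t, uint16_t */
#include <string.h>     /* for strnlen() */
#include <unistd.h>     /* for write() */
#include <time.h>       /* for clock_gettime() */
#include <sched.h>      /* for SCHED_FIFO, struct sched_param */
#include <pthread.h>    /* for pthread_setschedparam(), pthread_self() */
#include <sys/select.h> /* for select(), fd_set */
#include <bcm2835.h>    /* for bcm2835_init(), bcm2835_close() */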

int main(int argc, char **argv){

    int err = 0;

    /* Set real time priority SCHED_FIFO */
    struct sched_param sp;
    sp.sched_priority = 30;
    if ( pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp) ){
        perror("pthread_setschedparam():");
        err = 1;
    }

    /* 5ms between samples on /dev/ttyUSB0 */
    int interval = 5;

    /* Setup tty devices with termios, both totally uncooked, 8 bit, odd parity, 1 stop bit, 115200baud */
    int fd_wc=setup_serial("/dev/ttyAMA0");
    int fd_sc=setup_serial("/dev/ttyUSB0");

    /* Setup GPIO for SPI, SPI mode, clock is ~1MHz which equates to more than 50ksps */
    bcm2835_init();
    setup_mcp3201spi();

    int collecting = 1;

    struct timespec starttime;
    struct timespec time;
    struct timespec ftime;
    ftime.tv_nsec = 0;

    fd_set readfds;
    int countfd;
    struct timeval interval_timeout;
    struct timeval notime;

    uint16_t p1;
    float w1;

    uint8_t *datap = malloc(8);
    int data_size;
    char output[25];

    clock_gettime(CLOCK_MONOTONIC, &starttime);

    while ( !err && collecting ){   
        /* Set timeout to (5*1.2)ms - (looptime)ms, or 0 if looptime was longer than (5*1.2)ms */
        interval_timeout.tv_sec = 0;
        interval_timeout.tv_usec = interval * 1200 - ftime.tv_nsec / 1000;
        interval_timeout.tv_usec = (interval_timeout.tv_usec < 0)? 0 : interval_timeout.tv_usec;
        FD_ZERO(&readfds);
        FD_SET(fd_wc, &readfds);    
        FD_SET(0, &readfds); /* so that we can quit, code not included */   
        if ( (countfd=select(fd_wc+1, &readfds, NULL, NULL, &interval_timeout))<0 ){
            perror("select()");
            err = 1;
        } else if (countfd == 0){
            printf("Timeout on select()\n");
            fflush(stdout);
            err = 1;
        } else if (FD_ISSET(fd_wc, &readfds)){
            /* timestamp for when data is just available */
            clock_gettime(CLOCK_MONOTONIC, &time);
            if (starttime.tv_nsec > time.tv_nsec){
                time.tv_nsec = 1000000000 + time.tv_nsec - starttime.tv_nsec;
                time.tv_sec = time.tv_sec - starttime.tv_sec - 1;
            } else {
                time.tv_nsec = time.tv_nsec - starttime.tv_nsec;
                time.tv_sec = time.tv_sec - starttime.tv_sec;
            }

            /* get ADC value, which is sampled fast so corresponds to timestamp */
            p1 = getADCvalue();

            /* receive_frame, receiving is slower so do it after getting ADC value. It is timestamped anyway */
            /* This function consists of a loop that gets data from serial 1 byte at a time until a 'frame' is collected. */
            /* it uses select() with a very short timeout (enough for 1 byte at baudrate) just to check comms are still going */
            /* It never times out and behaves well */
            /* The interval_timeout is passed because it is used as a timeout for responding an ACK to the device */
            /* That select also never times out */
            ireceive_frame(&datap, fd_wc, &data_size, interval_timeout.tv_sec, interval_timeout.tv_usec);

            /* do stuff with it */
            /* This takes most of the time in the loop, about 1.3ms at 115200 baud */
            snprintf(output, 24, "%ld.%04ld,%d,%.2f\n", (long)time.tv_sec, time.tv_nsec/100000, p1, w1);
            write(fd_sc, output, strnlen(output, 23));

            /* Check how long the loop took (minus the polling select() that follows) */
            clock_gettime(CLOCK_MONOTONIC, &ftime);
            if ((time.tv_nsec+starttime.tv_nsec) > ftime.tv_nsec){
                ftime.tv_nsec = 1000000000 + ftime.tv_nsec - time.tv_nsec - starttime.tv_nsec;
                ftime.tv_sec = ftime.tv_sec - time.tv_sec - starttime.tv_sec - 1;
            } else {
                ftime.tv_nsec = ftime.tv_nsec - time.tv_nsec - starttime.tv_nsec;
                ftime.tv_sec = ftime.tv_sec - time.tv_sec - starttime.tv_sec; 
            }

            /* Poll with 0 timeout to check that data hasn't arrived before we're ready yet */
            FD_ZERO(&readfds);
            FD_SET(fd_wc, &readfds);
            notime.tv_sec = 0;  
            notime.tv_usec = 0; 
            if ( !err && ( (countfd=select(fd_wc+1, &readfds, NULL, NULL, &notime)) < 0 )){
                perror("select()");
                err = 1;
            } else if (countfd > 0){
                printf("OVERRUN!\n");
                snprintf(output, 25, ",,,%ld.%04ld\n\n", (long)ftime.tv_sec, ftime.tv_nsec/100000);
                write(fd_sc, output, strnlen(output, 24)); 
            }

        }

    }


    return 0;

}

The timestamps I see on the output serial stream are fairly regular (a deviation is usually caught up by the next loop). An output snippet:

6.1810,0,225.25
6.1867,0,225.25
6.1922,0,225.25
6.2063,0,225.25
,,,0.0010

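Here, everything is fine up to 6.1922 s. The next sample is at 6.2063, which is 14.1 ms after the previous one, yet select() did not time out, and the polling select() at the end of the previous loop (6.1922 to 6.2063) did not catch an overrun either. My conclusion is that the last loop finished within the sample time, and select() took about 10 ms too long to return, without timing out.

The ,,,0.0010 line shows the loop time (ftime) of the loop after. I really should check what the loop time was when it went wrong. I will try that tomorrow.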
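
As an aside, the timestamp subtraction that appears twice in the loop above could be factored into one helper, which makes the borrow between seconds and nanoseconds easier to check. This is only a minimal sketch, not the code from the post; the names timespec_diff and timespec_to_us are hypothetical, and it assumes the end time is not earlier than the start time.

```c
#include <time.h>

/* Return end - start as a normalized timespec (0 <= tv_nsec < 1e9).
 * Assumes end is not earlier than start. */
struct timespec timespec_diff(struct timespec start, struct timespec end)
{
    struct timespec d;

    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0) {
        /* Borrow one second so that tv_nsec is non-negative again. */
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}

/* Convert a timespec to whole microseconds (fraction truncated). */
long long timespec_to_us(struct timespec t)
{
    return (long long)t.tv_sec * 1000000LL + t.tv_nsec / 1000;
}
```

With two CLOCK_MONOTONIC readings taken before and after the blocking select(), timespec_to_us(timespec_diff(before, after)) gives the time actually spent waiting, which can be logged next to the requested timeout.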

If the members of the struct timeval are set to zero, select() does not block, but if the timeout argument is a NULL pointer, it will...

If the timeout argument is not a NULL pointer, it points to an object of type struct timeval that specifies a maximum interval to wait for the selection to complete. If the timeout argument points to an object of type struct timeval whose members are 0, select() does not block. If the timeout argument is a NULL pointer, select() blocks until an event causes one of the masks to be returned with a valid (non-zero) value or until a signal occurs that needs to be delivered. If the time limit expires before any event occurs that would cause one of the masks to be set to a non-zero value, select() completes successfully and returns 0.

Read more here
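
To illustrate the zero-timeout case from the quoted text, here is a small sketch of a non-blocking readiness check. The function name poll_readable is hypothetical and not taken from the question's code; it only uses select() and the FD_* macros as described above.

```c
#include <stddef.h>
#include <sys/select.h>

/* Return 1 if fd is readable right now, 0 if it is not, -1 on error.
 * Both members of the timeval are zero, so select() does not block. */
int poll_readable(int fd)
{
    fd_set readfds;
    struct timeval notime = {0, 0};

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, &notime) < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Passing NULL instead of &notime would make the same call block until the descriptor becomes readable or a signal is delivered.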

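EDIT to address comments and add new information:

A few points are worth noting.

First - in the comments there is a suggestion to add sleep() to your worker loop. This is a good suggestion. The reasons stated here, although they deal with thread entry points, still apply, since you are instantiating a continuous loop.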

Second - Linux select() is a system call with an interesting implementation history, and as such has a range of varying behaviours from implementation to implementation, some of which may contribute to the unexpected behaviours you are seeing. I am not sure which of the major blood lines of Linux Arch Linux comes from, but the man7.org page for select() includes the following two segments which, per your description, appear to describe conditions that could contribute to the delays you are experiencing.

Bad checksum:

Under Linux, select() may report a socket file descriptor as "ready   
for reading", while nevertheless a subsequent read blocks.  This could  
for example happen when data has arrived but upon examination has wrong  
checksum and is discarded.  

Race condition: (introduces and discusses pselect())

...Suppose the signal handler sets a global flag and returns.  Then a test  
of this global flag followed by a call of select() could hang indefinitely  
if the signal arrived just after the test but just before the call...   

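Given the description of your observations, and depending on how your version of Linux is implemented, either one of these implementation features may be a contributor.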
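
For the race condition mentioned in the second excerpt, a rough sketch of how pselect() is typically used follows. It assumes the caller has already blocked SIGINT with sigprocmask(), so the signal can only be delivered while pselect() is waiting; the function name wait_readable_or_sigint and the choice of SIGINT are illustrative assumptions, not something from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Wait until fd is readable, the timeout expires, or SIGINT arrives.
 * Assumes SIGINT is blocked in the caller, so it cannot slip in between
 * a flag test and the wait. Returns what pselect() returns:
 * > 0 ready, 0 timeout, -1 error (errno is EINTR after a signal). */
int wait_readable_or_sigint(int fd, long timeout_ns)
{
    fd_set readfds;
    struct timespec ts;
    sigset_t waitmask;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    ts.tv_sec = timeout_ns / 1000000000L;
    ts.tv_nsec = timeout_ns % 1000000000L;

    /* Mask in effect during the wait: the current mask minus SIGINT. */
    sigprocmask(SIG_SETMASK, NULL, &waitmask);
    sigdelset(&waitmask, SIGINT);

    return pselect(fd + 1, &readfds, NULL, NULL, &ts, &waitmask);
}
```

Because the mask is swapped atomically inside pselect(), there is no window between testing a flag set by the handler and starting the wait.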

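The timeout passed to select() is a rough lower bound: select() is allowed to delay your process for somewhat longer. In particular, your process will be delayed if it is preempted by a different process (a context switch) or by interrupt handling in the kernel.

Here is what the Linux manual page has to say on the subject: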

Note that the timeout interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.

And here is the POSIX standard:

Implementations may also place limitations on the granularity of timeout intervals. If the requested timeout interval requires a finer granularity than the implementation supports, the actual timeout interval shall be rounded up to the next supported value.

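This is difficult to avoid on a general-purpose system. You can reduce the effect by locking your process in memory (mlockall) and setting it to a real-time priority (use sched_setscheduler with SCHED_FIFO, and remember to sleep often enough to give other processes a chance to run).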
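
A minimal sketch of that setup might look as follows. The function name enable_realtime is hypothetical, and the priority value is whatever the caller chooses (the question uses 30). Both calls normally need root or the corresponding capabilities, so the sketch reports failure instead of assuming success.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

/* Lock all current and future pages in RAM, then switch the calling
 * process to SCHED_FIFO at the given priority.
 * Returns 0 on success and -1 if either call fails. */
int enable_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

A SCHED_FIFO process that never blocks can starve everything else on its CPU, which is why the advice to sleep or block regularly matters.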

A more difficult approach is to use a real-time microcontroller dedicated to running the real-time code. Some people claim to reliably sample at 20MHz on fairly cheap hardware using that technique.