Can advice of posix_fadvise() be combined?

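I'm trying to hash a large number of files and want to saturate I/O on my system as much as possible. This use case makes three things true at once: I will need the data, I will read it sequentially, and I will not read it again.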

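Can I combine several pieces of fadvise() advice, or if I give more than one for the same range, does one override the other?

I'm currently making three calls in sequence, because the advice values don't appear to be flags that can be OR-ed together.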

os.posix_fadvise(f, 0, 0, os.POSIX_FADV_SEQUENTIAL)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_WILLNEED)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_NOREUSE)
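
For context, here is a minimal sketch of the kind of hashing loop these calls sit in. The helper names `advise` and `hash_file`, the SHA-256 choice, and the 1 MiB chunk size are illustrative assumptions, not part of my real code. Note that `os.posix_fadvise` takes a file descriptor, so the sketch passes `f.fileno()`, and it treats every call as best-effort because the advice is only a hint.

```python
import hashlib
import os


def advise(fd, advice_name):
    """Best-effort: apply one advice value to the whole file.

    Returns False when the platform lacks posix_fadvise or the constant,
    or when the call fails; the advice is only a hint either way.
    """
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return False
    try:
        # offset=0, len=0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, advice)
    except OSError:
        return False
    return True


def hash_file(path, chunk_size=1 << 20):
    """Hash a file front to back after issuing the three hints in sequence."""
    digest = hashlib.sha256()  # assumed hash; any hashlib algorithm works
    with open(path, "rb") as f:
        fd = f.fileno()
        for name in ("POSIX_FADV_SEQUENTIAL",
                     "POSIX_FADV_WILLNEED",
                     "POSIX_FADV_NOREUSE"):
            advise(fd, name)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Whether the three hints actually accumulate or replace one another is exactly what this sketch cannot tell me, since the calls succeed either way.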

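Before this, I was only advising WILLNEED. From the man page, the advice seems to do little more than set a read-ahead buffer policy, and WILLNEED looked like the most sensible choice. But I really do need the data fetched from the HDD sequentially, and I don't intend to read it again.

Is this behavior defined, or is it left entirely to the implementer of the target platform?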

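Implementation

In the fadvise implementation I found, the advice value is handled by a switch statement. Per-file attributes such as the read-ahead page count file->f_ra.ra_pages really are "switched": each of NORMAL, RANDOM and SEQUENTIAL overwrites whatever the previous call set. Other cache-related calls, such as force_page_cache_readahead under WILLNEED, are not: they act once on the given range and leave those attributes untouched.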

switch (advice) {
    case POSIX_FADV_NORMAL:
        file->f_ra.ra_pages = bdi->ra_pages;
        spin_lock(&file->f_lock);
        file->f_mode &= ~FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_RANDOM:
        spin_lock(&file->f_lock);
        file->f_mode |= FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_SEQUENTIAL:
        file->f_ra.ra_pages = bdi->ra_pages * 2;
        spin_lock(&file->f_lock);
        file->f_mode &= ~FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_WILLNEED:
        /* First and last PARTIAL page! */
        start_index = offset >> PAGE_SHIFT;
        end_index = endbyte >> PAGE_SHIFT;
        /* Careful about overflow on the "+1" */
        nrpages = end_index - start_index + 1;
        if (!nrpages)
            nrpages = ~0UL;
        /*
         * Ignore return value because fadvise() shall return
         * success even if filesystem can't retrieve a hint,
         */
        force_page_cache_readahead(mapping, file, start_index, nrpages);
        break;
    case POSIX_FADV_NOREUSE:
        break;
    case POSIX_FADV_DONTNEED:
        if (!inode_write_congested(mapping->host))
            __filemap_fdatawrite_range(mapping, offset, endbyte,
                           WB_SYNC_NONE);
        /*
         * First and last FULL page! Partial pages are deliberately
         * preserved on the expectation that it is better to preserve
         * needed memory than to discard unneeded memory.
         */
        start_index = (offset+(PAGE_SIZE-1)) >> PAGE_SHIFT;
        end_index = (endbyte >> PAGE_SHIFT);
        /*
         * The page at end_index will be inclusively discarded according
         * by invalidate_mapping_pages(), so subtracting 1 from
         * end_index means we will skip the last page.  But if endbyte
         * is page aligned or is at the end of file, we should not skip
         * that page - discarding the last page is safe enough.
         */
        if ((endbyte & ~PAGE_MASK) != ~PAGE_MASK &&
                endbyte != inode->i_size - 1) {
            /* First page is tricky as 0 - 1 = -1, but pgoff_t
             * is unsigned, so the end_index >= start_index
             * check below would be true and we'll discard the whole
             * file cache which is not what was asked.
             */
            if (end_index == 0)
                break;
            end_index--;
        }
        if (end_index >= start_index) {
            unsigned long count;
            /*
             * It's common to FADV_DONTNEED right after
             * the read or write that instantiates the
             * pages, in which case there will be some
             * sitting on the local LRU cache. Try to
             * avoid the expensive remote drain and the
             * second cache tree walk below by flushing
             * them out right away.
             */
            lru_add_drain();
            count = invalidate_mapping_pages(mapping,
                        start_index, end_index);
            /*
             * If fewer pages were invalidated than expected then
             * it is possible that some of the pages were on
             * a per-cpu pagevec for a remote CPU. Drain all
             * pagevecs and try again.
             */
            if (count < (end_index - start_index + 1)) {
                lru_add_drain_all();
                invalidate_mapping_pages(mapping, start_index,
                        end_index);
            }
        }
        break;
    default:
        return -EINVAL;
    }
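
To check my reading of the switch, here is a toy Python model of only the excerpt above. It is a sketch, not the kernel's behavior on every version: `FileState`, `fadvise_model` and the `BDI_RA_PAGES` value of 32 are hypothetical names and numbers I made up, and DONTNEED is reduced to recording the request.

```python
from dataclasses import dataclass, field

BDI_RA_PAGES = 32  # hypothetical device default read-ahead window, in pages


@dataclass
class FileState:
    ra_pages: int = BDI_RA_PAGES          # models file->f_ra.ra_pages
    random: bool = False                  # models the FMODE_RANDOM bit
    readahead_requests: list = field(default_factory=list)
    invalidate_requests: list = field(default_factory=list)


def fadvise_model(state, advice, start=0, nrpages=0):
    """Mirror the quoted switch: some cases set state, others act once."""
    if advice == "NORMAL":
        state.ra_pages = BDI_RA_PAGES
        state.random = False
    elif advice == "RANDOM":
        state.random = True               # ra_pages is left as it was
    elif advice == "SEQUENTIAL":
        state.ra_pages = BDI_RA_PAGES * 2
        state.random = False
    elif advice == "WILLNEED":
        # one-shot action on a range; no persistent attribute changes
        state.readahead_requests.append((start, nrpages))
    elif advice == "NOREUSE":
        pass                              # a no-op in the quoted code
    elif advice == "DONTNEED":
        # one-shot action; the real code writes back and drops pages
        state.invalidate_requests.append((start, nrpages))
    else:
        raise ValueError(advice)          # stands in for -EINVAL
    return state
```

In this model, calling SEQUENTIAL, then WILLNEED, then NOREUSE leaves the doubled read-ahead window in place and queues one read-ahead request, so the three calls do not cancel each other. Calling RANDOM or NORMAL afterwards would overwrite the mode set by SEQUENTIAL.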

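Conclusion

The details may differ from system to system (especially if you are not on Linux), because POSIX fadvise does not seem to spell out clear rules for combining different advice values. On this implementation, though, it looks as if some pieces of advice do combine while others replace each other. I'd appreciate input from anyone with more experience here.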