Can posix_fadvise() advice be combined?
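I'm trying to hash a large number of files and want to saturate the I/O on my system as much as possible. In this use case, three things are true at once:
- I read each file only once
- I need the entire file
- The file is read sequentially
Can I combine fadvise() advice values, or, if I give several pieces of advice for the same range, does one override the other?
I'm currently making three sequential calls, since the advice values don't seem to be OR-able like flags: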
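```python
import os

# f must be an integer file descriptor (e.g. from os.open() or file.fileno())
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_SEQUENTIAL)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_WILLNEED)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_NOREUSE)
```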
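For context, here is a minimal, self-contained sketch of the use case described above: open the file, issue the same three pieces of advice, then hash it front to back in fixed-size chunks. The names `hash_file` and `CHUNK_SIZE` and the 1 MiB chunk size are assumptions for illustration; the sketch only shows where the calls go and does not by itself answer whether the advice combines.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # assumption: 1 MiB reads; tune for your disks


def hash_file(path: str, algorithm: str = "sha256") -> str:
    """Hash a file that is read once, in full, from start to end."""
    h = hashlib.new(algorithm)
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is not available on every platform (e.g. Windows)
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 applies each piece of advice to the whole file
            for advice in (os.POSIX_FADV_SEQUENTIAL,
                           os.POSIX_FADV_WILLNEED,
                           os.POSIX_FADV_NOREUSE):
                os.posix_fadvise(fd, 0, 0, advice)
        while True:
            chunk = os.read(fd, CHUNK_SIZE)
            if not chunk:  # empty bytes means end of file
                break
            h.update(chunk)
    finally:
        os.close(fd)
    return h.hexdigest()
```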
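Before this, I was only advising WILLNEED. From the man page, the advice seems to just set a read-ahead buffering strategy, and WILLNEED seemed the most sensible choice; but I really do need the data read sequentially from the HDD, and I don't intend to read it again.
Is this behavior defined, or is it left to whoever implements it on the target platform?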
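Implementation
According to this Linux implementation of fadvise that I found, there is a switch statement on the advice value. You can see that per-file attributes such as the read-ahead page count `file->f_ra.ra_pages` are indeed "switched" depending on the advice chosen, so a later NORMAL, RANDOM or SEQUENTIAL call overwrites the state an earlier one set. Other cache-related calls, however, are not stored as state: WILLNEED immediately calls `force_page_cache_readahead()` on the given range and leaves nothing behind.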
```c
switch (advice) {
case POSIX_FADV_NORMAL:
    file->f_ra.ra_pages = bdi->ra_pages;
    spin_lock(&file->f_lock);
    file->f_mode &= ~FMODE_RANDOM;
    spin_unlock(&file->f_lock);
    break;
case POSIX_FADV_RANDOM:
    spin_lock(&file->f_lock);
    file->f_mode |= FMODE_RANDOM;
    spin_unlock(&file->f_lock);
    break;
case POSIX_FADV_SEQUENTIAL:
    file->f_ra.ra_pages = bdi->ra_pages * 2;
    spin_lock(&file->f_lock);
    file->f_mode &= ~FMODE_RANDOM;
    spin_unlock(&file->f_lock);
    break;
case POSIX_FADV_WILLNEED:
    /* First and last PARTIAL page! */
    start_index = offset >> PAGE_SHIFT;
    end_index = endbyte >> PAGE_SHIFT;

    /* Careful about overflow on the "+1" */
    nrpages = end_index - start_index + 1;
    if (!nrpages)
        nrpages = ~0UL;

    /*
     * Ignore return value because fadvise() shall return
     * success even if filesystem can't retrieve a hint,
     */
    force_page_cache_readahead(mapping, file, start_index, nrpages);
    break;
case POSIX_FADV_NOREUSE:
    break;
case POSIX_FADV_DONTNEED:
    if (!inode_write_congested(mapping->host))
        __filemap_fdatawrite_range(mapping, offset, endbyte,
                                   WB_SYNC_NONE);

    /*
     * First and last FULL page! Partial pages are deliberately
     * preserved on the expectation that it is better to preserve
     * needed memory than to discard unneeded memory.
     */
    start_index = (offset+(PAGE_SIZE-1)) >> PAGE_SHIFT;
    end_index = (endbyte >> PAGE_SHIFT);
    /*
     * The page at end_index will be inclusively discarded according
     * by invalidate_mapping_pages(), so subtracting 1 from
     * end_index means we will skip the last page. But if endbyte
     * is page aligned or is at the end of file, we should not skip
     * that page - discarding the last page is safe enough.
     */
    if ((endbyte & ~PAGE_MASK) != ~PAGE_MASK &&
            endbyte != inode->i_size - 1) {
        /* First page is tricky as 0 - 1 = -1, but pgoff_t
         * is unsigned, so the end_index >= start_index
         * check below would be true and we'll discard the whole
         * file cache which is not what was asked.
         */
        if (end_index == 0)
            break;

        end_index--;
    }

    if (end_index >= start_index) {
        unsigned long count;

        /*
         * It's common to FADV_DONTNEED right after
         * the read or write that instantiates the
         * pages, in which case there will be some
         * sitting on the local LRU cache. Try to
         * avoid the expensive remote drain and the
         * second cache tree walk below by flushing
         * them out right away.
         */
        lru_add_drain();

        count = invalidate_mapping_pages(mapping,
                        start_index, end_index);

        /*
         * If fewer pages were invalidated than expected then
         * it is possible that some of the pages were on
         * a per-cpu pagevec for a remote CPU. Drain all
         * pagevecs and try again.
         */
        if (count < (end_index - start_index + 1)) {
            lru_add_drain_all();
            invalidate_mapping_pages(mapping, start_index,
                        end_index);
        }
    }
    break;
default:
    return -EINVAL;
}
```
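To make the distinction in this switch concrete, here is a toy Python model of only the NORMAL, RANDOM, SEQUENTIAL, WILLNEED and NOREUSE branches. It is not kernel code: `FileState`, `Advice`, the 4 KiB page size and the default window of 32 pages are assumptions for illustration. Setting-type advice overwrites fields (so the last one wins), while WILLNEED just records a one-shot readahead request for the inclusive page range it covers.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Advice(Enum):  # hypothetical stand-ins for the POSIX_FADV_* constants
    NORMAL = auto()
    RANDOM = auto()
    SEQUENTIAL = auto()
    WILLNEED = auto()
    NOREUSE = auto()


PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class FileState:
    bdi_ra_pages: int = 32   # assumption: the device's default read-ahead window, in pages
    ra_pages: int = 32       # mirrors file->f_ra.ra_pages
    random_mode: bool = False  # mirrors FMODE_RANDOM in file->f_mode
    readahead_calls: list = field(default_factory=list)  # one-shot readahead requests

    def fadvise(self, offset: int, length: int, advice: Advice) -> None:
        if advice is Advice.NORMAL:
            self.ra_pages = self.bdi_ra_pages
            self.random_mode = False
        elif advice is Advice.RANDOM:
            self.random_mode = True  # ra_pages is left untouched, as in the C code
        elif advice is Advice.SEQUENTIAL:
            self.ra_pages = self.bdi_ra_pages * 2
            self.random_mode = False
        elif advice is Advice.WILLNEED:
            # Inclusive page range; length 0 means "to end of file" (end = None here)
            start = offset >> PAGE_SHIFT
            end = None if length == 0 else (offset + length - 1) >> PAGE_SHIFT
            self.readahead_calls.append((start, end))
        elif advice is Advice.NOREUSE:
            pass  # a no-op in the quoted kernel code


s = FileState()
for a in (Advice.SEQUENTIAL, Advice.WILLNEED, Advice.NOREUSE):
    s.fadvise(0, 0, a)
print(s.ra_pages, s.random_mode, s.readahead_calls)  # 64 False [(0, None)]
```

In this model, calling RANDOM or NORMAL afterwards would undo what SEQUENTIAL set, but the readahead that WILLNEED already requested stays in the list.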
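Conclusion
The implementation may differ somewhat on other systems (if you're not using Linux), since POSIX doesn't seem to be very clear about how different advice values combine. Judging from the code above, though, some effects do combine (WILLNEED's one-shot readahead on top of the larger window set by SEQUENTIAL), while others simply overwrite each other (NORMAL, RANDOM and SEQUENTIAL), and NOREUSE does nothing at all in this version. I'd appreciate input from anyone with more experience here.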
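Since NOREUSE is a no-op in the Linux code quoted above, one workaround you will sometimes see for "read once, don't keep it cached" is to keep SEQUENTIAL and then issue DONTNEED on each range right after it has been consumed. The sketch below assumes the Linux semantics shown in the quoted code (DONTNEED invalidates only fully covered, clean pages); `hash_file_drop_behind` and `CHUNK_SIZE` are hypothetical names, and whether this helps throughput on your system is untested here.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # assumption: 1 MiB reads, page-aligned so every chunk covers full pages


def hash_file_drop_behind(path: str, algorithm: str = "sha256") -> str:
    """Hash a file sequentially, asking the kernel to drop pages already hashed."""
    h = hashlib.new(algorithm)
    fd = os.open(path, os.O_RDONLY)
    try:
        fadvise = getattr(os, "posix_fadvise", None)  # absent on some platforms
        if fadvise:
            fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # persistent read-ahead hint
        offset = 0
        while chunk := os.read(fd, CHUNK_SIZE):
            h.update(chunk)
            if fadvise:
                # The range [offset, offset + len(chunk)) has been hashed and won't be
                # read again; only fully covered pages are dropped, per the quoted code.
                fadvise(fd, offset, len(chunk), os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    finally:
        os.close(fd)
    return h.hexdigest()
```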