ClickHouse - Too many links

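I am testing a ClickHouse server under a heavy insert load and ran into a state where the server stops processing inserts and throws a "Too many links" exception. From what I observed, I don't think it recovers from this state even after I stop inserting. I also noticed that the "Too many links" exception message is logged every millisecond, so the server log file fills up very quickly.

Test environment & how to reproduce:

In this state, clickhouse-server uses 1.5 cores with no noticeable file I/O activity. Other queries work. To recover from this state, I deleted the tmp directory.

I don't think we would normally insert this way in practice (ignoring the "Too many parts" errors), but I'm wondering whether getting into this state could become a problem. Also, apart from not inserting data this abnormally, is there any way to avoid it?

Thanks in advance.

Logs: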

- client 
  Code: 252. DB::Exception: Received from xx:9000. DB::Exception: Too many parts (303). Merges are processing significantly slower than inserts..

- server: 
  2021.10.21 09:17:48.649609 [ 21223 ] {} <Error> auto DB::IBackgroundJobExecutor::jobExecutingTask()::(anonymous class)::operator()() const: Poco::Exception. Code: 1000, e.code() = 31, e.displayText() = File access error: Too many links: /var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35/tmp_merge_20211021_452585_452597_1, Stack trace (when copying this message, always include the lines below):

0. Poco::FileImpl::handleLastErrorImpl(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11c42124 in /usr/bin/clickhouse
1. Poco::FileImpl::createDirectoryImpl() @ 0x11c4372f in /usr/bin/clickhouse
2. Poco::File::createDirectories() @ 0x11c456b7 in /usr/bin/clickhouse
3. DB::DiskLocal::createDirectories(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xe79e358 in /usr/bin/clickhouse
4. DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&, long, DB::Context const&, std::__1::unique_ptr<DB::IReservation, std::__1::default_delete<DB::IReservation> > const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xf36ad8e in /usr/bin/clickhouse
5. DB::StorageMergeTree::mergeSelectedParts(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::StorageMergeTree::MergeMutateSelectedEntry&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&) @ 0xf10f108 in /usr/bin/clickhouse
6. ? @ 0xf12168c in /usr/bin/clickhouse
7. ? @ 0xf2cb076 in /usr/bin/clickhouse
8. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8513fb8 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() @ 0x8515f6f in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x851158f in /usr/bin/clickhouse
11. ? @ 0x8515023 in /usr/bin/clickhouse
12. ? @ 0x7eb5 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0xfe8fd in /usr/lib64/libc-2.17.so
(version 21.2.2.8 (official build))

--- with 21.8.
2021.10.25 08:29:18.354200 [ 55326 ] {} <Error> auto DB::IBackgroundJobExecutor::execute(DB::JobAndPool)::(anonymous class)::operator()() const: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in create_directory: Too many links [/var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35/tmp_merge_20211024_906198_906236_1], Stack trace (when copying this message, always include the lines below):

0. std::__1::system_error::system_error(std::__1::error_code, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x1590de6f in ?
1. ? @ 0x158a171f in ?
2. ? @ 0x158a1136 in ?
3. ? @ 0x158a58f8 in ?
4. std::__1::__fs::filesystem::__create_directory(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a646b in ?
5. std::__1::__fs::filesystem::__create_directories(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a6125 in ?
6. std::__1::__fs::filesystem::__create_directories(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a6189 in ?
7. DB::DiskLocal::createDirectories(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xff032ec in /usr/bin/clickhouse
8. DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&, long, std::__1::shared_ptr<DB::Context const>, std::__1::unique_ptr<DB::IReservation, std::__1::default_delete<DB::IReservation> > const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::MergeTreeData::MergingParams const&, DB::IMergeTreeDataPart const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x10d14ff8 in /usr/bin/clickhouse
9. DB::StorageMergeTree::mergeSelectedParts(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::StorageMergeTree::MergeMutateSelectedEntry&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&) @ 0x10eea024 in /usr/bin/clickhouse
10. ? @ 0x10ef9937 in /usr/bin/clickhouse
11. ? @ 0x10c40e77 in /usr/bin/clickhouse
12. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8ffab98 in /usr/bin/clickhouse
13. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x8ffc73f in /usr/bin/clickhouse
14. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x8ff84ff in /usr/bin/clickhouse
15. ? @ 0x8ffb763 in /usr/bin/clickhouse
16. ? @ 0x7eb5 in /usr/lib64/libpthread-2.17.so
17. __clone @ 0xfe8fd in /usr/lib64/libc-2.17.so

Cannot print extra info for Poco::Exception (version 21.8.5.1.altinity+prestable (altinity build))
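Both traces fail in DiskLocal::createDirectories while creating a tmp_merge_* directory, and e.code() = 31 in the 21.2 log is the Linux errno EMLINK ("Too many links"), which mkdir returns when the parent directory's link count would exceed its limit. A minimal Python sketch for watching that directory, assuming leftover tmp_merge_* subdirectories are what is piling up there (the default path is copied from the logs; WARN_THRESHOLD is a hypothetical value, not a ClickHouse setting):

```python
import errno
import os
import sys

# Table tmp directory copied from the log messages above; pass your own path as argv[1].
TMP_STORE_DIR = "/var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35"

# Hypothetical alert threshold chosen for illustration; not a ClickHouse setting.
WARN_THRESHOLD = 10_000


def count_tmp_merge_dirs(path: str) -> int:
    """Count the immediate subdirectories of `path` whose names start with 'tmp_merge_'."""
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and entry.name.startswith("tmp_merge_"):
                count += 1
    return count


def main() -> None:
    # e.code() = 31 in the 21.2 log corresponds to EMLINK on Linux.
    print(f"errno {errno.EMLINK}: {errno.errorcode[errno.EMLINK]} ({os.strerror(errno.EMLINK)})")
    path = sys.argv[1] if len(sys.argv) > 1 else TMP_STORE_DIR
    try:
        n = count_tmp_merge_dirs(path)
    except OSError as exc:
        print(f"{path}: {exc}")
        return
    print(f"{n} tmp_merge_* directories in {path}")
    if n >= WARN_THRESHOLD:
        print("warning: tmp_merge_* directories are accumulating")


if __name__ == "__main__":
    main()
```

Run it periodically (for example from cron) against the table's tmp path to see whether the count keeps growing while merges are failing.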

df -i /var/lib/clickhouse/

df -h /var/lib/clickhouse/
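The two df commands check whether the filesystem holding /var/lib/clickhouse/ has run out of inodes (-i) or space (-h). A rough Python equivalent built on os.statvfs, offered as a sketch: these are filesystem-wide totals, so healthy numbers here do not rule out the per-directory link limit behind EMLINK.

```python
import os
import sys


def fs_usage(path: str) -> dict:
    """Inode and space usage of the filesystem that contains `path` (like df -i / df -h)."""
    st = os.statvfs(path)
    inodes_used = st.f_files - st.f_ffree
    return {
        "inodes_total": st.f_files,
        "inodes_used": inodes_used,
        # Some filesystems report f_files == 0 (no fixed inode table); avoid dividing by zero.
        "inode_pct": 100.0 * inodes_used / st.f_files if st.f_files else 0.0,
        # Block counts are in units of f_frsize; f_bavail is the space available to non-root users.
        "size_bytes": st.f_blocks * st.f_frsize,
        "avail_bytes": st.f_bavail * st.f_frsize,
    }


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/clickhouse/"
    try:
        u = fs_usage(path)
    except OSError as exc:
        print(f"{path}: {exc}")
        return
    print(f"inodes: {u['inodes_used']} of {u['inodes_total']} used ({u['inode_pct']:.1f}%)")
    print(f"space: {u['avail_bytes'] / 2**30:.1f} GiB available of {u['size_bytes'] / 2**30:.1f} GiB")


if __name__ == "__main__":
    main()
```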

  1. Upgrade CH to 21.8.10.19 https://github.com/ClickHouse/ClickHouse/issues/26471

  2. https://github.com/ClickHouse/ClickHouse/issues/3174#issuecomment-423435071

  3. https://clickhouse.com/docs/en/operations/settings/merge-tree-settings/#parts-to-throw-insert

# cat /etc/clickhouse-server/config.d/z_parts_to_throw.xml
<yandex>
    <merge_tree>
        <old_parts_lifetime>30</old_parts_lifetime>
        <parts_to_delay_insert>150</parts_to_delay_insert>
        <parts_to_throw_insert>900</parts_to_throw_insert>
        <max_delay_to_insert>5</max_delay_to_insert>
    </merge_tree>
</yandex>
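These settings decide when inserts are slowed down and when they are rejected. As I read the MergeTree settings documentation for these releases, once a partition has more than parts_to_delay_insert active parts, each INSERT sleeps for pow(max_delay_to_insert * 1000, k / max_k) milliseconds, with k = 1 + parts_count_in_partition - parts_to_delay_insert and max_k = parts_to_throw_insert - parts_to_delay_insert. At parts_to_throw_insert the INSERT fails with "Too many parts". A Python sketch of that calculation, assuming the documented formula; the inclusive boundaries and the TooManyParts class are my own assumptions, so check the docs for your exact version:

```python
class TooManyParts(Exception):
    """Hypothetical stand-in for ClickHouse's 'Too many parts (N)' error."""


def insert_delay_ms(parts_count_in_partition: int,
                    parts_to_delay_insert: int = 150,
                    parts_to_throw_insert: int = 900,
                    max_delay_to_insert: int = 5) -> float:
    """INSERT delay in milliseconds per the documented formula.

    Defaults mirror the z_parts_to_throw.xml values above.
    """
    if parts_to_throw_insert <= parts_to_delay_insert:
        raise ValueError("parts_to_throw_insert must be greater than parts_to_delay_insert")
    # Assumption: the throw boundary is inclusive.
    if parts_count_in_partition >= parts_to_throw_insert:
        raise TooManyParts(f"Too many parts ({parts_count_in_partition})")
    # Assumption: below parts_to_delay_insert the INSERT is not slowed down.
    if parts_count_in_partition < parts_to_delay_insert:
        return 0.0
    max_k = parts_to_throw_insert - parts_to_delay_insert
    k = 1 + parts_count_in_partition - parts_to_delay_insert
    return pow(max_delay_to_insert * 1000, k / max_k)


if __name__ == "__main__":
    # The docs' example: 299 parts, delay at 150, throw at 300, max delay 1 s.
    print(insert_delay_ms(299, 150, 300, 1))  # 1000.0
    # Delay for the 303 parts from the client log under the config above.
    print(round(insert_delay_ms(303), 1))
```

Raising parts_to_throw_insert widens max_k, so the same part count produces a smaller delay and the hard error arrives later; it buys time for merges but does not make them faster.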
  1. https://clickhouse.com/docs/en/operations/settings/settings/#background_pool_size
# cat /etc/clickhouse-server/users.d/user_substitutes.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <background_pool_size>32</background_pool_size>
        </default>
    </profiles>
</yandex>
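Before restarting, it can help to confirm that both override files parse and carry the intended values. A small sketch using xml.etree.ElementTree with the two paths shown above; the check that parts_to_delay_insert stays below parts_to_throw_insert is my own assumption, based on the delay formula dividing by their difference:

```python
import os
import xml.etree.ElementTree as ET

# Paths as shown in the answer above.
MERGE_TREE_FILE = "/etc/clickhouse-server/config.d/z_parts_to_throw.xml"
PROFILE_FILE = "/etc/clickhouse-server/users.d/user_substitutes.xml"


def read_section(path: str, section: str) -> dict:
    """Map each child tag of the element at `section` (an ElementTree path) to its text."""
    root = ET.parse(path).getroot()
    node = root.find(section)
    if node is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in node}


def check_merge_tree(settings: dict) -> list:
    """Return a list of problems found in the merge_tree overrides (empty if none)."""
    problems = []
    if "parts_to_delay_insert" in settings and "parts_to_throw_insert" in settings:
        if int(settings["parts_to_delay_insert"]) >= int(settings["parts_to_throw_insert"]):
            problems.append("parts_to_delay_insert should be below parts_to_throw_insert")
    return problems


def main() -> None:
    for path, section in ((MERGE_TREE_FILE, "merge_tree"), (PROFILE_FILE, "profiles/default")):
        if not os.path.exists(path):
            print(f"{path}: not found")
            continue
        settings = read_section(path, section)
        print(path, settings)
        if section == "merge_tree":
            for problem in check_merge_tree(settings):
                print("warning:", problem)


if __name__ == "__main__":
    main()
```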
  1. Restart CH