Connection reset by peer while uploading a large CSV in ClickHouse

I get this error when uploading a CSV with more than 2.5M rows and more than 90 columns to ClickHouse:

code: 210. DB::NetException: Connection reset by peer, while writing to socket (10.107.146.25:9000): data for INSERT was parsed from stdin

Here is the CREATE TABLE statement for the table:
CREATE TABLE table_names
(
    {column_names and types}
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 8192,
allow_nullable_key = 1

Here is the command I run to insert the CSV:
cat {filepath}.csv | sudo docker run -i --rm yandex/clickhouse-client -m --host {host}  -u {user} --input_format_allow_errors_num=10 --input_format_allow_errors_ratio=0.1  --max_memory_usage=15000000000 --format_csv_allow_single_quotes 0 --input_format_skip_unknown_fields 1 --query='INSERT INTO table_name FORMAT CSVWithNames'

Here is the error recorded in the query_log system table in ClickHouse:
Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 65735. Bytes expected: 134377. (version 21.6.5.37 (official build))

Here is the stack trace (again from the query_log table):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8b6cbba in /usr/bin/clickhouse
1. DB::ReadBuffer::readStrict(char*, unsigned long) @ 0x8ba7c4d in /usr/bin/clickhouse
2. DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) @ 0xf2347bc in /usr/bin/clickhouse
3. DB::CompressedReadBuffer::nextImpl() @ 0xf233f27 in /usr/bin/clickhouse
4. void DB::readVarUIntImpl<false>(unsigned long&, DB::ReadBuffer&) @ 0x8ba7eac in /usr/bin/clickhouse
5. ? @ 0xf40843b in /usr/bin/clickhouse
6. DB::SerializationString::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0xf40723b in /usr/bin/clickhouse
7. DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, COW<DB::IColumn>::immutable_ptr<DB::IColumn> > > >*) const @ 0xf3d4dd5 in /usr/bin/clickhouse
8. DB::SerializationNullable::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, COW<DB::IColumn>::immutable_ptr<DB::IColumn> > > >*) const @ 0xf3f550f in /usr/bin/clickhouse
9. DB::NativeBlockInputStream::readData(DB::IDataType const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, DB::ReadBuffer&, unsigned long, double) @ 0xfa9f8f5 in /usr/bin/clickhouse
10. DB::NativeBlockInputStream::readImpl() @ 0xfaa07b3 in /usr/bin/clickhouse
11. DB::IBlockInputStream::read() @ 0xf30f452 in /usr/bin/clickhouse
12. DB::TCPHandler::receiveData(bool) @ 0x104403c4 in /usr/bin/clickhouse
13. DB::TCPHandler::receivePacket() @ 0x10435bec in /usr/bin/clickhouse
14. DB::TCPHandler::readDataNext(unsigned long, long) @ 0x10437e5f in /usr/bin/clickhouse
15. DB::TCPHandler::processInsertQuery(DB::Settings const&) @ 0x1043625e in /usr/bin/clickhouse
16. DB::TCPHandler::runImpl() @ 0x1042eb09 in /usr/bin/clickhouse
17. DB::TCPHandler::run() @ 0x10441839 in /usr/bin/clickhouse
18. Poco::Net::TCPServerConnection::start() @ 0x12a3fd4f in /usr/bin/clickhouse
19. Poco::Net::TCPServerDispatcher::run() @ 0x12a417da in /usr/bin/clickhouse
20. Poco::PooledThread::run() @ 0x12b7ab39 in /usr/bin/clickhouse
21. Poco::ThreadImpl::runnableEntry(void*) @ 0x12b76b2a in /usr/bin/clickhouse
22. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
23. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so

I initially thought this was due to the partition and order keys, but I removed all of them and still get the same issue, with the inserted row count falling short by more than 1 million rows.

The error DB::Exception: Cannot read all data. indicates that the data you are trying to insert is corrupted. Most likely, some of your 2.5M rows are missing fields, or the contents of some rows do not match the expected column types.
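A quick first pass for spotting such rows (a naive sketch: it counts comma-separated fields, so it assumes no commas inside quoted values) is to compare each row's field count against the header:

awk -F',' 'NR == 1 { n = NF } NF != n { print NR ": " NF " fields (expected " n ")" }' {filepath}.csv | head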

I suggest inserting in smaller batches so you can narrow down what is wrong with the data: you will get a number of successful batches until you hit the one containing the corrupted rows.
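One way to do that (a minimal sketch reusing the docker-based client from the question; the 100k chunk size and file names are arbitrary, and the error-tolerance flags are dropped so bad rows fail loudly) is to split the file, re-attach the header to each chunk, and insert chunk by chunk:

# keep the header separately and split the rest into 100k-row chunks
head -n 1 {filepath}.csv > header.csv
tail -n +2 {filepath}.csv | split -l 100000 - chunk_
# insert each chunk; a failing batch narrows down where the bad rows are
for f in chunk_*; do
  cat header.csv "$f" | sudo docker run -i --rm yandex/clickhouse-client -m --host {host} -u {user} --query='INSERT INTO table_name FORMAT CSVWithNames' || echo "failed batch: $f"
done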