使用简单的 CSV 导入 Clickhouse 解析错误

Parse error with a simple CSV import into Clickhouse

我有一个非常简单的 CSV 文件,我试图将其导入 Clickhouse,但没有成功。创建 table 语句是:

CREATE TABLE staging.EloLBK
(
    `Month` DateTime64(3),
    `1958` Int32,
    `1959` Int32,
    `1960` Int32
)
ENGINE = MergeTree
PRIMARY KEY Month
ORDER BY Month
SETTINGS index_granularity = 8192

CSV 数据如下所示:

"Month", "1958", "1959", "1960"
"JAN",  340,  360,  417
"FEB",  318,  342,  391
"MAR",  362,  406,  419
"APR",  348,  396,  461
"MAY",  363,  420,  472
"JUN",  435,  472,  535
"JUL",  491,  548,  622
"AUG",  505,  559,  606
"SEP",  404,  463,  508
"OCT",  359,  407,  461
"NOV",  310,  362,  390
"DEC",  337,  405,  432

我的导入语句是:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer')

从 clickhouse 返回的错误是:

Code: 27. DB::Exception: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,':
Row 1:
Column 0,   name: Month, type: DateTime64(3), parsed text: "<DOUBLE QUOTE>JAN<DOUBLE QUOTE>,  340,  360,  "ERROR
Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 21.12.1.8928 (official build))

: While executing CSVRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

我不确定如何解决这个问题,所以任何建议都将不胜感激!

好的,经过一番折腾,错误消息似乎有点误导。实际问题是 clickhouse(可以理解)无法将月份解析为日期时间。

以下 CSV 输入工作正常:

"Month","1958","1959","1960"
"1970-01-01T00:00:00","340","360","417"
"1970-01-02T00:00:00","318","342","391"

您可以通过传递选项 date_time_input_format='best_effort' 告诉 ClickHouse 进行最大努力猜测,例如:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer') settings date_time_input_format='best_effort';

将导致:

┌───────────────────Month─┬─1958─┬─1959─┬─1960─┐
│ 2000-01-01 00:00:00.000 │  340 │  360 │  417 │
│ 2000-02-01 00:00:00.000 │  318 │  342 │  391 │
│ 2000-03-01 00:00:00.000 │  362 │  406 │  419 │
│ 2000-04-01 00:00:00.000 │  348 │  396 │  461 │
│ 2000-05-01 00:00:00.000 │  363 │  420 │  472 │
│ 2000-06-01 00:00:00.000 │  435 │  472 │  535 │
│ 2000-07-01 00:00:00.000 │  491 │  548 │  622 │
│ 2000-08-01 00:00:00.000 │  505 │  559 │  606 │
│ 2000-09-01 00:00:00.000 │  404 │  463 │  508 │
│ 2000-10-01 00:00:00.000 │  359 │  407 │  461 │
│ 2000-11-01 00:00:00.000 │  310 │  362 │  390 │
│ 2000-12-01 00:00:00.000 │  337 │  405 │  432 │
└─────────────────────────┴──────┴──────┴──────┘