ClickHouse: insert unique data

Question

I want to insert data without duplicates. The triple of fields ticker, kline_type, dateTime should be unique.

I found that I should use ReplacingMergeTree: https://clickhouse.com/docs/ru/engines/table-engines/mergetree-family/replacingmergetree/

This is what I am trying:

import os

import clickhouse_driver

def prepare_table():
    client = clickhouse_driver.Client.from_url(f'clickhouse://default:{os.getenv("CLICK_PASSWORD")}@localhost:9000/crypto_exchange')
    
    # field names from binance API
    client.execute('''
CREATE TABLE IF NOT EXISTS historical_data_binance
(
    dateTime DateTime,
    closeTime Int64,
    open Float64,
    high Float64,
    low Float64,
    close Float64,
    volume Float64,
    kline_type String,
    ticker String
) ENGINE = ReplacingMergeTree
ORDER BY (ticker, kline_type, dateTime)
''')
    return client

prepare_table()

But I think my solution does not work, because I see duplicates:

│ 2021-11-04 11:00:00 │ 1636027199999 │ 61894.82 │  62188.78 │ 60866.46 │ 61444.74 │ 20.158382 │ 1h         │ BTCUSDT │
│ 2021-11-04 12:00:00 │ 1636030799999 │ 61420.86 │  61698.74 │ 58754.41 │ 61621.01 │ 15.721483 │ 1h         │ BTCUSDT │
└─────────────────────┴───────────────┴──────────┴───────────┴──────────┴──────────┴───────────┴────────────┴─────────┘
┌────────────dateTime─┬─────closeTime─┬─────open─┬─────high─┬──────low─┬────close─┬────volume─┬─kline_type─┬─ticker──┐
│ 2021-11-04 11:00:00 │ 1636027199999 │ 61894.82 │ 62188.78 │ 60866.46 │ 61444.74 │ 20.158382 │ 1h         │ BTCUSDT │

What is the correct way to insert the data?

Answer 1

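ReplacingMergeTree does not guarantee the absence of duplicates. You need to do the final deduplication on the fly in your SELECTs.
Deduplication is a byproduct of merges, and merges run in the background at an unspecified time, so until parts are merged a plain SELECT can still return several rows with the same sorting key.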

create table testD ( Key Int64, ver UInt64, Value String) 
Engine=ReplacingMergeTree(ver) order by Key;

insert into testD values (1, 1, '1');
insert into testD values (1, 2, '2');

SELECT * FROM testD
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘
┌─Key─┬─ver─┬─Value─┐
│   1 │   1 │ 1     │
└─────┴─────┴───────┘
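
To make the two result blocks above easier to reason about, here is a toy model in plain Python. It is only a sketch under simplifying assumptions, not how ClickHouse is implemented: every INSERT is treated as one part, a plain SELECT reads all parts as they are, and a merge keeps the row with the highest ver for each Key. The names select_all and merge are hypothetical helpers made up for this illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, str]  # (Key, ver, Value), mirroring the testD table


def select_all(parts: List[List[Row]]) -> List[Row]:
    """A plain SELECT reads every part as-is, so duplicates stay visible."""
    return [row for part in parts for row in part]


def merge(parts: List[List[Row]]) -> List[List[Row]]:
    """Merge all parts into one part, keeping the highest-ver row per Key."""
    latest: Dict[int, Row] = {}
    for part in parts:
        for row in part:
            key, ver, _ = row
            # ">=" lets a later insert win when two rows share the same ver
            if key not in latest or ver >= latest[key][1]:
                latest[key] = row
    return [sorted(latest.values())]


# Two INSERTs produce two parts (assumption of this toy model)
parts = [[(1, 1, "1")], [(1, 2, "2")]]

print(select_all(parts))         # [(1, 1, '1'), (1, 2, '2')]
print(select_all(merge(parts)))  # [(1, 2, '2')]
```

Before the merge both rows are returned; after it only the row with ver = 2 is left. The options below either emulate that merge at query time or force it to happen.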

1) Deduplicate at query time:
a) SELECT * FROM testD final
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘

b) SELECT Key, argMax(Value, ver) FROM testD group by Key;
┌─Key─┬─argMax(Value, ver)─┐
│   1 │ 2                  │
└─────┴────────────────────┘

c) SELECT Key, Value FROM testD order by Key, ver desc limit 1 by Key;
┌─Key─┬─Value─┐
│   1 │ 2     │
└─────┴───────┘

2) Force a merge:

optimize table testD final; -- initiate unplanned merge

SELECT * FROM testD;
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘
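
Applied to the table from the question, option 1a could look roughly like the sketch below. It assumes the historical_data_binance table and the connection URL from the question; build_dedup_query, fetch_unique_klines and main are hypothetical helper names introduced only for this example. The only driver calls used are Client.from_url and Client.execute, both of which already appear in the question.

```python
import os


def build_dedup_query(table: str, use_final: bool = True) -> str:
    """Return a SELECT that hides not-yet-merged duplicates via FINAL."""
    final_clause = " FINAL" if use_final else ""
    return (
        f"SELECT * FROM {table}{final_clause} "
        "ORDER BY ticker, kline_type, dateTime"
    )


def fetch_unique_klines(client, table: str = "historical_data_binance"):
    """Run the deduplicating SELECT; `client` is a clickhouse_driver.Client."""
    return client.execute(build_dedup_query(table))


def main() -> None:
    # Imported lazily so the helpers above work without the driver installed
    import clickhouse_driver

    # Connection URL copied from the question; adjust to your setup
    client = clickhouse_driver.Client.from_url(
        f'clickhouse://default:{os.getenv("CLICK_PASSWORD")}'
        "@localhost:9000/crypto_exchange"
    )
    for row in fetch_unique_klines(client):
        print(row)
```

Calling main() needs a running server, so it is not invoked here. FINAL does the deduplication at read time and makes the query slower, so for large tables the argMax or LIMIT BY variants above may be worth comparing.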
