What is the fastest way to load data into a new CitusDB instance?

Question

I am following the instructions in Scaling Out Data Ingestion and using the following command:

find . -type f | xargs -n 1 -P 320 sh -c 'echo $0 `copy_to_distributed_table -C $0 table_name`'

My cluster has one master and eight workers, each with two SSDs. The table is distributed across 320 shards.

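The data load is taking a very long time. The average insert rate seems to be around 750k per minute. Is this normal, or is there a way to speed it up?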

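The only thing I can think of is that I have replication enabled. Should I turn it off for the load and then turn it back on afterwards?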

Answer 1

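I assume you want to use hash partitioning. If that is the case, we are deprecating copy_to_distributed_table in favor of distributed COPY. COPY provides the native PostgreSQL experience, fixes several known issues, and improves ingestion performance by more than an order of magnitude. It is available starting with Citus 5.1, which was released this month and is available in the official PostgreSQL Linux package repositories (PGDG).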
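
As a rough illustration of what switching to COPY could look like, here is a minimal sketch that loads each file through psql's \copy instead of copy_to_distributed_table. The function names load_file and load_all, the PSQL override variable, and the CSV format are assumptions made for this example and do not come from the thread; connection settings are expected to come from the usual PG* environment variables.

```shell
#!/bin/sh
# Sketch only: load_file, load_all, and the PSQL variable are hypothetical
# helpers, and FORMAT csv is an assumption about the input files.

# Load one file into a table with psql's client-side \copy.
# $1 = data file, $2 = target (distributed) table
load_file() {
    # -X skips ~/.psqlrc; ON_ERROR_STOP makes psql exit non-zero on failure.
    ${PSQL:-psql} -X -v ON_ERROR_STOP=1 \
        -c "\\copy $2 FROM '$1' WITH (FORMAT csv)"
}

# Load every regular file found directly inside a directory, one at a time.
# $1 = directory with data files, $2 = target table
load_all() {
    for file in "$1"/*; do
        [ -f "$file" ] || continue            # skip subdirectories and unmatched globs
        load_file "$file" "$2" || return 1    # stop at the first failed load
    done
}
```

Calling load_all ./chunks table_name would then run one COPY per file. The loop is serial to keep the sketch short; whether running several loads in parallel helps on a given cluster is something to measure.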


postgresql

citus