Pentaho Table input gives very low performance on Postgres tables, even when reading only two columns

Question

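Reading a simple source table from Postgres (fetching 3 of its 20 columns) with a Table input step takes a long time. I want to feed it into a Stream lookup, where I fetch one column of information.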

Here is the log:

```
2020/05/15 07:56:03 - load_identifications - Step Srclkp_Individuals.0 ended successfully, processed 4869591 lines. ( 7632 lines/s)
2020/05/15 07:56:03 - load_identifications - Step LookupIndiv.0 ended successfully, processed 9754378 lines. ( 15288 lines/s)
```

The Table input query is:

```sql
SELECT
    id AS INDIVIDUAL_ID,
    org_ext_loc
FROM
    individuals
```

This table in Postgres has only about 20 columns and roughly 4.8 million rows.

This runs on Pentaho Data Integration 7.1 with the server details below.

**Our server information:**

- OS: Oracle Linux 7.3
- RAM: 65707 MB
- HDD capacity: 2 TB
- Architecture: x86_64
- CPU op-mode(s): 32-bit, 64-bit
- CPU(s): 16
- CPU MHz: 2294.614

I am connecting to Postgres via JDBC.

I don't know what else I can do to reach a throughput of around 15K rows/sec.
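The log figures can be turned into elapsed times, which helps frame the goal. A quick Python check; the row counts and rates are copied from the log above, and the 15,000 rows/s target is the one stated in the question:

```python
# Rows processed and reported rates, copied from the log in the question
steps = {
    "Srclkp_Individuals.0": (4_869_591, 7_632),
    "LookupIndiv.0": (9_754_378, 15_288),
}

for name, (rows, rate) in steps.items():
    # Elapsed seconds implied by "processed N lines. (R lines/s)"
    print(f"{name}: ~{rows / rate:.0f} s")

# Time the ~4.87M-row read would take at the desired 15K rows/s
print(f"target: ~{4_869_591 / 15_000:.0f} s")
```

Both steps work out to roughly the same elapsed time (about 638 s, a little over 10 minutes), which is what you would expect if the two steps run concurrently in one transformation. That suggests, though it does not prove, that the input step's lower rate may partly reflect waiting on the lookup rather than slow reading alone. Reaching 15K rows/s on the input would bring the read down to roughly 325 s.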

Answer 1

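Check the transformation properties on the Miscellaneous tab, in particular "Nr of rows in rowset" and "Feedback size".

Also check that your table has the right indexes.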
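PDI runs each step in its own thread and passes rows between steps through a buffer whose capacity is "Nr of rows in rowset". The Python sketch below only models that idea with a bounded queue; `ROWSET_SIZE`, `table_input` and `slow_lookup` are hypothetical names, not Kettle APIs. It shows that the upstream step can never get more than one buffer's worth of rows ahead of a slower downstream step.

```python
import queue
import threading
import time

ROWSET_SIZE = 100      # hypothetical stand-in for "Nr of rows in rowset"
TOTAL_ROWS = 2_000     # small synthetic row count
END = object()         # sentinel marking the end of the row stream

def table_input(buf: queue.Queue) -> None:
    """Fast upstream step: emits rows as quickly as the buffer accepts them."""
    for i in range(TOTAL_ROWS):
        buf.put((i, f"loc-{i}"))  # blocks whenever ROWSET_SIZE rows are waiting
    buf.put(END)

def slow_lookup(buf: queue.Queue, out: list, depth: list) -> None:
    """Slower downstream step: its pace limits how fast table_input can go."""
    while True:
        depth.append(buf.qsize())  # how many rows are buffered right now
        row = buf.get()
        if row is END:
            break
        time.sleep(0.0001)         # simulated per-row lookup work
        out.append(row)

buf: queue.Queue = queue.Queue(maxsize=ROWSET_SIZE)
received: list = []
depth: list = []
workers = [
    threading.Thread(target=table_input, args=(buf,)),
    threading.Thread(target=slow_lookup, args=(buf, received, depth)),
]
start = time.perf_counter()
for w in workers:
    w.start()
for w in workers:
    w.join()
print(f"{len(received)} rows in {time.perf_counter() - start:.2f} s, "
      f"max buffered rows: {max(depth)}")
```

In this model, a larger rowset only deepens the buffer: if the downstream step is the bottleneck, the upstream step is still throttled to its pace once the buffer fills.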

Answer 2

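When you combine a Table input with a Stream lookup, Pentaho runs the Stream lookup more slowly than it runs a Database lookup. As @nsousa suggested, I checked this with Dummy steps and found that Pentaho processes each type of step differently.

Although Database lookup and Stream lookup belong to the same category, Database lookup performed better in this case.

The Pentaho help offers some further ideas and suggestions.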
