Ignite is returning only a subset of the expected output of a long running query

Question

OS: Ubuntu 18.04
Apache Ignite: 2.9.0 (latest)

Data loaded: (Olist Brazilian E-Commerce dataset from Kaggle)
2 tables:
orders: 100k records
order_payments: 100k records

Indexed columns:
order_payments: CREATE INDEX idx_order_payments ON order_payments (id, order_id, payment_type)
orders: CREATE INDEX idx_orders ON orders (order_id, customer_id, order_status, order_purchase_timestamp)

Together they take up ~400 MB of space in Ignite off-heap memory and persistence.

I am executing a simple SQL query:

    SELECT orders.order_status, 
           order_payments.payment_type, 
           SUM(order_payments.payment_value) AS total_payments
    FROM order_payments
    JOIN orders ON orders.order_id = order_payments.order_id
    GROUP BY orders.order_status, order_payments.payment_type
    ORDER BY total_payments DESC

I am running Apache Ignite in a Docker container.

This is the cache template configuration:

    <property name="cacheConfiguration">
        <list>
            <bean abstract="true" class="org.apache.ignite.configuration.CacheConfiguration"
                  id="cache-template-bean">
                <!-- when you create a template via XML configuration, you must add an asterisk to
                the name of the template -->
                <property name="name" value="tbl_pll*"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <property name="backups" value="1"/>
                <property name="queryParallelism" value="4"/>
                <!-- Other cache parameters -->
            </bean>
            <bean abstract="true" class="org.apache.ignite.configuration.CacheConfiguration"
                  id="cache-template-bean">
                <!-- when you create a template via XML configuration, you must add an asterisk to
                the name of the template -->
                <property name="name" value="tbl_hf_pll*"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <property name="backups" value="1"/>
                <property name="queryParallelism" value="2"/>
                <!-- Other cache parameters -->
            </bean>
        </list>
    </property>

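When I create the tables with the tbl_pll cache template, the values in the query's result set are roughly (1 / queryParallelism) times the expected values. So with tbl_pll, it returns about 1/4 of the expected output.

I tried the same with queryParallelism=2, which gave me about 1/2 of the output.

I also tried without any cache template, i.e. with the default value of the queryParallelism parameter, which is 1, and that returned the complete result.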

Expected full output (also the output with queryParallelism=1):

    [['delivered', 'credit_card', 12101094.87999937]
     ['delivered', 'boleto', 2769932.57999998]
     ['delivered', 'voucher', 343013.19]
     ['delivered', 'debit_card', 208421.12]]

With queryParallelism=4:

    [['delivered', 'credit_card', 4064387.2800000096],
     ['delivered', 'boleto', 918272.54],
     ['delivered', 'voucher', 110648.45000000004],
     ['delivered', 'debit_card', 64584.53000000001]]

With queryParallelism=2:

    [['delivered', 'credit_card', 6129872.129999977],
     ['delivered', 'boleto', 1360427.3799999985],
     ['delivered', 'voucher', 168392.55999999976],
     ['delivered', 'debit_card', 107637.38999999996]]
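To put a number on "about 1/4" and "about 1/2", the short Python sketch below divides each expected total by the corresponding total from the partial runs. All figures are copied from the three result sets above; the dictionary and function names are only for illustration.

```python
# Totals copied from the three result sets in the question.
expected = {
    "credit_card": 12101094.87999937,
    "boleto": 2769932.57999998,
    "voucher": 343013.19,
    "debit_card": 208421.12,
}
observed = {
    4: {"credit_card": 4064387.2800000096, "boleto": 918272.54,
        "voucher": 110648.45000000004, "debit_card": 64584.53000000001},
    2: {"credit_card": 6129872.129999977, "boleto": 1360427.3799999985,
        "voucher": 168392.55999999976, "debit_card": 107637.38999999996},
}


def shortfall_ratios(expected, partial):
    """Expected total divided by observed total, per payment type."""
    return {ptype: expected[ptype] / partial[ptype] for ptype in expected}


for parallelism, partial in observed.items():
    ratios = shortfall_ratios(expected, partial)
    overall = sum(expected.values()) / sum(partial.values())
    print(f"queryParallelism={parallelism}:",
          {k: round(v, 2) for k, v in ratios.items()},
          f"overall={overall:.2f}")
```

On these figures, the queryParallelism=2 totals are short by a factor of about 2 for every payment type, while the queryParallelism=4 totals are short by a factor of about 3 (2.98 to 3.23 per type), rather than exactly 4.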

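My suspicion: queryParallelism uses segmented indexes, and the output is based only on what is in the last/first index segment. Either the reduce step is not working properly and the outputs from all threads are not merged, or Ignite runs just one thread and returns its output after the reduce.

Since I have added the payment_type column of order_payments to the index, the output seems to be divided almost exactly by the number of threads/index segments.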
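One way to probe this suspicion is a toy model. The Python sketch below is hypothetical and is not Ignite's implementation: it assumes each index segment holds a fixed subset of rows (modelled here as `key % segments`) and that the join is evaluated separately inside each segment, seeing only that segment's rows. Under those assumptions, when payments are placed by their own `id` rather than by `order_id`, only about 1/segments of them find their order, even though the final summation is correct. The data is synthetic.

```python
import random


def segmented_join_total(order_ids, payments, segments, collocated):
    """Sum payment values that find a matching order when the join is
    evaluated independently inside each segment (toy model)."""
    # Orders are placed by order_id, their primary key.
    orders_in = [set() for _ in range(segments)]
    for order_id in order_ids:
        orders_in[order_id % segments].add(order_id)

    total = 0.0
    for pay_id, order_id, value in payments:
        # Collocated: the payment is placed by order_id (an affinity key).
        # Non-collocated: the payment is placed by its own id.
        seg = (order_id if collocated else pay_id) % segments
        if order_id in orders_in[seg]:  # the join only sees local rows
            total += value
    return total


rng = random.Random(42)
order_ids = list(range(4000))
# Synthetic payments: (payment id, order id, value), each pointing at a random order.
payments = [(p, rng.randrange(4000), 1.0) for p in range(4000)]

full = segmented_join_total(order_ids, payments, 1, collocated=False)
for n in (1, 2, 4):
    partial = segmented_join_total(order_ids, payments, n, collocated=False)
    together = segmented_join_total(order_ids, payments, n, collocated=True)
    print(f"segments={n}: non-collocated {partial / full:.2f}, "
          f"collocated {together / full:.2f}")
```

In this model the rows are lost before any aggregation happens, because a payment and its order can land in different segments; placing payments by `order_id` makes every segment self-contained again.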

What am I doing wrong, and how can I fix it?

Edit: I am running only 1 Apache Ignite instance.

Also, here is the output of EXPLAIN:

    SELECT
        __Z1.ORDER_STATUS AS __C0_0,
        __Z0.PAYMENT_TYPE AS __C0_1,
        SUM(__Z0.PAYMENT_VALUE) AS __C0_2
    FROM PUBLIC.ORDERS __Z1
        /* PUBLIC.ORDERS.__SCAN_ */
    INNER JOIN PUBLIC.ORDER_PAYMENTS __Z0
        /* PUBLIC.IDX_ORDER_PAYMENTS: ORDER_ID = __Z1.ORDER_ID */
        ON 1=1
    WHERE __Z1.ORDER_ID = __Z0.ORDER_ID
    GROUP BY __Z1.ORDER_STATUS, __Z0.PAYMENT_TYPE

    SELECT
        __C0_0 AS ORDER_STATUS,
        __C0_1 AS PAYMENT_TYPE,
        CAST(CAST(SUM(__C0_2) AS DOUBLE) AS DOUBLE) AS TOTAL_PAYMENTS
    FROM PUBLIC.__T0
        /* PUBLIC."merge_scan" */
    GROUP BY __C0_0, __C0_1
    ORDER BY 3 DESC
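The plan has two parts: a map query that computes a partial SUM(PAYMENT_VALUE) per (ORDER_STATUS, PAYMENT_TYPE) group, and a reduce query that sums those partials from merge_scan and sorts by the total. The Python sketch below models only this aggregation split, using hypothetical rows that are assumed to be already joined. It shows that summing per-segment partials reproduces the single-pass totals, which suggests that when totals come out short, rows are more likely being lost before the map-side aggregation than in the reduce step.

```python
from collections import defaultdict


def map_query(joined_rows):
    """Partial aggregate for one segment, mirroring the first SELECT:
    GROUP BY order_status, payment_type with SUM(payment_value)."""
    partial = defaultdict(float)
    for status, ptype, value in joined_rows:
        partial[(status, ptype)] += value
    return [(status, ptype, total) for (status, ptype), total in partial.items()]


def reduce_query(partial_rows):
    """Merge partial rows from every segment, mirroring the second SELECT:
    GROUP BY __C0_0, __C0_1, SUM(__C0_2), ORDER BY 3 DESC."""
    merged = defaultdict(float)
    for status, ptype, value in partial_rows:
        merged[(status, ptype)] += value
    rows = [[status, ptype, total] for (status, ptype), total in merged.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical joined rows: (order_status, payment_type, payment_value).
joined = [
    ("delivered", "credit_card", 100.0),
    ("delivered", "boleto", 40.0),
    ("delivered", "credit_card", 60.0),
    ("delivered", "voucher", 10.0),
    ("delivered", "boleto", 20.0),
    ("delivered", "credit_card", 20.0),
]

segments = 4
# Split the joined rows across segments, run the map step on each,
# then feed all partial results to a single reduce step.
partials = []
for seg in range(segments):
    partials.extend(map_query(joined[seg::segments]))

print(reduce_query(partials))
# [['delivered', 'credit_card', 180.0], ['delivered', 'boleto', 60.0], ['delivered', 'voucher', 10.0]]
print(reduce_query(map_query(joined)))  # same result in a single pass
```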

Answer 1

Make sure the tables are co-located. Use the affinityKey parameter of the CREATE TABLE command to group related data together.
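As a sketch of what this could look like, the Python below builds CREATE TABLE statements that store each order_payments row together with its order. The column types and the composite primary key (id, order_id) are assumptions, since the question does not show the original DDL. In Ignite's CREATE TABLE the affinity key is set in the WITH clause as affinity_key=..., and it has to be one of the primary key columns; orders needs no affinity_key here because its single-column key order_id already determines where its rows are placed. The existing tables would likely have to be re-created and reloaded for this to take effect.

```python
def collocated_ddl(template: str = "tbl_pll") -> list:
    """Return CREATE TABLE statements that collocate order_payments with
    orders on order_id. Column types are hypothetical."""
    orders = f"""
CREATE TABLE orders (
    order_id VARCHAR PRIMARY KEY,
    customer_id VARCHAR,
    order_status VARCHAR,
    order_purchase_timestamp TIMESTAMP
) WITH "template={template}"
""".strip()
    # affinity_key must name a primary key column, so order_id is part of
    # the composite key; rows are then placed by order_id, not by id.
    order_payments = f"""
CREATE TABLE order_payments (
    id BIGINT,
    order_id VARCHAR,
    payment_type VARCHAR,
    payment_value DOUBLE,
    PRIMARY KEY (id, order_id)
) WITH "template={template},affinity_key=order_id"
""".strip()
    return [orders, order_payments]


for statement in collocated_ddl():
    print(statement + ";")
```

With the pyignite thin client, each statement could be executed in turn through `client.sql(statement)`.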

Also, check that the Ignite SQL engine selects the best index. Usually, once affinityKey points to the order_id column, the join should use the index on order_id.
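To see which index the engine picks, you can read the /* ... */ comments in the EXPLAIN output, which name the access path used for each table. The helpers below are a hypothetical sketch that extracts those comments from the plan text; the sample is the map query plan from the question. One side of the join is expected to be scanned, and the other should be looked up by order_id. Note that idx_order_payments is defined on (id, order_id, payment_type), so order_id is not its leading column, and a lookup on order_id alone may not be able to seek through it efficiently; an index that starts with order_id may serve the join better.

```python
import re

# Matches the /* ... */ access-path comments that EXPLAIN prints.
ACCESS_PATH = re.compile(r"/\*\s*(.*?)\s*\*/", re.DOTALL)


def access_paths(plan: str) -> list:
    """Return every access-path comment in an EXPLAIN plan."""
    return ACCESS_PATH.findall(plan)


def scanned_tables(plan: str) -> list:
    """Access paths that are full scans (they end in __SCAN_)."""
    return [path for path in access_paths(plan) if path.endswith("__SCAN_")]


def index_lookups(plan: str) -> list:
    """Access paths that go through a named index, e.g. 'PUBLIC.IDX_X: COL = ...'."""
    return [path for path in access_paths(plan) if ":" in path]


# Map query plan copied from the question.
PLAN = """
SELECT
    __Z1.ORDER_STATUS AS __C0_0,
    __Z0.PAYMENT_TYPE AS __C0_1,
    SUM(__Z0.PAYMENT_VALUE) AS __C0_2
FROM PUBLIC.ORDERS __Z1
    /* PUBLIC.ORDERS.__SCAN_ */
INNER JOIN PUBLIC.ORDER_PAYMENTS __Z0
    /* PUBLIC.IDX_ORDER_PAYMENTS: ORDER_ID = __Z1.ORDER_ID */
    ON 1=1
WHERE __Z1.ORDER_ID = __Z0.ORDER_ID
GROUP BY __Z1.ORDER_STATUS, __Z0.PAYMENT_TYPE
"""

print("full scans:", scanned_tables(PLAN))
print("index lookups:", index_lookups(PLAN))
```

The plan text itself could be obtained by prefixing the query with EXPLAIN and running it through the same client used for the query.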

h2

python-3.x

ignite
