Does jdbcItemReader support partitioning?
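I'm using jdbcItemReader from https://github.com/jberet/jberet-support and want to use partitioning to speed up processing.
The number of partitions (16 here) has the side effect of writing duplicate data: each partition does the same work on the same data, instead of the input data set being split into n distinct partitions.
Edit: the code below shows the correct way to do it.
Note: your SQL query needs to return ordered data, because with N partitions (16 here), 16 readers will each run the query!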
<reader ref="jdbcItemReader">
<properties>
<property name="beanType" value="java.util.Map"/>
<property name="sql" value="select ......"/>
<property name="url"
value="jdbc:oracle:thin:......"/>
<property name="user" value="......."/>
<property name="password" value="....."/>
<property name="columnMapping" value="xxxx, xxxx"/>
<property name="columnTypes" value="String,String"/>
<property name="start" value="#{partitionPlan['partition.start']}"/>
<property name="end" value="#{partitionPlan['partition.end']}"/>
<!--CONCUR_READ_ONLY: If you set this as a value of the concurrency while creating the ResultSet object you cannot update the contents of the ResultSet you can only read/retrieve them.-->
<!--CONCUR_UPDATABLE: If you set this as a value of the concurrency while creating the ResultSet object you can update the contents of the ResultSet.-->
<!--TYPE_SCROLL_SENSITIVE: ResultSet is sensitive to the changes that are made in the database i.e. the modifications done in the database are reflected in the ResultSet.-->
<property name="resultSetProperties"
value="fetchSize=5500, resultSetConcurrency=CONCUR_READ_ONLY,
fetchDirection=FETCH_REVERSE,
resultSetType=TYPE_SCROLL_SENSITIVE,
resultSetHoldability=HOLD_CURSORS_OVER_COMMIT"/>
</properties>
</reader>
<processor ref="myProcessort"/>
<writer ref="myWriter"/>
</chunk>
<!-- run your sql with a count to define partitions evenly -->
<partition>
<plan partitions="16" threads="16">
<properties partition="0">
<property name="partition.start" value="0"/>
<property name="partition.end" value="500"/>
</properties>
<properties partition="1">
<property name="partition.start" value="500"/>
<property name="partition.end" value="1000"/>
</properties>
<!-- ... -->
<properties partition="15">
<property name="partition.start" value="5000"/>
<property name="partition.end" value="5500"/>
</properties>
</plan>
</partition>
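You need to define the step partition as in the second XML snippet. Then, in jdbcItemReader, define the start and end properties, which reference the partition properties partition.start and partition.end respectively.
The two partition properties can be named differently, as long as the names are consistent between the partition and item-reader elements.
For example,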
<reader ref="jdbcItemReader">
<properties>
<property name="start" value="#{partitionPlan['partition.start']}"/>
<property name="end" value="#{partitionPlan['partition.end']}"/>
</properties>
</reader>
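The comment in the question suggests running the query with a count first so the partitions can be sized evenly. A minimal sketch of that arithmetic follows. The class name PartitionRanges and the method split are hypothetical helpers, not part of jberet-support, and the total of 8000 rows is an assumed figure for illustration. The boundaries use the same contiguous convention as the question (0 to 500, 500 to 1000, and so on); whether jdbcItemReader treats start and end as inclusive should be checked against its documentation before reusing these values.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionRanges {

    // Hypothetical helper (not part of jberet-support): splits totalRows into
    // `partitions` contiguous {start, end} pairs, spreading any remainder over
    // the first partitions so that sizes differ by at most one row.
    static List<long[]> split(long totalRows, int partitions) {
        if (totalRows < 0 || partitions <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and partitions > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = totalRows / partitions;
        long remainder = totalRows % partitions;
        long start = 0;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size;
            ranges.add(new long[] {start, end});
            start = end; // the next partition begins where this one ends
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed total row count, e.g. taken from a SELECT COUNT(*) over the same query.
        List<long[]> ranges = split(8000, 16);
        // Print the values in the shape of the JSL partition plan properties.
        for (int i = 0; i < ranges.size(); i++) {
            long[] r = ranges.get(i);
            System.out.printf("<properties partition=\"%d\">%n", i);
            System.out.printf("<property name=\"partition.start\" value=\"%d\"/>%n", r[0]);
            System.out.printf("<property name=\"partition.end\" value=\"%d\"/>%n", r[1]);
            System.out.printf("</properties>%n");
        }
    }
}
```

The printed elements can be pasted inside the plan element. Because every reader runs the same query and only reads its own slice of rows, the ranges are only meaningful if the query returns rows in a stable order.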