Transforming rows to headers in Pentaho
I have a file that looks like this:
FIELD1,FIELD2
name,ABC
age,29
location,ZZ
name,XYZ
age,33
location,YY
I need the output shown below. I tried row denormalization, but it did not produce the correct output.
name,age,location
ABC,29,ZZ
XYZ,33,YY
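The Row denormaliser step can produce this output, but it needs an identifier for each entity in the input, and the input must be sorted by that identifier.
So you first need to convert the file's contents into the following structure: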
ID,FIELD1,FIELD2
0,name,ABC
0,age,29
0,location,ZZ
1,name,XYZ
1,age,33
1,location,YY
One way to achieve this is to combine Add sequence (starting at 0) with User Defined Java Expression (with the expression set to ID / 3, provided exactly three rows always belong to the same entity).
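To see why ID / 3 yields one identifier per entity, here is a minimal Python sketch of the same arithmetic. It is only an illustration, not Pentaho code: `add_group_id` and `ROWS_PER_ENTITY` are hypothetical names, `enumerate` stands in for Add sequence, and `//` stands in for the integer division that the expression performs on an Integer field (assumed to truncate).

```python
# Sample rows from the question: (FIELD1, FIELD2)
rows = [
    ("name", "ABC"), ("age", "29"), ("location", "ZZ"),
    ("name", "XYZ"), ("age", "33"), ("location", "YY"),
]

ROWS_PER_ENTITY = 3  # assumption: every entity has exactly three rows


def add_group_id(rows, rows_per_entity=ROWS_PER_ENTITY):
    """Prefix each row with an entity ID derived from its position."""
    # enumerate() plays the role of Add sequence (start 0, increment 1);
    # integer division maps sequence values 0,1,2 -> 0 and 3,4,5 -> 1.
    return [
        (seq // rows_per_entity, key, value)
        for seq, (key, value) in enumerate(rows)
    ]


with_ids = add_group_id(rows)
for row in with_ids:
    print(row)
```

If an entity can have a varying number of rows, this positional trick no longer works and the identifier has to come from the data itself.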
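Then you can use Row denormaliser, with ID as the group field and FIELD1 as the key field.
Your transformation will consist of Data Grid, Add sequence, User Defined Java Expression, and Row denormaliser, in that order.
The transformation steps as XML (just copy and paste onto the transformation canvas):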
<?xml version="1.0" encoding="UTF-8"?>
<transformation-steps>
<steps>
<step>
<name>Data Grid</name>
<type>DataGrid</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<fields>
<field>
<name>FIELD1</name>
<type>String</type>
<format/>
<currency/>
<decimal/>
<group/>
<length>-1</length>
<precision>-1</precision>
<set_empty_string>N</set_empty_string>
</field>
<field>
<name>FIELD2</name>
<type>String</type>
<format/>
<currency/>
<decimal/>
<group/>
<length>-1</length>
<precision>-1</precision>
<set_empty_string>N</set_empty_string>
</field>
</fields>
<data>
<line> <item>name</item><item>ABC</item> </line>
<line> <item>age</item><item>29</item> </line>
<line> <item>location</item><item>ZZ</item> </line>
<line> <item>name</item><item>XYZ</item> </line>
<line> <item>age</item><item>33</item> </line>
<line> <item>location</item><item>YY</item> </line>
</data>
<cluster_schema/>
<remotesteps> <input> </input> <output> </output> </remotesteps> <GUI>
<xloc>128</xloc>
<yloc>64</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Row denormaliser</name>
<type>Denormaliser</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<key_field>FIELD1</key_field>
<group>
<field>
<name>ID</name>
</field>
</group>
<fields>
<field>
<field_name>FIELD2</field_name>
<key_value>name</key_value>
<target_name>name</target_name>
<target_type>String</target_type>
<target_format/>
<target_length>-1</target_length>
<target_precision>-1</target_precision>
<target_decimal_symbol/>
<target_grouping_symbol/>
<target_currency_symbol/>
<target_null_string/>
<target_aggregation_type>-</target_aggregation_type>
</field>
<field>
<field_name>FIELD2</field_name>
<key_value>age</key_value>
<target_name>age</target_name>
<target_type>Integer</target_type>
<target_format/>
<target_length>-1</target_length>
<target_precision>-1</target_precision>
<target_decimal_symbol/>
<target_grouping_symbol/>
<target_currency_symbol/>
<target_null_string/>
<target_aggregation_type>-</target_aggregation_type>
</field>
<field>
<field_name>FIELD2</field_name>
<key_value>location</key_value>
<target_name>location</target_name>
<target_type>String</target_type>
<target_format/>
<target_length>-1</target_length>
<target_precision>-1</target_precision>
<target_decimal_symbol/>
<target_grouping_symbol/>
<target_currency_symbol/>
<target_null_string/>
<target_aggregation_type>-</target_aggregation_type>
</field>
</fields>
<cluster_schema/>
<remotesteps> <input> </input> <output> </output> </remotesteps> <GUI>
<xloc>672</xloc>
<yloc>64</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Add sequence</name>
<type>Sequence</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<valuename>ID</valuename>
<use_database>N</use_database>
<connection/>
<schema/>
<seqname>SEQ_</seqname>
<use_counter>Y</use_counter>
<counter_name/>
<start_at>0</start_at>
<increment_by>1</increment_by>
<max_value>999999999</max_value>
<cluster_schema/>
<remotesteps> <input> </input> <output> </output> </remotesteps> <GUI>
<xloc>272</xloc>
<yloc>64</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>User Defined Java Expression</name>
<type>Janino</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<formula><field_name>ID</field_name>
<formula_string>ID / 3</formula_string>
<value_type>Integer</value_type>
<value_length>-1</value_length>
<value_precision>-1</value_precision>
<replace_field>ID</replace_field>
</formula>
<cluster_schema/>
<remotesteps> <input> </input> <output> </output> </remotesteps> <GUI>
<xloc>432</xloc>
<yloc>64</yloc>
<draw>Y</draw>
</GUI>
</step>
</steps>
<order>
<hop> <from>Data Grid</from><to>Add sequence</to><enabled>Y</enabled> </hop>
<hop> <from>Add sequence</from><to>User Defined Java Expression</to><enabled>Y</enabled> </hop>
<hop> <from>User Defined Java Expression</from><to>Row denormaliser</to><enabled>Y</enabled> </hop>
</order>
<notepads>
</notepads>
<step_error_handling>
</step_error_handling>
</transformation-steps>
Finally, if needed, you can use a Select values step to remove the ID column.
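For readers who want to check the logic outside Pentaho, the grouping and pivoting can be sketched in Python. This is a rough analogue under stated assumptions, not how the step is implemented: `denormalise`, `drop_id`, and `TARGETS` are hypothetical names, and the target types follow the step XML above (age as Integer, the rest as String).

```python
from itertools import groupby

# Input as it looks once the ID column has been added (sorted by ID).
rows_with_id = [
    (0, "name", "ABC"), (0, "age", "29"), (0, "location", "ZZ"),
    (1, "name", "XYZ"), (1, "age", "33"), (1, "location", "YY"),
]

# Target columns and their types, mirroring the denormaliser fields.
TARGETS = {"name": str, "age": int, "location": str}


def denormalise(rows, targets=TARGETS):
    """Turn (ID, key, value) rows into one record per ID."""
    result = []
    # groupby only merges consecutive rows that share an ID, which is
    # why the input has to be sorted by ID beforehand.
    for group_id, group in groupby(rows, key=lambda row: row[0]):
        record = {"ID": group_id, **{name: None for name in targets}}
        for _, key, value in group:
            if key in targets:
                record[key] = targets[key](value)
        result.append(record)
    return result


def drop_id(records):
    """Remove the helper ID column, like the optional Select values step."""
    return [{k: v for k, v in rec.items() if k != "ID"} for rec in records]


table = drop_id(denormalise(rows_with_id))
for rec in table:
    print(rec)
```

A key that never appears for an entity stays `None` in this sketch, which is a simplification chosen here for illustration.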