Transforming rows to headers in Pentaho

I have a file that looks like this:

FIELD1,FIELD2
name,ABC
age,29
location,ZZ
name,XYZ
age,33
location,YY

I need the output shown below. I tried the Row denormaliser step, but it did not give the correct output.

name,age,location
ABC,29,ZZ
XYZ,33,YY

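The Row denormaliser can produce this output, but it needs an identifier for each entity in the input, and the input must be sorted by that identifier.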

So you first need to transform the contents of the file into the following structure:

ID,FIELD1,FIELD2
0,name,ABC
0,age,29
0,location,ZZ
1,name,XYZ
1,age,33
1,location,YY

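One way to achieve this is to combine Add sequence (starting at 0) with User Defined Java Expression (set the expression to ID / 3, provided you always have exactly three rows per entity). Because ID is an integer, the division truncates, so sequence values 0, 1, 2 map to ID 0 and values 3, 4, 5 map to ID 1.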
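The ID assignment can be checked outside Pentaho with a small sketch. This is a Python illustration only, not PDI code; the function name `assign_ids` and the parameter `rows_per_entity` are hypothetical, and it assumes every entity has exactly three rows in a fixed order.

```python
def assign_ids(rows, rows_per_entity=3):
    """Prefix each (FIELD1, FIELD2) row with an entity ID.

    Mirrors Add sequence (counter starting at 0) followed by the
    integer division ID / 3 from the User Defined Java Expression.
    """
    return [
        (seq // rows_per_entity, key, value)
        for seq, (key, value) in enumerate(rows)
    ]


rows = [
    ("name", "ABC"), ("age", "29"), ("location", "ZZ"),
    ("name", "XYZ"), ("age", "33"), ("location", "YY"),
]
with_ids = assign_ids(rows)
for row in with_ids:
    print(row)
```

If an entity can be missing a row, this fixed-size grouping no longer lines up and the ID would have to be derived differently, for example by starting a new ID each time a `name` row appears.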

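You can then use the Row denormaliser with FIELD1 as the key field, ID as the group field, and FIELD2 as the value field for the name, age, and location targets.

The transformation then consists of four steps: Data Grid, Add sequence, User Defined Java Expression, Row denormaliser.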

XML for the transformation steps (just copy and paste it onto the transformation canvas):

<?xml version="1.0" encoding="UTF-8"?>
<transformation-steps>
<steps>
  <step>
    <name>Data Grid</name>
    <type>DataGrid</type>
    <description/>
    <distribute>Y</distribute>
    <custom_distribution/>
    <copies>1</copies>
         <partitioning>
           <method>none</method>
           <schema_name/>
           </partitioning>
    <fields>
      <field>
        <name>FIELD1</name>
        <type>String</type>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>FIELD2</name>
        <type>String</type>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
    </fields>
    <data>
      <line> <item>name</item><item>ABC</item> </line>
      <line> <item>age</item><item>29</item> </line>
      <line> <item>location</item><item>ZZ</item> </line>
      <line> <item>name</item><item>XYZ</item> </line>
      <line> <item>age</item><item>33</item> </line>
      <line> <item>location</item><item>YY</item> </line>
    </data>
     <cluster_schema/>
 <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
      <xloc>128</xloc>
      <yloc>64</yloc>
      <draw>Y</draw>
      </GUI>
    </step>

  <step>
    <name>Row denormaliser</name>
    <type>Denormaliser</type>
    <description/>
    <distribute>Y</distribute>
    <custom_distribution/>
    <copies>1</copies>
         <partitioning>
           <method>none</method>
           <schema_name/>
           </partitioning>
      <key_field>FIELD1</key_field>
      <group>
        <field>
          <name>ID</name>
          </field>
        </group>
      <fields>
        <field>
          <field_name>FIELD2</field_name>
          <key_value>name</key_value>
          <target_name>name</target_name>
          <target_type>String</target_type>
          <target_format/>
          <target_length>-1</target_length>
          <target_precision>-1</target_precision>
          <target_decimal_symbol/>
          <target_grouping_symbol/>
          <target_currency_symbol/>
          <target_null_string/>
          <target_aggregation_type>-</target_aggregation_type>
          </field>
        <field>
          <field_name>FIELD2</field_name>
          <key_value>age</key_value>
          <target_name>age</target_name>
          <target_type>Integer</target_type>
          <target_format/>
          <target_length>-1</target_length>
          <target_precision>-1</target_precision>
          <target_decimal_symbol/>
          <target_grouping_symbol/>
          <target_currency_symbol/>
          <target_null_string/>
          <target_aggregation_type>-</target_aggregation_type>
          </field>
        <field>
          <field_name>FIELD2</field_name>
          <key_value>location</key_value>
          <target_name>location</target_name>
          <target_type>String</target_type>
          <target_format/>
          <target_length>-1</target_length>
          <target_precision>-1</target_precision>
          <target_decimal_symbol/>
          <target_grouping_symbol/>
          <target_currency_symbol/>
          <target_null_string/>
          <target_aggregation_type>-</target_aggregation_type>
          </field>
        </fields>
     <cluster_schema/>
 <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
      <xloc>672</xloc>
      <yloc>64</yloc>
      <draw>Y</draw>
      </GUI>
    </step>

  <step>
    <name>Add sequence</name>
    <type>Sequence</type>
    <description/>
    <distribute>Y</distribute>
    <custom_distribution/>
    <copies>1</copies>
         <partitioning>
           <method>none</method>
           <schema_name/>
           </partitioning>
      <valuename>ID</valuename>
      <use_database>N</use_database>
      <connection/>
      <schema/>
      <seqname>SEQ_</seqname>
      <use_counter>Y</use_counter>
      <counter_name/>
      <start_at>0</start_at>
      <increment_by>1</increment_by>
      <max_value>999999999</max_value>
     <cluster_schema/>
 <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
      <xloc>272</xloc>
      <yloc>64</yloc>
      <draw>Y</draw>
      </GUI>
    </step>

  <step>
    <name>User Defined Java Expression</name>
    <type>Janino</type>
    <description/>
    <distribute>Y</distribute>
    <custom_distribution/>
    <copies>1</copies>
         <partitioning>
           <method>none</method>
           <schema_name/>
           </partitioning>
       <formula><field_name>ID</field_name>
<formula_string>ID &#x2f; 3</formula_string>
<value_type>Integer</value_type>
<value_length>-1</value_length>
<value_precision>-1</value_precision>
<replace_field>ID</replace_field>
</formula>
     <cluster_schema/>
 <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
      <xloc>432</xloc>
      <yloc>64</yloc>
      <draw>Y</draw>
      </GUI>
    </step>

</steps>
<order>
  <hop> <from>Data Grid</from><to>Add sequence</to><enabled>Y</enabled> </hop>
  <hop> <from>Add sequence</from><to>User Defined Java Expression</to><enabled>Y</enabled> </hop>
  <hop> <from>User Defined Java Expression</from><to>Row denormaliser</to><enabled>Y</enabled> </hop>
</order>
<notepads>
</notepads>
<step_error_handling>
</step_error_handling>
</transformation-steps>

Finally, if needed, you can remove the ID column with a Select values step.
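To sanity-check the expected result outside Pentaho, the pivot can be sketched in Python. This only illustrates the technique and is not how PDI implements the step; the function name `denormalise` is hypothetical. `itertools.groupby` is used because it only groups consecutive rows with the same key, so it shows why the input has to be sorted by ID.

```python
from itertools import groupby
from operator import itemgetter


def denormalise(rows, targets=("name", "age", "location")):
    """Pivot (ID, FIELD1, FIELD2) rows into one record per ID.

    Rows must already be sorted by ID: groupby only merges
    consecutive rows that share the same key.
    """
    result = []
    for _id, group in groupby(rows, key=itemgetter(0)):
        values = {key: value for _, key, value in group}
        # Keys missing from a group become None. The ID itself is not
        # emitted, which corresponds to the optional Select values step.
        result.append(tuple(values.get(t) for t in targets))
    return result


rows = [
    (0, "name", "ABC"), (0, "age", "29"), (0, "location", "ZZ"),
    (1, "name", "XYZ"), (1, "age", "33"), (1, "location", "YY"),
]
records = denormalise(rows)
print("name,age,location")
for record in records:
    print(",".join(record))
```

Unlike the transformation above, which types the age target as Integer, this sketch keeps every value as a string.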