Convert semi-structured data to structured data using Talend BigData

Question

Employee
 Employee Type                          : 0130
 Unit                                   : 4189670095711234
 Basic Salary                           : 11.00
 Joined Date                            : 04/12/yy 06:30:05
 Country                                : 826-United Kingdom

(123.66)                      --- Endof Employee -------------

R 4567 ABCD             -> Len f---- i 01/14

Employee
 Employee Type                          : 0120
 Unit                                   : 4189670095711234
 Basic Salary                           : 11.00
 Joined Date                            : 04/12/yy 06:30:05
 Country                                : 826-United Kingdom

(123.66)-                      --- Endof Employee ------------

R 4567 ABCD             -> Len f---- i 01/14

Employee
 Employee Type                          : 0130
 Unit                                   : 4189670095711235
 Basic Salary                           : 11.00
 Joined Date                            : 04/12/yy 06:30:05
 Country                                : 826-United Kingdom

(123.66)                      --- Endof Employee -------------

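Hi,

I want to use Talend to convert the semi-structured data above into structured data.

Please let me know how to convert the data into a structured form so that I can insert it into a relational table.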

Answer 1

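Here is a solution that relies on the tPivotToColumnsDelimited component.

The tFileInputDelimited component is associated with a 2-field schema (the fields are named "property" and "value") and uses a special field separator, " : " (space-colon-space).
The advanced settings options "Trim all columns" and "Check each row structure against schema" are checked.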
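
To make the effect of these settings concrete, here is a minimal Python sketch of the same parsing logic outside Talend. The function name parse_line and the sample list are made up for illustration, and the sketch assumes that rows which do not split into exactly two fields are dropped, which is how the schema check is used here.

```python
SEPARATOR = " : "  # space-colon-space, the field separator described above


def parse_line(line):
    """Return a (property, value) pair, or None if the line is not a 2-field row."""
    fields = line.split(SEPARATOR)
    if len(fields) != 2:
        # Assumption: mimics rejecting rows that fail the row structure check
        return None
    prop, value = (field.strip() for field in fields)  # mimics "Trim all columns"
    return prop, value


sample = [
    "Employee",
    " Employee Type                          : 0130",
    " Joined Date                            : 04/12/yy 06:30:05",
    "(123.66)                      --- Endof Employee -------------",
]

# Keep only the lines that parsed into a (property, value) pair
records = [pair for pair in map(parse_line, sample) if pair is not None]
```

Because the separator includes the surrounding spaces, the colons inside the time 06:30:05 do not split the "Joined Date" value, and the "Employee" header and "Endof Employee" footer lines are filtered out.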

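The tMap associates a rank with each input row based on the "property" name: the sequence name is based on the property name, so every file record belonging to the same employee gets the same rank value.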
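
The ranking idea can be sketched in Python as one counter per property name. This is only an illustration of the logic, not the actual tMap expression (in Talend this kind of per-name counter is usually built with the Numeric.sequence routine); add_rank and the sample records are hypothetical names.

```python
from collections import defaultdict


def add_rank(records):
    """Prefix each (property, value) pair with a per-property running counter."""
    counters = defaultdict(int)  # one independent sequence per property name
    ranked = []
    for prop, value in records:
        counters[prop] += 1
        ranked.append((counters[prop], prop, value))
    return ranked


records = [
    ("Employee Type", "0130"),
    ("Unit", "4189670095711234"),
    ("Employee Type", "0120"),
    ("Unit", "4189670095711234"),
]
ranked = add_rank(records)
```

Note that this only lines up correctly if every employee block contains every property; if one block is missing a property, the ranks for that property drift out of step with the others.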

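Finally, tPivotToColumnsDelimited moves all input records that share the same rank value onto a single row and, most importantly, associates each value with the right property. Set "Pivot column" to "property", "Aggregation column" to "value", "Aggregation function" to "first", and "Group by" to "rank". Select the desired output file name, and you will get the expected result.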
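
For readers who want to check the pivot logic independently of Talend, here is a small Python sketch of the same settings: group by rank, turn each property into a column, and keep the first value seen. The function name pivot and the sample rows are assumptions for illustration only.

```python
def pivot(ranked):
    """Turn (rank, property, value) rows into one dict per rank."""
    rows = {}
    for rank, prop, value in ranked:
        row = rows.setdefault(rank, {"rank": rank})  # "Group by" = rank
        row.setdefault(prop, value)  # "first": keep the first value per property
    return [rows[rank] for rank in sorted(rows)]


ranked = [
    (1, "Employee Type", "0130"),
    (1, "Unit", "4189670095711234"),
    (2, "Employee Type", "0120"),
    (2, "Unit", "4189670095711234"),
]
table = pivot(ranked)
```

Each dict in table corresponds to one employee and can be mapped directly onto a row of a relational table.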

Hope this helps.

etl

talend