Regex to extract first part of string in Apache Pig

I need to extract the postcode district from the input data below:

AB55 4
DD7 6LL
DD5 2HI

My code:

A = load 'data' as postcode:chararray;
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;

The output should look like this:

AB55
DD7
DD5

What regular expression will extract the first part of the string?

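Can you try either of the regexes below? Both capture the first run of letters and digits, which stops at the space.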

Option 1:

A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1);
DUMP code_district;

Option 2:

A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;

Output:

(AB55)
(DD7)
(DD5)
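
Pig's REGEX_EXTRACT is built on Java regular expressions, so the pattern can be checked outside Pig before running a job. Below is a minimal sketch in plain Java, not Pig code: the class name PostcodeDistrict and the helper extract are made up for illustration, and returning null when nothing matches is this sketch's own choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeDistrict {
    // Same pattern as Option 1; the backslash is doubled inside a Java string literal.
    private static final Pattern FIRST_TOKEN = Pattern.compile("(\\w+).*");

    // Hypothetical helper: returns capture group 1, or null when the input has no match.
    static String extract(String postcode) {
        Matcher m = FIRST_TOKEN.matcher(postcode);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        for (String s : new String[] {"AB55 4", "DD7 6LL", "DD5 2HI"}) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```

Swapping in the Option 2 pattern, ([a-zA-Z0-9]+).*, gives the same result for these three rows; the two differ only in that \w also accepts an underscore.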