Regex to extract first part of string in Apache Pig
I need to extract the postcode district from the input data below:
AB55 4
DD7 6LL
DD5 2HI
My code:
A = load 'data' as postcode:chararray;
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;
The output should look like this:
AB55
DD7
DD5
What regular expression should I use to extract the first part of the string?
Can you try one of the regexes below?
Option 1:
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\w+).*',1);
DUMP code_district;
Option 2:
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;
Output:
(AB55)
(DD7)
(DD5)
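If you want to check the two patterns without running a Pig job, here is a minimal sketch in Python. It assumes Python's re module behaves the same as the Java regex engine behind REGEX_EXTRACT for these simple ASCII patterns, which should hold for the sample data. The helper name extract_district is hypothetical and not part of Pig.

```python
import re

# Hypothetical helper (not a Pig API): returns capture group 1 of the
# pattern matched from the start of the string, or None if no match.
def extract_district(postcode, pattern):
    match = re.match(pattern, postcode)
    return match.group(1) if match else None

# The two patterns from the options above.
OPTION_1 = r'(\w+).*'
OPTION_2 = r'([a-zA-Z0-9]+).*'

# Sample input from the question.
postcodes = ['AB55 4', 'DD7 6LL', 'DD5 2HI']

for pattern in (OPTION_1, OPTION_2):
    districts = [extract_district(p, pattern) for p in postcodes]
    print(pattern, districts)
```

The two options differ only in that \w also matches an underscore, while [a-zA-Z0-9] is limited to letters and digits. For postcode data either one should give the same result.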