Pig prints blank output when using REGEX_EXTRACT
I want to split a string for an area conversion. I have data like this.
(149Sq.Yards)
(151Sq.Yards)
(190Sq.Yards)
(190Sq.Yards)
I want to split the data above like this.
149 sq.yards
151 sq.yards
I tried the following code.
a = LOAD '/user/ahmedabad/Makkan_PropertyDetails_Apartment_Ahmedabad.csv' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray,PriceperArea:chararray,NoOfBedRooms:int,NoOfBathRooms:int,FloorNoOfProperty:chararray,TotalFloors:int,Possession:chararray,BuiltUpArea:chararray,Furnished:chararray,Ownership:chararray,NewResale:chararray,Facing:chararray,title:chararray,PropertyAddress:chararray,NearByFacilities:chararray,PropertyFeatures:chararray,Sellerinfo:chararray,Description:chararray);
b = FOREACH a GENERATE BuiltUpArea;
c = FILTER b BY (BuiltUpArea matches '.*Sq.Yards.*');
d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(.*)', 1) * 9;
dump d;
It prints null.
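The regular expression you used, (.*), matches every character, so the capture group holds the whole string and Pig ends up trying to compute 149Sq.Yards * 9. That string cannot be cast to a number, which is why null appears in the output.
The regular expression below extracts only the leading digits from the input, so the multiplication becomes 149 * 9.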
d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(^[0-9]+)', 1) * 9;
dump d;
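The question also asks for the number and the unit as separate values (149 sq.yards). A minimal sketch of one way to get both, assuming each BuiltUpArea value consists of nothing but digits followed by Sq.Yards; REGEX_EXTRACT_ALL must match the whole string, so values with extra text would yield null. The relation names e and f and the field names area, unit and converted are hypothetical.

```pig
-- Continues from relation c in the question (rows such as 149Sq.Yards).
-- REGEX_EXTRACT_ALL returns one tuple holding every capture group;
-- FLATTEN turns that tuple into separate fields.
-- [.] matches a literal dot without needing backslash escapes.
e = FOREACH c GENERATE
        FLATTEN(REGEX_EXTRACT_ALL(BuiltUpArea, '^([0-9]+)(Sq[.]Yards)$'))
        AS (area:chararray, unit:chararray);

-- area is still a chararray, so cast it before multiplying.
-- 9 is the conversion factor used in the question.
f = FOREACH e GENERATE area,
                       LOWER(unit) AS unit,
                       ((bigdecimal) area) * 9 AS converted;
dump f;
```

Keeping the digits in their own capture group means the cast only ever sees a numeric string, which is the same idea as the (^[0-9]+) fix above.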