Unable to convert chararray to decimal in Pig
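Hi all, I'm new to Pig programming. I have a CSV file with tab as the delimiter. My Price column holds values in lacs and crores, like this: 40.85 Lacs, 36.73 Lacs, 2.01 cr. I want to convert them to decimal values like this:
40.85 Lacs - 40,85,000
36.73 Lacs - 36,73,000
2.01 cr - 2,01,00,000
I tried the following code: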
a = LOAD '/user/user1/input/city/cityname.CSV' using PigStorage('|') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray,PriceperArea:chararray,NoOfBedRooms:int,NoOfBathRooms:int,FloorNoOfProperty:chararray,TotalFloors:int,Possession:chararray,BuiltUpArea:chararray,Furnished:chararray,Ownership:chararray,NewResale:chararray,Facing:chararray,title:chararray,PropertyAddress:chararray,NearByFacilities:chararray,PropertyFeatures:chararray,Sellerinfo:chararray,Description:chararray);
DUMP a;
b = FOREACH a GENERATE Price;
dump b;
c = FILTER b BY (Price matches '.*Lacs.*');
d = FOREACH c GENERATE Price * 10000.0,SUBSTRING(Price,00000);
d = foreach c generate Price,TOKENIZE(REPLACE(Price,'.','')) AS e;
I have been stuck on this for two days. Any help would be appreciated.
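Use REGEX_EXTRACT to take the first part of the value, up to the space, cast it to bigdecimal, and multiply it by the factor for its unit: 100000 for Lacs (1 lakh = 100,000) and 10000000 for cr (1 crore = 10,000,000). With this input data, for example:
(40.85 Lacs,1)
(36.73 Lacs,2)
(2.01 cr,3)
the following code should work:
A = load 'data' using PigStorage(';');
-- group 1 of the regex is the number, group 2 is the unit
B = foreach A generate (bigdecimal)REGEX_EXTRACT($0, '(.*) (.*)', 1) * (REGEX_EXTRACT($0, '(.*) (.*)', 2) == 'cr' ? 10000000 : 100000);
dump B;
The output should be:
(4085000.00)
(3673000.00)
(20100000.00)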
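For the Price column in the question, the same idea can be sketched as below. This is an illustration under stated assumptions rather than a tested solution: the file name 'prices.txt' and the relation and field names (prices, parts, known, converted, amount, unit, rupees) are hypothetical, every value is assumed to look like "<number> <unit>", and the bigdecimal type requires Pig 0.12 or later.

```pig
-- Hypothetical input: 'prices.txt' holds one price string per line, e.g. "40.85 Lacs".
prices = LOAD 'prices.txt' USING PigStorage('\t') AS (Price:chararray);

-- Split each value into its numeric part and its lower-cased unit.
-- Character classes are used instead of \d and \s to avoid escaping issues.
parts = FOREACH prices GENERATE
    Price,
    (bigdecimal) REGEX_EXTRACT(Price, '([0-9.]+) *([A-Za-z]+)', 1) AS amount,
    LOWER(REGEX_EXTRACT(Price, '([0-9.]+) *([A-Za-z]+)', 2)) AS unit;

-- Keep only rows whose unit is recognised; rows that did not match the
-- pattern have a null unit and are dropped here as well.
known = FILTER parts BY unit == 'lacs' OR unit == 'lac' OR unit == 'cr';

-- 1 lakh = 100,000 and 1 crore = 10,000,000.
converted = FOREACH known GENERATE
    Price,
    amount * (unit == 'cr' ? 10000000 : 100000) AS rupees;

DUMP converted;
```

Lower-casing the unit lets "Lacs", "lacs" and "Cr" share one branch, and filtering before the multiplication keeps unrecognised units from silently being treated as lakhs.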