sum two columns, calculate max, min and mean value in MapReduce
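I have a sample mapper shown below. The key is UCO and the value is TaxiTotal, which should be the sum of the two columns TaxiIn and TaxiOut. How do I sum the two columns?
With my current approach, TaxiIn + TaxiOut pastes the numbers together, e.g. 333 + 444 = 333444, but I want 777. How should I write the code?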
#! /usr/bin/env python
import sys
# -- Airline Data
# Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum,
# TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,
# TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay
for line in sys.stdin:
    line = line.strip()
    unpacked = line.split(",")
    Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay = unpacked
    UCO = "-".join([UniqueCarrier, Origin])
    results = [UCO, TaxiIn+TaxiOut]
    print("\t".join(results))
Change TaxiIn + TaxiOut
to:
int(TaxiIn) + int(TaxiOut)
See the example below:
In [1612]: TaxiIn = '333'
In [1613]: TaxiOut = '444'
In [1614]: TaxiIn + TaxiOut
Out[1614]: '333444'
In [1615]: int(TaxiIn) + int(TaxiOut)
Out[1615]: 777
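You can't add numbers while they are still strings: for str values, + concatenates instead of summing. Convert each str to int (or to float if the fields can contain decimals) first.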
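A minimal sketch of the int/float conversion, assuming a field may hold either a whole number or a decimal; to_number is a hypothetical helper, not part of the original answer:

```python
def to_number(field):
    """Convert a CSV field (a str) to int if possible, otherwise to float."""
    try:
        return int(field)
    except ValueError:
        # int() rejects strings like "12.5"; float() accepts them
        return float(field)


print("333" + "444")                        # string concatenation -> 333444
print(to_number("333") + to_number("444"))  # numeric addition -> 777
print(to_number("12.5") + to_number("3"))   # mixed int/float -> 15.5
```

A value that is neither (for example "NA") still raises ValueError from float(), so the caller decides how to handle it.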
Your code should be:
    # add the fields as integers, then convert back to str because join() only accepts strings
    results = [UCO, str(int(TaxiIn) + int(TaxiOut))]
    print("\t".join(results))
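Putting the fix into a complete mapper, here is a sketch for Hadoop Streaming. It assumes comma-separated input with the 29 columns listed in the question's comments, and it assumes that rows whose TaxiIn/TaxiOut are not integers (a header row, or a missing value such as "NA") should be skipped rather than crash the task. map_line and the column constants are hypothetical names, split out so the logic can be tested without stdin:

```python
#!/usr/bin/env python
import sys

# 0-based positions in the 29-column airline CSV described in the question
NUM_COLUMNS = 29
UNIQUE_CARRIER, ORIGIN, TAXI_IN, TAXI_OUT = 8, 16, 19, 20


def map_line(line):
    """Return "carrier-origin<TAB>taxi_total" for one CSV line, or None to skip it."""
    fields = line.strip().split(",")
    if len(fields) != NUM_COLUMNS:
        return None  # malformed row
    try:
        # int() turns the text fields into numbers, so "333" and "444" give 777
        taxi_total = int(fields[TAXI_IN]) + int(fields[TAXI_OUT])
    except ValueError:
        return None  # header row or missing value such as "NA" (assumption)
    uco = "-".join([fields[UNIQUE_CARRIER], fields[ORIGIN]])
    return "\t".join([uco, str(taxi_total)])


if __name__ == "__main__":
    for line in sys.stdin:
        result = map_line(line)
        if result is not None:
            print(result)
```

Indexing by position instead of unpacking all 29 names keeps the mapper short and makes the column choice explicit.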
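The title also asks for the maximum, minimum and mean, which in MapReduce are computed in the reducer. A minimal sketch, not from the original answer, assuming mapper lines of the form UCO<TAB>TaxiTotal as in the question, and relying on Hadoop Streaming sorting the mapper output by key so that all values for one UCO arrive consecutively; parse and reduce_stats are hypothetical names:

```python
#!/usr/bin/env python
import sys
from itertools import groupby


def parse(line):
    """Split a "key<TAB>value" line from the mapper into (key, int value)."""
    key, value = line.rstrip("\n").split("\t", 1)
    return key, int(value)


def reduce_stats(lines):
    """Yield (key, max, min, mean) for each run of consecutive lines sharing a key.

    Assumes the input is sorted by key, as Hadoop Streaming does between
    the map and reduce phases.
    """
    pairs = (parse(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [value for _, value in group]
        yield key, max(values), min(values), sum(values) / float(len(values))


if __name__ == "__main__":
    for key, hi, lo, mean in reduce_stats(sys.stdin):
        print("\t".join([key, str(hi), str(lo), "%.2f" % mean]))
```

Collecting each key's values into a list is the simplest approach; for very large groups, keeping a running count, sum, max and min instead avoids holding all values in memory.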