How to identify hot and cold days using MapReduce?
My data looks like this (date, maximum temperature, minimum temperature):
20130101 12.8 9.6
20130102 10.1 3.8
20130103 7.0 -2.2
20130104 11.8 -3.7
20130105 8.6 -1.1
20130106 10.5 1.9
20130107 13.4 -0.1
20130108 16.2 1.4
20130109 17.8 12.4
20130110 20.0 16.2
20130111 15.4 5.0
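I want to identify the dates where the maximum temperature is greater than 40 (a hot day) and the dates where the minimum temperature is less than 10 (a cold day).
To do this, I ran the following code: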
current_date = None
current_temp = None
for line in data.strip().split('\n'):
    Mapper_data = ["%s\o%s\o%s" % (line.split(' ')[0], line.split(' ')[1], line.split(' ')[2])]
    for line in Mapper_data:
        line = line.strip()
        date, max_temp, min_temp = line.rsplit('\o', 2)
        try:
            max_temp = float(max_temp)
            min_temp = float(min_temp)
        except ValueError:
            continue
        if current_date == date:
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
        else:
            if current_date:
                print('%s\t%s' % (current_date, current_temp))
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
            current_date = date
if current_date == date:
    print('%s\t%s' % (current_date, current_temp))
I get the following result:
20130101 Cold day
20130102 Cold day
20130103 Cold day
20130104 Cold day
20130105 Cold day
20130106 Cold day
20130107 Cold day
20130108 Cold day
20130109 Cold day
20130110 Cold day
20130111 Cold day
But the result I need is:
20130101 Cold day
20130102 Cold day
20130103 Cold day
20130104 Cold day
20130105 Cold day
20130106 Cold day
20130107 Cold day
20130108 Cold day
20130111 Cold day
because 20130109 and 20130110 are neither cold nor hot.
If you know how to change my code to get this last result, please help.
If you want a Hadoop-compatible Python script, it needs to read from STDIN:
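import sys

for line in sys.stdin:
    # Reset for every record so a label never carries over to the next day.
    condition = None
    try:
        # Each record is "<date> <max_temp> <min_temp>", whitespace-separated.
        current_date, max_temp, min_temp = line.split()
        f_min_temp = float(min_temp)
        f_max_temp = float(max_temp)
    except ValueError:
        # Skip blank or malformed lines.
        continue
    if f_max_temp > 40:
        condition = 'Hot day'
    if f_min_temp < 10:
        condition = 'Cold day'
    # Days that are neither hot nor cold produce no output.
    if condition:
        print('%s\t%s' % (current_date, condition))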
Here is an example of running it locally:
$ python data.py < data.txt
20130101 Cold day
20130102 Cold day
20130103 Cold day
20130104 Cold day
20130105 Cold day
20130106 Cold day
20130107 Cold day
20130108 Cold day
20130111 Cold day
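The question's code printed "Cold day" for 20130109 and 20130110 because current_temp was never reset, so the label from an earlier day carried over. The script above avoids this by setting condition back to None for each line. A minimal sketch of the same logic as two small functions that can be checked without piping a file, assuming the helper names classify and map_lines and the keyword thresholds (none of them from the original post):

```python
def classify(max_temp, min_temp, hot_above=40.0, cold_below=10.0):
    """Return 'Hot day', 'Cold day', or None for a single day's readings."""
    label = None  # start fresh for every record
    if max_temp > hot_above:
        label = 'Hot day'
    if min_temp < cold_below:
        # As in the script above, 'Cold day' wins if both conditions hold.
        label = 'Cold day'
    return label


def map_lines(lines):
    """Yield 'date<TAB>label' for each line that is hot or cold."""
    for line in lines:
        try:
            date, max_temp, min_temp = line.split()
            label = classify(float(max_temp), float(min_temp))
        except ValueError:
            continue  # skip blank or malformed lines
        if label:
            yield '%s\t%s' % (date, label)


if __name__ == '__main__':
    sample = [
        '20130108 16.2 1.4',
        '20130109 17.8 12.4',
        '20130110 20.0 16.2',
        '20130111 15.4 5.0',
    ]
    for out in map_lines(sample):
        print(out)
```

With this sample only 20130108 and 20130111 are printed; the two days in between yield nothing because classify returns None for them.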
To run it in Hadoop, see Hadoop Streaming.
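The script above is a map-only job: every input line is handled independently, so no reducer is needed to label the days. Hadoop Streaming feeds input lines to the mapper on STDIN, sorts the mapper output by key, and passes the sorted lines to the reducer. If a reduce step were wanted, for example to count how many days received each label, the flow could be emulated in one process roughly as follows. This is only a sketch: the counting task and the names mapper, reducer, and run_locally are illustrative assumptions, not part of the original question or answer, and no Hadoop API is involved.

```python
from itertools import groupby


def mapper(lines):
    """Emit (label, 1) for every day that is hot or cold."""
    for line in lines:
        try:
            _date, max_temp, min_temp = line.split()
            max_temp, min_temp = float(max_temp), float(min_temp)
        except ValueError:
            continue  # skip blank or malformed lines
        label = None  # reset per record
        if max_temp > 40:
            label = 'Hot day'
        if min_temp < 10:
            label = 'Cold day'
        if label:
            yield label, 1


def reducer(sorted_pairs):
    """Sum the counts for each label; the input must be sorted by key."""
    for label, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield label, sum(count for _label, count in group)


def run_locally(lines):
    """Emulate map -> sort -> reduce in a single process (no Hadoop)."""
    return list(reducer(sorted(mapper(lines))))
```

Calling run_locally(open('data.txt')) on the data from the question would be expected to give a single pair for 'Cold day', since no day in that file exceeds 40.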