How to identify hot and cold days using MapReduce?

My data looks like this:

20130101  12.8   9.6
20130102  10.1   3.8
20130103   7.0  -2.2
20130104  11.8  -3.7
20130105   8.6  -1.1
20130106  10.5   1.9
20130107  13.4  -0.1
20130108  16.2   1.4
20130109  17.8  12.4
20130110  20.0  16.2
20130111  15.4  5.0

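I want to identify the dates where the maximum temperature is greater than 40 (a hot day) and the dates where the minimum temperature is below 10 (a cold day). To do that, I run the following code: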

current_date = None
current_temp = None
for line in data.strip().split('\n'):
    Mapper_data = ["%s\t%s\t%s" % (line.split('  ')[0], line.split('  ')[1], line.split('  ')[2])]
    for line in Mapper_data:
        line = line.strip()
        date, max_temp, min_temp = line.rsplit('\t', 2)
        try:
            max_temp = float(max_temp)
            min_temp = float(min_temp)
        except ValueError:
            continue
        if current_date == date:
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
        else:
            if current_date:
                print('%s\t%s' % (current_date, current_temp))
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
            current_date = date
if current_date == date:
    print('%s\t%s' % (current_date, current_temp))

I get the following result:

20130101    Cold day
20130102    Cold day
20130103    Cold day
20130104    Cold day
20130105    Cold day
20130106    Cold day
20130107    Cold day
20130108    Cold day
20130109    Cold day
20130110    Cold day
20130111    Cold day

But the result I need is:

20130101    Cold day
20130102    Cold day
20130103    Cold day
20130104    Cold day
20130105    Cold day
20130106    Cold day
20130107    Cold day
20130108    Cold day
20130111    Cold day

because 20130109 and 20130110 are neither cold nor hot.

If you know how to change my code to get this last result, please help.

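If you want a Hadoop-compatible Python script, it needs to read from STDIN. The label also has to be reset for every line; your code keeps the previous day's current_temp when neither condition matches, which is why 20130109 and 20130110 are printed as cold.
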
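import sys

for line in sys.stdin:
    condition = None  # reset for every record so a label never carries over
    try:
        # Unpacking inside the try also skips blank or malformed lines
        current_date, max_temp, min_temp = line.split()
        f_min_temp = float(min_temp)
        f_max_temp = float(max_temp)
    except ValueError:
        continue

    if f_max_temp > 40:
        condition = 'Hot day'
    if f_min_temp < 10:
        condition = 'Cold day'

    # Days that are neither hot nor cold produce no output
    if condition:
        print('%s\t%s' % (current_date, condition))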

Here is an example of running it locally:

$ python data.py < data.txt
20130101    Cold day
20130102    Cold day
20130103    Cold day
20130104    Cold day
20130105    Cold day
20130106    Cold day
20130107    Cold day
20130108    Cold day
20130111    Cold day
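
To check the logic without a data file, the classification step can be pulled out into a function. This is only a minimal sketch: the names classify_day and label_lines and the threshold parameters are made up for illustration, and as in the script above a day that matches both conditions is reported as cold.

```python
def classify_day(max_temp, min_temp, hot_above=40.0, cold_below=10.0):
    """Return 'Hot day', 'Cold day', or None for a single day."""
    label = None
    if max_temp > hot_above:
        label = 'Hot day'
    if min_temp < cold_below:
        label = 'Cold day'  # cold wins if both match, as in the script
    return label


def label_lines(lines):
    """Yield (date, label) for each line that is hot or cold."""
    for line in lines:
        try:
            date, max_temp, min_temp = line.split()
            label = classify_day(float(max_temp), float(min_temp))
        except ValueError:
            continue  # blank or malformed line
        if label:
            yield date, label


# Four rows taken from the data in the question
sample = [
    "20130108  16.2   1.4",
    "20130109  17.8  12.4",
    "20130110  20.0  16.2",
    "20130111  15.4  5.0",
]
for date, label in label_lines(sample):
    print('%s\t%s' % (date, label))
```

With these four rows only 20130108 and 20130111 are printed, because the label starts from None on every call.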

To run it in Hadoop, see Hadoop Streaming.
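
Hadoop Streaming pipes the input lines to a mapper on STDIN, sorts the mapper's tab-separated output by key, and pipes the sorted records to a reducer. The sketch below imitates that flow in a single process so the logic can be tried without a cluster. The names mapper, reducer and run_local are hypothetical, and the call to sorted is only a stand-in for Hadoop's shuffle step. Since each date appears once in the data above, the reducer here has little to do beyond passing records through.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'date<TAB>label' for each hot or cold day."""
    for line in lines:
        try:
            date, max_temp, min_temp = line.split()
            max_temp, min_temp = float(max_temp), float(min_temp)
        except ValueError:
            continue  # skip blank or malformed lines
        label = None
        if max_temp > 40:
            label = 'Hot day'
        if min_temp < 10:
            label = 'Cold day'
        if label:
            yield '%s\t%s' % (date, label)


def reducer(sorted_lines):
    """Group records by date and emit one line per date."""
    pairs = (line.split('\t', 1) for line in sorted_lines)
    for date, group in groupby(pairs, key=lambda pair: pair[0]):
        labels = sorted({label for _, label in group})
        yield '%s\t%s' % (date, ', '.join(labels))


def run_local(lines):
    """Local stand-in for the map -> sort -> reduce flow."""
    return list(reducer(sorted(mapper(lines))))


# Rows from the question, deliberately out of order
sample = [
    "20130110  20.0  16.2",
    "20130103   7.0  -2.2",
    "20130101  12.8   9.6",
]
for record in run_local(sample):
    print(record)
```

In a real streaming job the mapper and reducer would be separate scripts that read sys.stdin and print their records, in the same way as the script above.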