文件中年月数据的平均温度 python
average temperature from year and month data in a file python
我有一个数据文件,其中包含某种特定格式的数据,并且在处理时有一些额外的行要忽略。我需要处理数据并根据数据计算出一个值。
示例数据:
Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144
24.7 25.7 30.6 47.5 62.9 68.5 73.7 67.9 61.1 48.5 39.6 20.0
16.1 19.1 24.2 45.4 61.3 66.5 72.1 68.4 60.2 50.9 37.4 31.1
10.4 21.6 37.4 44.7 53.2 68.0 73.7 68.2 60.7 50.2 37.2 24.6
21.5 14.7 35.0 48.3 54.0 68.2 69.6 65.7 60.8 49.1 33.2 26.0
19.1 20.6 40.2 50.0 55.3 67.7 70.7 70.3 60.6 50.7 35.8 20.7
14.0 24.1 29.4 46.6 58.6 62.2 72.1 71.7 61.9 47.6 34.2 20.4
8.4 19.0 31.4 48.7 61.6 68.1 72.2 70.6 62.5 52.7 36.7 23.8
11.2 20.0 29.6 47.7 55.8 73.2 68.0 67.1 64.9 57.1 37.6 27.7
13.4 17.2 30.8 43.7 62.3 66.4 70.2 71.6 62.1 46.0 32.7 17.3
22.5 25.7 42.3 45.2 55.5 68.9 72.3 72.3 62.5 55.6 38.0 20.4
17.6 20.5 34.2 49.2 54.8 63.8 74.0 67.1 57.7 50.8 36.8 25.5
20.4 19.6 24.6 41.3 61.8 68.5 72.0 71.1 57.3 52.5 40.6 26.2
样本文件来源:http://robjhyndman.com/tsdldata/data/cryer2.dat
注意:此处,行代表年份,列代表月份。
我正在尝试编写一个函数,其中 returns 从给定的 url 开始的任何一个月的平均温度。
我试过如下:
def avg_temp_march(f):
march_temps = []
# read each line of the file and store the values
# as floats in a list
for line in f:
line = str(line, 'ascii') # now line is a string
temps = line.split()
# check that it is not empty.
if temps != []:
march_temps.append(float(temps[2]))
# calculate the average and return it
return sum(march_temps) / len(march_temps)
avg_temp_march("data5.txt")
但是我得到了错误 line = str(line, 'ascii')
TypeError: decoding str is not supported
我认为不需要将字符串转换为字符串。
我尝试通过一些修改来修复您的代码:
def avg_temp_march(f):
# f is a string read from file
march_temps = []
for line in f.split("\n"):
if line == "": continue
temps = line.split(" ")
temps = [t for t in temps if t != ""]
# check that it is not empty.
month_index = 2
if len(temps) > month_index:
try:
march_temps.append(float(temps[month_index]))
except Exception, e:
print temps
print "Skipping line:", e
# calculate the average and return it
return sum(march_temps) / len(march_temps)
输出:
['Average', 'monthly', 'temperatures', 'in', 'Dubuque,', 'Iowa,']
Skipping line: could not convert string to float: temperatures
['January', '1964', 'through', 'december', '1975,', 'n=144']
Skipping line: could not convert string to float: through
32.475
根据你原来的问题(最新编辑之前),我认为你可以通过这种方式解决你的问题。
# from urllib2 import urlopen
from urllib.request import urlopen #python3
def avg_temp_march(url):
f = urlopen(url).read()
data = f.split("\n")[3:] #ingore the first 3 lines
data = [line.split() for line in data if line!=''] #ignore the empty lines
data = [map(float, line) for line in data] #Convert all numbers to float
month_index = 2 # 2 for march
monthly_sum = sum([line[month_index] for line in data])
monthly_avg = monthly_sum/len(data)
return monthly_avg
print avg_temp_march("http://robjhyndman.com/tsdldata/data/cryer2.dat")
使用pandas,代码变得更短:
import calendar
import pandas a spd
df = pd.read_csv('data5.txt', delim_whitespace=True, skiprows=2,
names=calendar.month_abbr[1:])
现在三月份:
>>> df.Mar.mean()
32.475000000000001
所有月份:
>>> df.mean()
Jan 16.608333
Feb 20.650000
Mar 32.475000
Apr 46.525000
May 58.091667
Jun 67.500000
Jul 71.716667
Aug 69.333333
Sep 61.025000
Oct 50.975000
Nov 36.650000
Dec 23.641667
dtype: float64
我有一个数据文件,其中包含某种特定格式的数据,并且在处理时有一些额外的行要忽略。我需要处理数据并根据数据计算出一个值。
示例数据:
Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144
24.7 25.7 30.6 47.5 62.9 68.5 73.7 67.9 61.1 48.5 39.6 20.0
16.1 19.1 24.2 45.4 61.3 66.5 72.1 68.4 60.2 50.9 37.4 31.1
10.4 21.6 37.4 44.7 53.2 68.0 73.7 68.2 60.7 50.2 37.2 24.6
21.5 14.7 35.0 48.3 54.0 68.2 69.6 65.7 60.8 49.1 33.2 26.0
19.1 20.6 40.2 50.0 55.3 67.7 70.7 70.3 60.6 50.7 35.8 20.7
14.0 24.1 29.4 46.6 58.6 62.2 72.1 71.7 61.9 47.6 34.2 20.4
8.4 19.0 31.4 48.7 61.6 68.1 72.2 70.6 62.5 52.7 36.7 23.8
11.2 20.0 29.6 47.7 55.8 73.2 68.0 67.1 64.9 57.1 37.6 27.7
13.4 17.2 30.8 43.7 62.3 66.4 70.2 71.6 62.1 46.0 32.7 17.3
22.5 25.7 42.3 45.2 55.5 68.9 72.3 72.3 62.5 55.6 38.0 20.4
17.6 20.5 34.2 49.2 54.8 63.8 74.0 67.1 57.7 50.8 36.8 25.5
20.4 19.6 24.6 41.3 61.8 68.5 72.0 71.1 57.3 52.5 40.6 26.2
样本文件来源:http://robjhyndman.com/tsdldata/data/cryer2.dat
注意:此处,行代表年份,列代表月份。
我正在尝试编写一个函数,其中 returns 从给定的 url 开始的任何一个月的平均温度。
我试过如下:
def avg_temp_march(f):
march_temps = []
# read each line of the file and store the values
# as floats in a list
for line in f:
line = str(line, 'ascii') # now line is a string
temps = line.split()
# check that it is not empty.
if temps != []:
march_temps.append(float(temps[2]))
# calculate the average and return it
return sum(march_temps) / len(march_temps)
avg_temp_march("data5.txt")
但是我得到了错误 line = str(line, 'ascii')
TypeError: decoding str is not supported
我认为不需要将字符串转换为字符串。
我尝试通过一些修改来修复您的代码:
def avg_temp_march(f):
# f is a string read from file
march_temps = []
for line in f.split("\n"):
if line == "": continue
temps = line.split(" ")
temps = [t for t in temps if t != ""]
# check that it is not empty.
month_index = 2
if len(temps) > month_index:
try:
march_temps.append(float(temps[month_index]))
except Exception, e:
print temps
print "Skipping line:", e
# calculate the average and return it
return sum(march_temps) / len(march_temps)
输出:
['Average', 'monthly', 'temperatures', 'in', 'Dubuque,', 'Iowa,']
Skipping line: could not convert string to float: temperatures
['January', '1964', 'through', 'december', '1975,', 'n=144']
Skipping line: could not convert string to float: through
32.475
根据你原来的问题(最新编辑之前),我认为你可以通过这种方式解决你的问题。
# from urllib2 import urlopen
from urllib.request import urlopen #python3
def avg_temp_march(url):
f = urlopen(url).read()
data = f.split("\n")[3:] #ingore the first 3 lines
data = [line.split() for line in data if line!=''] #ignore the empty lines
data = [map(float, line) for line in data] #Convert all numbers to float
month_index = 2 # 2 for march
monthly_sum = sum([line[month_index] for line in data])
monthly_avg = monthly_sum/len(data)
return monthly_avg
print avg_temp_march("http://robjhyndman.com/tsdldata/data/cryer2.dat")
使用pandas,代码变得更短:
import calendar
import pandas a spd
df = pd.read_csv('data5.txt', delim_whitespace=True, skiprows=2,
names=calendar.month_abbr[1:])
现在三月份:
>>> df.Mar.mean()
32.475000000000001
所有月份:
>>> df.mean()
Jan 16.608333
Feb 20.650000
Mar 32.475000
Apr 46.525000
May 58.091667
Jun 67.500000
Jul 71.716667
Aug 69.333333
Sep 61.025000
Oct 50.975000
Nov 36.650000
Dec 23.641667
dtype: float64