How to split a text file
04-05-1993:1.068
04-12-1993:1.079
04-19-1993:1.079
06-06-1994:1.065
06-13-1994:1.073
06-20-1994:1.079
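I have a text file of gas prices in month-day-year:price format, and I want to calculate the average gas price for each year. So I tried splitting it: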
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
But I can't get the year and the price; I get data like this instead:
('04', '1.068'), ('04', '1.079')
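Please let me know what I'm missing.
Also, if possible, please show me how to use the split data to calculate the average price per year with a dictionary.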
Try this:
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[0],x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
Output:
[('04', '1993', '1.068'), ('04', '1993', '1.079'), ('04', '1993', '1.079'), ('06', '1994', '1.065'), ('06', '1994', '1.073'), ('06', '1994', '1.079')]
Or, if you only need the year and the price:
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
Output:
[('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'), ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]
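The chained splits above can also be written as a small helper that splits each line once on ':' and then takes the last '-' field of the date. This is only a sketch: the name parse_line is made up for illustration, and it assumes every line has exactly the MM-DD-YYYY:price shape shown in the question. It also converts the price to a float, which the averaging step needs anyway.

```python
def parse_line(line):
    """Split 'MM-DD-YYYY:price' into (year, price), with price as a float."""
    date, price = line.strip().split(':')   # assumes exactly one ':' per line
    year = date.split('-')[-1]              # the last '-' field is the year
    return year, float(price)

sample = ['04-05-1993:1.068', '06-20-1994:1.079']
pairs = [parse_line(s) for s in sample]
print(pairs)  # [('1993', 1.068), ('1994', 1.079)]
```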
As mentioned earlier, getting the year takes a slightly more involved split. But your format looks very consistent, so you could simply slice:
datesprices=[(x[6:10], x[11:]) for x in fullfile]
But how do you compute the averages? You need to store a list of prices for each year somewhere.
from statistics import mean

my_dict = {}  # could be defaultdict too
for year, price in datesprices:
    if year not in my_dict:
        my_dict[year] = []
    my_dict[year].append(float(price))  # prices were read as strings

for year, prices in my_dict.items():
    print(year, mean(prices))
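For comparison, here is roughly what the defaultdict variant hinted at in the comment might look like. It is a sketch, not the answerer's code: the datesprices list is pasted in from the output shown earlier so the snippet runs on its own, and the names prices_by_year and averages are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (year, price) pairs as produced by the split shown earlier; prices are strings.
datesprices = [('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'),
               ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]

# defaultdict(list) creates the empty list the first time a year is seen,
# so the explicit "if year not in my_dict" check is no longer needed.
prices_by_year = defaultdict(list)
for year, price in datesprices:
    prices_by_year[year].append(float(price))

averages = {year: mean(prices) for year, prices in prices_by_year.items()}
for year, avg in averages.items():
    print(year, round(avg, 3))
```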
txt = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079', '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']
price_per_year = {}    # year -> running sum of prices
number_of_years = {}   # year -> number of prices seen for that year
for line in txt:
    date, price = line.split(':')
    year = date.split('-')[2]
    price = float(price)
    if year not in price_per_year:
        price_per_year[year] = price
        number_of_years[year] = 1
    else:
        price_per_year[year] += price
        number_of_years[year] += 1
av_price_1993 = price_per_year['1993'] / number_of_years['1993']
av_price_1994 = price_per_year['1994'] / number_of_years['1994']
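I don't think there's any need to split the input lines, because they have a fixed date format, i.e. the length of the date is known. So we can slice instead.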
with open('gas.txt') as gas:
    td = dict()
    for line in gas:
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)

for k, v in td.items():
    print(f'{k} {sum(v)/len(v):.3f}')
Output:
1993 1.075
1994 1.072
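Notes:
There is no check for blank lines here. It is assumed that there are none and that the sample shown in the question is just badly formatted.
Also, there's no need to strip the incoming lines, because float() is unaffected by leading/trailing whitespace.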
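If the file might contain blank lines after all, one possible variation is to skip them and split on the separators instead of relying on fixed positions. This is a sketch under assumptions: the function name yearly_averages is hypothetical, and the sample data is fed through io.StringIO (with a couple of blank lines added) so the snippet runs without a real file.

```python
import io

def yearly_averages(lines):
    """Return {year: average price} from 'MM-DD-YYYY:price' lines, skipping blanks."""
    totals = {}  # year -> (sum of prices, number of prices)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        date, _, price = line.partition(':')
        year = date.rsplit('-', 1)[-1]
        total, count = totals.get(year, (0.0, 0))
        totals[year] = (total + float(price), count + 1)
    return {year: total / count for year, (total, count) in totals.items()}

# Stand-in for an open file; the blank lines are deliberate.
sample = io.StringIO("04-05-1993:1.068\n\n04-12-1993:1.079\n04-19-1993:1.079\n"
                     "06-06-1994:1.065\n\n06-13-1994:1.073\n06-20-1994:1.079\n")
for year, avg in yearly_averages(sample).items():
    print(f'{year} {avg:.3f}')
```

Keeping a running sum and count per year avoids storing every price, at the cost of no longer being able to pass the list to statistics.mean.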