How to split a text file

04-05-1993:1.068

04-12-1993:1.079

04-19-1993:1.079

06-06-1994:1.065

06-13-1994:1.073

06-20-1994:1.079


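I have a text file of natural gas prices in date-year-price form, and I want to compute the average gas price for each year. So I tried splitting the lines: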

with open('c:/Gasprices.txt','r') as f: 
   fullfile=[x.strip() for x in f.readlines()]
datesprices=[(x.split('-')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

But I can't get the year and price data; instead I get the month and the price, like this:

('04', '1.068'), ('04', '1.079')

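Please let me know what I am missing.

Also, if possible, please show me how to use the split data with a dictionary to compute the average price for each year.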

Try this:


with open('c:/Gasprices.txt','r') as f: 
    fullfile=[x.strip() for x in f.readlines()]
datesprices=[(x.split('-')[0],x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

Output:

[('04', '1993', '1.068'), ('04', '1993', '1.079'), ('04', '1993', '1.079'), ('06', '1994', '1.065'), ('06', '1994', '1.073'), ('06', '1994', '1.079')]

with open('c:/Gasprices.txt','r') as f: 
    fullfile=[x.strip() for x in f.readlines()]
datesprices=[(x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

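Or, to get only the year and the price, take the last dash-separated field (as in the code just above). Output: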

[('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'), ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]
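
The original attempt returned the month because `x.split('-')[0]` is the first of the three dash-separated pieces. As a minimal sketch of another way to pull out the year and price, assuming every line has the form MM-DD-YYYY:price (the helper name `parse_line` is made up for illustration, and a short literal list stands in for the file):

```python
def parse_line(line):
    """Return (year, price) from a line like '04-05-1993:1.068'."""
    # Split once on ':' to separate the date from the price.
    date, _, price = line.strip().partition(':')
    # The year is the last dash-separated field of the date.
    year = date.rsplit('-', 1)[-1]
    return year, float(price)

sample = ['04-05-1993:1.068', '06-20-1994:1.079']
datesprices = [parse_line(x) for x in sample]
print(datesprices)
```

Converting the price to `float` here means later averaging code does not have to deal with strings.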

As already mentioned, to get the year you need a more elaborate split. But your format looks very consistent, so you could simply slice:

datesprices=[(x[6:10], x[11:]) for x in fullfile]
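
To check where those slice boundaries come from, here is a small sketch on a single sample line, assuming the fixed MM-DD-YYYY:price layout shown in the question:

```python
line = '04-05-1993:1.068'

year = line[6:10]   # characters at indexes 6..9
price = line[11:]   # everything after the colon at index 10

assert line[10] == ':'  # sanity check on the assumed layout
print(year, price)
```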

But how do you compute the average? You need to store a list of prices for each year somewhere.

from statistics import mean

my_dict = {} # could be defaultdict too
for year, price in datesprices:
    if year not in my_dict:
        my_dict[year] = []
    my_dict[year].append(float(price))  # prices are strings; mean() needs numbers

for year, prices in my_dict.items():
    print(year, mean(prices))
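
The comment in the code above notes that a `defaultdict` would work too. A minimal sketch of that variant, assuming `datesprices` holds (year, price) string pairs as produced earlier (a literal list stands in for the file here, and the names `prices_by_year` and `averages` are illustrative):

```python
from collections import defaultdict
from statistics import mean

datesprices = [('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'),
               ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]

# defaultdict(list) creates the empty list on first access,
# so no "if year not in ..." check is needed.
prices_by_year = defaultdict(list)
for year, price in datesprices:
    prices_by_year[year].append(float(price))

averages = {year: mean(prices) for year, prices in prices_by_year.items()}
for year, avg in averages.items():
    print(year, round(avg, 3))
```
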

txt = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079', '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']

price_per_year = {}   # year -> running sum of prices
number_of_years = {}  # year -> number of prices seen for that year
for i in txt:
    x = i.split(':')
    date = x[0]
    price = float(x[1])        # convert so the prices can be summed
    year = date.split('-')[2]  # the year is the third dash-separated field

    if year not in price_per_year:
        price_per_year[year] = price
        number_of_years[year] = 1
    else:
        price_per_year[year] += price
        number_of_years[year] += 1

# The dictionary keys are strings, so index with '1993', not 1993.
av_price_1993 = price_per_year['1993'] / number_of_years['1993']
av_price_1994 = price_per_year['1994'] / number_of_years['1994']

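I don't think there is any need to split the input lines, because they have a fixed date format; that is, its length is known. So we can slice instead.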

with open('gas.txt') as gas:
    td = dict()
    for line in gas:
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)
    for k, v in td.items():
        print(f'{k} {sum(v)/len(v):.3f}')

Output:

1993 1.075
1994 1.072

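Notes:

There is no check for empty lines here. The assumption is that there are none and that the sample shown in the question is simply badly formatted.

Also, there is no need to strip the incoming lines, because float() is unaffected by leading/trailing whitespace.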
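
If the file really does contain blank lines between records, as the sample in the question appears to, the slicing loop above would fail when `float()` receives an empty string. A sketch that skips such lines, assuming the same fixed layout (the function name `yearly_averages` is hypothetical, and `io.StringIO` stands in for the real file):

```python
import io

def yearly_averages(lines):
    """Map each year to its mean price, ignoring blank lines."""
    by_year = {}
    for line in lines:
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        year = line[6:10]
        price = float(line[11:])
        by_year.setdefault(year, []).append(price)
    return {year: sum(v) / len(v) for year, v in by_year.items()}

sample = io.StringIO("04-05-1993:1.068\n\n04-12-1993:1.079\n\n06-06-1994:1.065\n")
for year, avg in yearly_averages(sample).items():
    print(f'{year} {avg:.3f}')
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.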