Python File reading problem, Possible infile loop?

Question

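The question is as follows: "Write a Python program that reads a file containing lake and fish data and produces a report, in table format, of the lake identification number, lake name, and fish weight (using formatted string fields). The program should calculate the average weight of the fish reported."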

Lake identification:

1000 Chemo
1100 Greene
1200 Toddy

The file I have to read, "FishWeights.txt", contains the following data:

My code:

f = open("fishweights.txt")
print(f.read(4), "Chemo", f.readline(4))
print(f.read(5), "Greene", f.read(5))
print(f.read(4), "Toddy", f.read(5))
print(f.read(5), "Chemo", f.read(4))
print(f.read(5), "Chemo", f.read(4))
print(f.read(5), "Greene", f.read(4))
print(f.read(5), "Toddy", f.read(4))

The output I get is:

1000 Chemo  4.0

1100 Greene  2.0

1200 Toddy  1.5

1000  Chemo 2.0

1000  Chemo 2.2

1100  Greene 1.9

1200  Toddy 2.8

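This is correct, since I have to display each lake's ID number, name, and fish weight. But I need to be able to do calculations so that I can ultimately compute the average weight of all the fish. The output should be neatly formatted, like this: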

1000     Chemo      4.0
1100     Greene     2.0
1200     Toddy      1.5
1000     Chemo      2.0
1000     Chemo      2.2
1100     Greene     1.9
1200     Toddy      2.8
The average fish weight is: 2.34

Any help is appreciated. I'm just a beginner looking for help to fully understand the topic. Thanks!

Answer 1

Yes, you need to iterate over the lines. This is the structure you are looking for:

with open("fishweights.txt") as fo:
    for line in fo:
        pass
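A file object yields one line per loop iteration, and each line still carries its trailing newline, which is why the examples below call strip(). A minimal sketch, using io.StringIO as a stand-in for the real file (the two sample lines are copied from the question's output):

```python
import io

# io.StringIO behaves like an open text file, so it can stand in for fishweights.txt
fo = io.StringIO("1000 4.0\n1100 2.0\n")

lines = []
for line in fo:
    lines.append(line)  # each line still ends with "\n"

print(lines)             # ['1000 4.0\n', '1100 2.0\n']
print(lines[0].strip())  # 1000 4.0
```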

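Now, to retrieve each part of every line, you can use line.split(). Reading a fixed number of characters (as you did) only works as long as the IDs have a fixed length. Are you sure every ID will always have exactly 4 digits? Something like this might be better: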

raw_data = []
with open("fishweights.txt") as fo:
    for line in fo:
        row = line.strip().split()
        if not row:
            continue  # ignore empty lines
        id = int(row[0])
        no = float(row[1])
        raw_data.append((id, no))
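To illustrate the concern about fixed-width reads: taking the first 4 characters assumes every ID is exactly four characters long, while split() separates fields on any whitespace. A small sketch with a hypothetical five-digit ID (12000 is not in the question's data); slicing line[:4] mimics what read(4) does for the first field:

```python
import io

data = "1000 4.0\n12000 2.5\n"  # second line uses a hypothetical 5-digit ID

# fixed-width approach: always takes the first 4 characters as the ID
fixed = [line[:4] for line in io.StringIO(data)]
print(fixed)      # ['1000', '1200']  <- the 5-digit ID is silently truncated

# split() approach: fields are separated on whitespace, whatever their width
split_ids = [line.split()[0] for line in io.StringIO(data)]
print(split_ids)  # ['1000', '12000']
```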

Now that you have the raw data, you need to aggregate it:

total = 0  # avoid naming this "sum": it would shadow the built-in sum()
count = 0
for id, no in raw_data:
    total += no
    count += 1
avg = total / count

Or as a one-liner:

avg = sum(no for id, no in raw_data) / len(raw_data)
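As a quick check of the one-liner, here it is applied to the seven (id, weight) pairs copied from the question's expected output; formatted to two decimal places it gives the 2.34 the question asks for. This only works if the built-in sum has not been reassigned to a number earlier in the same script.

```python
# (lake id, weight) pairs copied from the question's expected output
raw_data = [(1000, 4.0), (1100, 2.0), (1200, 1.5), (1000, 2.0),
            (1000, 2.2), (1100, 1.9), (1200, 2.8)]

avg = sum(no for id, no in raw_data) / len(raw_data)
print(f"{avg:.2f}")  # 2.34  (16.4 / 7 = 2.3428...)
```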

Finally, you need to map the IDs to the names for the final printout:

id_to_name = {
    1000: 'Chemo',
    1100: 'Greene',
    1200: 'Toddy',
}
for id, no in raw_data:
    print(id, id_to_name[id], no)
print('Average: ', avg)

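Of course, the three loops can be merged into a single loop. I kept them separate so you can clearly see each stage of the code. The final (slightly optimized) result might look like this: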

id_to_name = {
    1000: 'Chemo',
    1100: 'Greene',
    1200: 'Toddy',
}
total = 0  # not "sum", so the built-in sum() stays usable
count = 0
with open("fishweights.txt") as fo:
    for line in fo:
        row = line.strip().split()
        if not row:
            continue  # ignore empty lines
        id = int(row[0])
        no = float(row[1])
        total += no
        count += 1
        print(id, id_to_name[id], no)
print('Average:', total / count)
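The assignment asks for formatted string fields and neatly aligned columns, which a plain print(id, name, no) does not guarantee once names differ in length. One way to get this is to give each field a fixed width inside an f-string. A sketch, assuming the same whitespace-separated file layout as above; io.StringIO stands in for the file, its contents are copied from the question's expected output, and the column widths 9 and 11 are arbitrary choices:

```python
import io

id_to_name = {1000: 'Chemo', 1100: 'Greene', 1200: 'Toddy'}

# stand-in for open("fishweights.txt"); contents copied from the question's output
fo = io.StringIO("1000 4.0\n1100 2.0\n1200 1.5\n1000 2.0\n"
                 "1000 2.2\n1100 1.9\n1200 2.8\n")

report = []
total = 0.0
count = 0
for line in fo:
    row = line.split()
    if not row:
        continue  # ignore empty lines
    lake_id, weight = int(row[0]), float(row[1])
    total += weight
    count += 1
    # <9 and <11 left-align the ID and name in fixed-width columns;
    # .1f prints the weight with one decimal place
    report.append(f"{lake_id:<9}{id_to_name[lake_id]:<11}{weight:.1f}")

report.append(f"The average fish weight is: {total / count:.2f}")
print("\n".join(report))
```

With these widths the rows line up exactly like the sample output in the question.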

Answer 2

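You don't need to use offsets to read the lines. Also, you can use with to make sure the file is closed when you're done. For the average, you can put all the numbers in a list and compute the average at the end. Use a dictionary to map lake IDs to names: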

lakes = {
    1000: "Chemo",
    1100: "Greene",
    1200: "Toddy"
}
allWeights = []

with open("FishWeights.txt", "r") as f:
    for line in f:
        line = line.strip()  # get rid of any whitespace at the end of the line
        if not line:
            continue  # skip blank lines, which would break the unpacking below
        lake, weight = line.split()
        lake = int(lake)
        weight = float(weight)
        print(lake, lakes[lake], weight, sep="\t")
        allWeights.append(weight)

avg = sum(allWeights) / len(allWeights)
print("The average fish weight is: {0:.2f}".format(avg)) # format to 2 decimal places

Output:

1000    Chemo   4.0
1100    Greene  2.0
1200    Toddy   1.5
1000    Chemo   2.0
1000    Chemo   2.2
1100    Greene  1.9
1200    Toddy   2.8
The average fish weight is: 2.34

There are more efficient ways to do this, but this is probably the simplest way to help you understand what is going on.

Answer 3

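You can store the lake names in a dictionary and the data in a list. In this example, you then just loop over your list fish and look up the lake name that corresponds to each ID. Finally, print your average by adding up the weights in the list and dividing by the length of fish. (This assumes the lake IDs and names are also stored in a file, here called LakeID.txt, with one "1000 Chemo" pair per line.)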

with open('LakeID.txt', 'r') as lake_file:
    # map each lake ID (kept as a string) to its name, skipping blank lines
    lake = dict(line.split() for line in lake_file if line.strip())

with open('FishWeights.txt', 'r') as fish_file:
    # each entry is [lake_id, weight], both still strings
    fish = [line.split() for line in fish_file if line.strip()]

for i in fish:
    print(i[0], lake[i[0]], i[1])

print('The total average is {}'.format(sum(float(i[1]) for i in fish) / len(fish)))

I'd also encourage you to use the with open(..) context manager to make sure the files are closed on exit.
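A short demonstration of what the context manager guarantees: once the with block ends, the file object is closed, even though close() is never called explicitly. The temporary file is only there to make the example self-contained.

```python
import os
import tempfile

# write a throwaway file so the example does not depend on FishWeights.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1000 4.0\n")
    path = tmp.name

with open(path) as f:
    first = f.readline()
    print(f.closed)  # False: still inside the with block

print(f.closed)      # True: closed automatically on exit
os.remove(path)      # clean up the temporary file
```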

Answer 4

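So here you can store the fish weights and the lake data in two lists. The following code reads every line and then splits it into a list of fish weights and a list of lake data: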

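with open("FishWeights.txt") as f:
    text = f.readlines()
fishWeights = []
lakeData = []
for item in text:
    parts = item.split()
    if not parts:
        continue  # skip blank lines
    lakeData.append(parts[0])            # first field: lake ID
    fishWeights.append(float(parts[1]))  # second field: weight as a number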

From here you can output the information with:

for i in range(len(fishWeights)) :
    print(lakeData[i], "Your Text", fishWeights[i])
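Indexing with range(len(...)) works, but zip() walks the two parallel lists together and avoids the index bookkeeping. A sketch, assuming lakeData holds the IDs as strings and fishWeights holds floats (sample values taken from the question), with a hypothetical id_to_name dictionary, like the ones in the other answers, supplying the names in place of "Your Text":

```python
# sample lists in the shape built by the loop above (values from the question)
lakeData = ["1000", "1100", "1200"]
fishWeights = [4.0, 2.0, 1.5]

# hypothetical ID-to-name lookup, keyed by the string IDs
id_to_name = {"1000": "Chemo", "1100": "Greene", "1200": "Toddy"}

rows = []
for lake_id, weight in zip(lakeData, fishWeights):
    rows.append(f"{lake_id} {id_to_name[lake_id]} {weight}")
    print(rows[-1])
```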

You can compute your average with:

total = 0
for weight in fishWeights:
    total += float(weight)  # works whether weights are stored as str or float
average = total / len(fishWeights)

Answer 5

This can be done easily with a DataFrame. Please find sample code below.

import pandas as pd

# load lake data into a dataframe (whitespace-separated, no header row)
lakeDF = pd.read_csv('Lake.txt', sep=r"\s+", header=None,
                     names=["Lake ID", "Lake Name"])
# load fish data into a dataframe
fishWeightDF = pd.read_csv('FishWeights.txt', sep=r"\s+", header=None,
                           names=["Lake ID", "Fish Weight"])
# join fish with lake on the common 'Lake ID' column (exact key match);
# a left merge keeps the fish rows in their original file order
mergedFrame = fishWeightDF.merge(lakeDF, on='Lake ID', how='left')
# print the result
print(mergedFrame)
# find the average
average = mergedFrame['Fish Weight'].mean()
print(average)
