Python - Read from text file and amalgamate information

Question

I have data in a text file that I can read, but I need to remove the duplicate names and string the values together. See below:

boris:1
boris:3
boris:8
tim:4
tim:5
tim:2
ella:3
ella:9
ella:6

I need to remove the duplicate names and put the values on a single line, like this:

boris:1:3:8
tim:4:5:2
ella:3:9:6

Everything I have tried so far either shows all the values with the duplicated names or shows only the last entry. My attempts are below:

file = open("text1.txt", 'r')
for line in file:
    values = line.strip().split(":")
    name = values[0]
    print(values[0], values[1])  # for checking to see values held
    for index, item in enumerate(line):
        for num in range(3):
            val = {}
            if index == 0:
                name = item
            if index == 1:
                scr1 = item
            val[str(num)] = name + str(scr1)
        print(num)
print(name, scr1)

I also tried:

for line in file.readlines():
    line = line.split(":")
    #print(line)
    for n, item in enumerate(line):
        #print(n, line1)
        if n == 0:
            name = item
            #print(name)
            if item.startswith(name):
                line[n] = item.rstrip()  # I'm sure this is where I'm going wrong, but I don't know how to solve it
            #else:
                #line[n] = item.rstrip()
    print(":".join(line))
    #print(line)

Although these work to a degree, I can't get the answer I'm looking for. Any help is much appreciated. The results end up looking like this:

boris:1
boris:3
boris:8
tim:4
tim:5
tim:2
ella:3
ella:9
ella:6

Which is where I started.

Answer 1

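You need to hold the whole dataset in memory (strictly speaking, you can avoid this for a very large dataset, but that is harder to implement). Create a dict to store the values: when you meet a new name, create a new dict entry; when you meet a name that already exists, append its value to that entry.

Here is some sample code: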

dataset = dict()
# first, if we use `with` then file will be closed automatically
with open('text1.txt', 'r') as f:
    # when we want to just iterate over file lines, we can omit `readlines` and use this simple syntax
    for line in f:
        # strip() is important, because each line we read ends with '\n' character - and we want to strip it off.
        # partition() returns tuple of left-part, separator and right-part,
        # but we don't need that separator value so we assign it to a dummy variable.
        # rpartition() is better choice if name may contain ':' character in it.
        name, _, value = line.strip().rpartition(':')
        if name not in dataset:  # newly encountered name?
            # here we create a new `list` holding our value
            dataset[name] = [value]
        else:
            # append the value to existing list
            dataset[name].append(value)

# now print out resulting data
for name, values in dataset.items():
    print(':'.join([name] + values))

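If you need to preserve the original order of the names, just replace dict with OrderedDict from the collections module. (From Python 3.7 onward, a plain dict also keeps insertion order.)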
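
The order-preserving variant could look roughly like the sketch below. The function name merge_in_order, the blank-line check, and the use of setdefault are illustrative choices that are not part of the answer's code; the function takes any iterable of lines so that it can be tried without a file.

```python
from collections import OrderedDict


def merge_in_order(lines):
    """Group values by name, keeping names in first-seen order."""
    dataset = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (an added safeguard, not in the answer)
        name, _, value = line.rpartition(':')
        # setdefault returns the existing list, or stores and returns a new one
        dataset.setdefault(name, []).append(value)
    return [':'.join([name] + values) for name, values in dataset.items()]


# Names are interleaved here to show that first-seen order is kept
sample = ["boris:1\n", "tim:4\n", "boris:3\n", "ella:3\n", "tim:5\n"]
print(merge_in_order(sample))  # ['boris:1:3', 'tim:4:5', 'ella:3']
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.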

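A short explanation of what happens in the last part: we iterate over the (name, values) pairs. For each pair, we create a list containing only name, concatenate it with the values list, join the resulting list using : as the separator, and print it.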
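
For the very-large-file case mentioned at the top of this answer, one possible approach is sketched below. It is not the answer's own method: it swaps in itertools.groupby and relies on the assumption that all lines for a given name are adjacent in the file, as they are in the question's sample data. Under that assumption only one group is held in memory at a time. The name merge_consecutive is made up for illustration.

```python
from itertools import groupby


def merge_consecutive(lines):
    """Yield one merged line per run of consecutive lines sharing a name.

    Assumes lines with the same name are adjacent; a name that shows up
    again later starts a new, separate output line.
    """
    # Lazily turn each non-blank line into a (name, value) pair;
    # rpartition gives (left, sep, right) and [::2] drops the separator.
    pairs = (
        line.strip().rpartition(':')[::2]
        for line in lines
        if line.strip()
    )
    for name, group in groupby(pairs, key=lambda pair: pair[0]):
        yield ':'.join([name] + [value for _, value in group])


sample = ["boris:1\n", "boris:3\n", "boris:8\n", "tim:4\n", "tim:5\n", "tim:2\n"]
for merged in merge_consecutive(sample):
    print(merged)
# boris:1:3:8
# tim:4:5:2

# With a real file, the file object can be passed directly:
# with open('text1.txt', 'r') as f:
#     for merged in merge_consecutive(f):
#         print(merged)
```

If the names are not grouped together in the file, the dict-based version above remains the safer choice.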

Answer 2

You need a temporary data structure that you fill while iterating over the file and then print.

names = {}
with open("text1.txt", 'r') as file:
    for line in file:
        name, value = line.split(":")
        if name not in names:
            names[name] = []
        names[name].append(value.rstrip())
for name, values in names.items():
    print(name + ":" + ":".join(values))
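
As a possible variation on the code above, collections.defaultdict can take over the "if name not in names" check. This is only a sketch: the function name merge_values is invented for illustration, and the maxsplit argument of 1 is an addition so that a value containing a colon does not break the two-name unpacking.

```python
from collections import defaultdict


def merge_values(lines):
    """Collect the values for each name into a list."""
    names = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        # Split on the first colon only (an added assumption about the format)
        name, value = line.rstrip().split(":", 1)
        names[name].append(value)
    return names


sample = ["ella:3\n", "ella:9\n", "ella:6\n"]
for name, values in merge_values(sample).items():
    print(name + ":" + ":".join(values))  # ella:3:9:6
```

Like the original, this raises ValueError on a line without a colon, such as a blank line.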

Edit: too slow :D
