Python3: how to combine 2 text files line by line conditionally

Question

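I have two ASCII tables in text files containing information about stars. One of them has the headers

|ID |TIME |MAGNITUDE |ERROR |

and the other has the headers

|ID |CLASS |

I want to add the CLASS column to the first text file. The main problem is that the first text file has many rows for each star (e.g. Star 3_6588 has 20 entries in table A, one per observation time), while the second text file has only one entry per ID (because Star 3_6588 is always Class I).

What I need to do is add the |CLASS| column to the first table so that every row with a given ID carries the class for that ID. The text file has over 14 million rows, which is why I can't do this by hand.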

Answer 1

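It sounds like you should use the csv module to read the ID|CLASS file into a dictionary, then iterate through the first file line by line, look up the CLASS by each row's ID value, and write the resulting "row" out to a new file.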
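A minimal sketch of the dictionary-building half of that idea, assuming the table uses `|` as the delimiter with a leading and trailing pipe on every line. The function name `read_class_table` and the in-memory sample are illustrative only, not something from the original files:

```python
import csv
import io

def read_class_table(handle):
    """Build {ID: CLASS} from a pipe-delimited table with one header row."""
    reader = csv.reader(handle, delimiter="|")
    next(reader, None)  # skip the header row
    id2class = {}
    for row in reader:
        # The leading/trailing pipes yield empty first/last cells;
        # strip whitespace and drop the empty cells.
        cells = [cell.strip() for cell in row]
        cells = [cell for cell in cells if cell]
        if len(cells) >= 2:
            id2class[cells[0]] = cells[1]
    return id2class

# Assumed sample data standing in for the real ID|CLASS file.
sample = io.StringIO(
    "| ID | CLASS |\n"
    "| Star 3_6588 | CLASS I |\n"
)
print(read_class_table(sample))
```

With a real file, the same function should work on a handle opened with `open(path, newline='')`, which is how the csv documentation recommends opening files.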

Answer 2

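@Terry Spotts has the right idea. However, the leading and trailing | characters on each line make this a slightly tricky CSV: the delimiter is the pipe character, but it sometimes comes with a leading space, a trailing space, or both. Here is an example that builds the ID: Class dictionary: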

> cat bigfile.txt
| ID | TIME | MAGNITUDE | ERROR |
| Star 3_6588 | 10 | 2 | 1.02 |
| Star 3_6588 | 15 | 4 | 1.2 |
| Star 2_999 | 20 | 6 | 1.4 |
| Star 2_999 | 25 | 8 | 1.6 |

> cat smallfile.txt
| ID | CLASS |
| Star 3_6588 | CLASS I |
| Star 2_999 | CLASS II |

Code:

id2class = {}
with open('/tmp/smallfile.txt', 'r') as classfile:
    classfile.readline()                       # skip the header line
    for line in classfile:
        line = line.strip().strip('|')         # drop the newline and the leading/trailing pipes
        fields = [f.strip() for f in line.split('|')]  # split on '|' and trim the padding spaces
        if len(fields) < 2:
            continue                           # skip blank or malformed lines
        star_id, starclass = fields[0], fields[1]
        id2class[star_id] = starclass

Now your dictionary id2class looks like this:

{
    'Star 3_6588': 'CLASS I',
    'Star 2_999': 'CLASS II'
}

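You can then parse the first file in a similar way, use the ID on each line to look up the Class in the dict, and write the complete row out to a new file. I'll leave that part to you :)

Happy coding!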
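For readers who want a starting point for that remaining step, here is one possible sketch. It streams the big file one line at a time, so the 14 million rows never need to fit in memory. The helper names `split_row` and `add_class_column`, the `UNKNOWN` placeholder for IDs missing from the dictionary, and the in-memory sample data are all assumptions made for illustration:

```python
import io

def split_row(line):
    """Split '| a | b |' into ['a', 'b']."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]

def add_class_column(big, out, id2class, missing="UNKNOWN"):
    """Copy each row of `big` to `out`, appending the CLASS for its ID."""
    header = split_row(next(big))  # first line is the header row
    out.write("| " + " | ".join(header + ["CLASS"]) + " |\n")
    for line in big:
        if not line.strip():
            continue  # skip blank lines
        cells = split_row(line)
        # cells[0] is the ID; fall back to `missing` if it has no class.
        cells.append(id2class.get(cells[0], missing))
        out.write("| " + " | ".join(cells) + " |\n")

# Assumed sample data; with real files, pass handles from open() instead.
id2class = {"Star 3_6588": "CLASS I"}
big = io.StringIO(
    "| ID | TIME | MAGNITUDE | ERROR |\n"
    "| Star 3_6588 | 10 | 2 | 1.02 |\n"
    "| Star 2_999 | 20 | 6 | 1.4 |\n"
)
out = io.StringIO()
add_class_column(big, out, id2class)
print(out.getvalue())
```

Whether a missing ID should get a placeholder, be skipped, or raise an error depends on your data; the placeholder is only one option.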

data-files

python-3.x