Parsing data in block units using a list

Input:

ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//
ID   information2
Aa   information2-1
Ba   information2-2
Ca   information2-3
Da   information2-4
//

Expected output:

ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//

Result:

ID   information1
ID   information1
Aa   information1-1
ID   information1
Aa   information1-1
Ba   information1-2
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//

Code:

word = 'Homo sapiens'
with open(input_file, 'r') as input, open(output_file, 'w') as output:

    list_block = []
    str_block = ""

    for line in input:

        if not ("//" in line):
            str_block += line

        elif "//" in line:
            if word in str_block:
                list_block.append(str_block)
            str_block = ""

        output.write(str_block)

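I have an input file made up of blocks of information, each terminated by a double slash ('//'). Out of all the blocks, I want to extract only those that contain 'Homo sapiens'. When I parse the data with my code, I get the output shown under 'Result' instead of the expected output. How can I fix my code?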

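Since your blocks are delimited by '//', it is much easier to read the whole file and then split its contents on that pattern. This gives you the list of blocks you need, and from there the solution is straightforward. Here is an example that produces the desired output.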

word = 'Homo sapiens'

with open(input_file, 'r') as fi, open(output_file, 'w') as fo:
    # Read the whole file, split it into blocks on '//', and iterate over them
    for block in fi.read().split('//'):
        if word in block:
            fo.write(block)
            fo.write('//')
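
If the file is too large to read in one go, or if '//' can also occur inside a line, a line-by-line variant closer to the original loop may be preferable. In the code from the question, the call to output.write(str_block) sits inside the for loop, so the partially built block is written again after every line, and the matching blocks collected in list_block are never written at all. The following is only a minimal sketch of one way to restructure it; the names extract_blocks and filter_file are hypothetical, and it assumes that every block ends with a line containing nothing but '//'.

```python
def extract_blocks(lines, word, delimiter="//"):
    """Yield each complete block (terminator line included) that contains `word`.

    Assumption: a block ends at a line that consists only of the delimiter.
    A trailing block with no terminator line is silently dropped.
    """
    block = []
    for line in lines:
        block.append(line)
        # Only act once the block is complete, not after every line.
        if line.strip() == delimiter:
            text = "".join(block)
            if word in text:
                yield text
            block = []  # start collecting the next block


def filter_file(input_file, output_file, word):
    """Copy the blocks of input_file that contain `word` to output_file."""
    with open(input_file, "r") as fi, open(output_file, "w") as fo:
        fo.writelines(extract_blocks(fi, word))
```

Calling filter_file(input_file, output_file, 'Homo sapiens') would then write each matching block once, followed by its own '//' line. Because the delimiter is compared against the whole stripped line, a '//' that appears in the middle of a line does not end a block.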