Appending a variable string to each line in a file with Python

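I have almost no experience with file reading and writing in Python, and I would like to ask what the best solution is for my particular case.

I have a tab-separated file with the following structure, where sentences are separated by a blank line: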

Roundup NN
:   :
Muslim  NNP
Brotherhood NNP
vows    VBZ
daily   JJ
protests    NNS
in  IN
Egypt   NNP

Families    NNS
with    IN
no  DT
information NN
on  IN
the DT
whereabouts NN
of  IN
loved   VBN
ones    NNS
are VBP
grief   JJ
-   :
stricken    JJ
.   .

The DT
provincial  JJ
departments NNS
of  IN
supervision NN
and CC
environmental   JJ
protection  NN
jointly RB
announced   VBN
on  IN
May NNP
9   CD
that    IN
the DT
supervisory JJ
department  NN
will    MD
question    VB
and CC
criticize   VB
mayors  NNS
who WP
fail    VBP
to  TO
curb    VB
pollution   NN
.   .

(...)

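To every non-empty line of this file I want to append a tab character followed by a given string.

For each line, the string to append depends on the values stored in lab_pred_tags in the code below. On each iteration of the for loop, lab_pred_tags has as many elements as the corresponding sentence has lines in the text file. That is, in the example, the lengths of lab_pred_tags for the 3 loop iterations are 9, 15, and 28.

On the first iteration of the for loop, lab_pred_tags contains the list ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']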
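
To make the structure concrete, here is a minimal sketch of how the file could be split into sentences so that the line counts can be compared with len(lab_pred_tags). The helper name read_sentences and the inline sample data are hypothetical and only for illustration; they are not part of my actual code.

```python
import io


def read_sentences(lines):
    """Group non-empty lines into sentences; blank lines act as separators."""
    sentences, current = [], []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.strip():
            current.append(stripped)
        elif current:
            # A blank line closes the sentence collected so far
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences


# Hypothetical sample standing in for the real file
sample = io.StringIO("Roundup\tNN\n:\t:\n\nFamilies\tNNS\nwith\tIN\nno\tDT\n")
sentences = read_sentences(sample)
lengths = [len(s) for s in sentences]  # values to compare with len(lab_pred_tags)
```

Any iterable of lines works here, so an open file object could be passed in place of the StringIO sample.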

# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # What is the best way to append each element of `lab_pred_tags` to the corresponding line in the file?
    # Keep in mind that a blank line has to be skipped every time a new for loop iteration starts

For example, the desired output file is:

Roundup NN  O
:   :   O
Muslim  NNP B-ORG
Brotherhood NNP I-ORG
vows    VBZ O
daily   JJ  O
protests    NNS O
in  IN  O
Egypt   NNP B-GPE

Families    NNS O
with    IN  O
no  DT  O
information NN  O
on  IN  O
the DT  O
whereabouts NN  O
of  IN  O
loved   VBN O
ones    NNS O
are VBP O
grief   JJ  O
-   :   O
stricken    JJ  O
.   .   O

The DT  O
provincial  JJ  O
departments NNS O
of  IN  O
supervision NN  O
and CC  O
environmental   JJ  O
protection  NN  O
jointly RB  O
announced   VBN O
on  IN  O
May NNP O
9   CD  O
that    IN  O
the DT  O
supervisory JJ  O
department  NN  O
will    MD  O
question    VB  O
and CC  O
criticize   VB  O
mayors  NNS O
who WP  O
fail    VBP O
to  TO  O
curb    VB  O
pollution   NN  O
.   .   O

What would be the best solution?

For testing purposes, I modified the lab_pred_tags list. Here is my solution:

    # Test data only: the same list is reused for every sentence, so it must
    # be at least as long as the longest sentence in the file.
    lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                     'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                     'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                     'O', 'O', 'B-GPE', 'O']
    index = 0

    with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
            open("PATH_TO_NEW_FILE", "w") as lab_file_2:
        for line in lab_file:
            if not line.strip():
                # Blank line: sentence boundary, so restart at the first tag
                index = 0
                lab_file_2.write("\n")
            else:
                # Append a tab and the tag; rstrip also copes with a final
                # line that has no trailing newline
                lab_file_2.write(
                    line.rstrip("\n") + "\t" + lab_pred_tags[index] + "\n"
                )
                index += 1
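
The snippet above reuses a single test list for every sentence. For the real case, where each loop iteration produces its own lab_pred_tags, one possible approach is to collect those lists first and then consume one list per sentence. The following is only a sketch under that assumption: the function name append_tags and the parameter tags_per_sentence are hypothetical, and StringIO objects stand in for the real input and output files.

```python
import io


def append_tags(src, dst, tags_per_sentence):
    """Copy src to dst, appending a tab and one tag to every non-empty line.

    tags_per_sentence is an iterable of lists, one list per sentence,
    in the same order as the sentences appear in src.
    """
    sentence_tags = iter(tags_per_sentence)
    current = None  # iterator over the tags of the sentence being read
    for line in src:
        if not line.strip():
            # Blank line: the next non-empty line starts a new sentence
            current = None
            dst.write("\n")
            continue
        if current is None:
            current = iter(next(sentence_tags))
        dst.write(line.rstrip("\n") + "\t" + next(current) + "\n")


# Hypothetical sample data in place of the real files
src = io.StringIO("Muslim\tNNP\nBrotherhood\tNNP\n\nin\tIN\nEgypt\tNNP\n")
dst = io.StringIO()
append_tags(src, dst, [["B-ORG", "I-ORG"], ["O", "B-GPE"]])
result = dst.getvalue()
```

In the question's loop, this would mean appending each lab_pred_tags to a list and calling the function once after the loop, with src and dst being the opened input and output files. If a sentence has more lines than tags, next() raises StopIteration, which at least makes a length mismatch visible instead of silently writing wrong tags.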