Appending variable string to each line in a file with Python
I have almost no experience with reading and writing files in Python, and I'd like to ask what the best solution is for my particular case.
I have a tab-delimited file with the following structure, where sentences are separated by an empty line:
Roundup NN
: :
Muslim NNP
Brotherhood NNP
vows VBZ
daily JJ
protests NNS
in IN
Egypt NNP

Families NNS
with IN
no DT
information NN
on IN
the DT
whereabouts NN
of IN
loved VBN
ones NNS
are VBP
grief JJ
- :
stricken JJ
. .

The DT
provincial JJ
departments NNS
of IN
supervision NN
and CC
environmental JJ
protection NN
jointly RB
announced VBN
on IN
May NNP
9 CD
that IN
the DT
supervisory JJ
department NN
will MD
question VB
and CC
criticize VB
mayors NNS
who WP
fail VBP
to TO
curb VB
pollution NN
. .
(...)
For each non-empty line of this file, I want to append a tab followed by a given string.
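The string to append to each line depends on the values stored in lab_pred_tags in the code below. In each iteration of the for loop, lab_pred_tags has the same length as the number of lines in the corresponding sentence of the text file. That is, in the example, lab_pred_tags has lengths 9, 15 and 28 in the 3 for-loop iterations.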
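Since each lab_pred_tags list lines up with one blank-line-separated sentence, one option is to first group the file into sentences and compare their lengths with the tag lists. A minimal sketch, assuming a sentence ends at any empty or whitespace-only line; read_sentences is a hypothetical helper, not part of any library:

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group non-blank lines into sentences; blank lines separate sentences.

    `lines` can be an open file object or any iterable of strings.
    Trailing newlines are removed from the returned lines.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            # A blank line closes the sentence collected so far
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences


# Usage with the real file (the path is a placeholder):
# with open("PATH_TO_YOUR_FILE", encoding="utf-8") as f:
#     print([len(s) for s in read_sentences(f)])
```

For the example file, the printed lengths should match the lengths of the successive lab_pred_tags lists.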
In the first for-loop iteration, lab_pred_tags contains the list:
['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']
# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # Now what is the best solution to append each element of `lab_pred_tags` to each line in the file?
    # Keep in mind that I will need to skip a line every time a new for-loop iteration starts
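One way to do the appending inside the loop itself is to open both files once, before the loop, and consume exactly one sentence from the input in each iteration. A minimal sketch, assuming the sentences in the file appear in the same order as the iterations of the loop; write_tagged_sentence is a hypothetical helper name:

```python
from typing import Iterator, List, TextIO


def write_tagged_sentence(src: Iterator[str], dst: TextIO, tags: List[str]) -> None:
    """Copy the next sentence from `src` to `dst`, appending one tag per line.

    `src` must be an iterator over lines (an open file object works).
    Blank lines before the sentence are skipped, each token line is written
    followed by a tab and its tag, and one blank line is written afterwards.
    Raises ValueError if the sentence length does not match len(tags).
    """
    if not tags:
        return
    line = next(src, None)
    # Skip the blank separator line(s) in front of the sentence
    while line is not None and not line.strip():
        line = next(src, None)
    for i, tag in enumerate(tags):
        if i > 0:
            line = next(src, None)
        if line is None or not line.strip():
            raise ValueError("sentence has fewer lines than tags")
        dst.write(line.rstrip("\n") + "\t" + tag + "\n")
    # The sentence must end here: the next line is a separator or end of file
    after = next(src, None)
    if after is not None and after.strip():
        raise ValueError("sentence has more lines than tags")
    dst.write("\n")


# Inside the question's loop (labels, labels_pred, sequence_lengths, d_u
# and tags come from the question and are not defined here):
# with open("PATH_TO_YOUR_FILE", encoding="utf-8") as src, \
#         open("PATH_TO_NEW_FILE", "w", encoding="utf-8") as dst:
#     for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
#         lab_pred_tags = d_u.label_idxs_to_tags(lab_pred[:length], tags)
#         write_tagged_sentence(src, dst, lab_pred_tags)
```

Reading the input lazily like this avoids loading the whole file into memory, and the two checks flag a sentence whose line count differs from its tag list instead of silently shifting tags into the next sentence.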
For example, the desired output file is:
Roundup NN O
: : O
Muslim NNP B-ORG
Brotherhood NNP I-ORG
vows VBZ O
daily JJ O
protests NNS O
in IN O
Egypt NNP B-GPE

Families NNS O
with IN O
no DT O
information NN O
on IN O
the DT O
whereabouts NN O
of IN O
loved VBN O
ones NNS O
are VBP O
grief JJ O
- : O
stricken JJ O
. . O

The DT O
provincial JJ O
departments NNS O
of IN O
supervision NN O
and CC O
environmental JJ O
protection NN O
jointly RB O
announced VBN O
on IN O
May NNP O
9 CD O
that IN O
the DT O
supervisory JJ O
department NN O
will MD O
question VB O
and CC O
criticize VB O
mayors NNS O
who WP O
fail VBP O
to TO O
curb VB O
pollution NN O
. . O
What is the best solution?
For testing purposes, I modified the lab_pred_tags list. Here is my solution:
lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                 'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                 'O', 'O', 'B-GPE', 'O']

index = 0  # position of the current line within its sentence
with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
        open("PATH_TO_NEW_FILE", "w") as lab_file_2:
    for line in lab_file:
        if not line.strip():
            # Blank line: a new sentence starts, so restart the tag index
            index = 0
            lab_file_2.write("\n")
        else:
            # Drop the trailing newline (the last line may not have one),
            # then append a tab and the tag for this line
            lab_file_2.write(line.rstrip("\n") + "\t" + lab_pred_tags[index] + "\n")
            index += 1
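The solution above reuses the same flat list for every sentence, which is fine for testing. When each sentence has its own tag list (for example, if every lab_pred_tags produced by the loop is appended to a hypothetical list all_tags), the index reset can be combined with a sentence counter. A sketch under that assumption; tag_lines is a hypothetical name:

```python
from typing import Iterable, Iterator, List


def tag_lines(lines: Iterable[str], tag_sequences: List[List[str]]) -> Iterator[str]:
    """Yield `lines` with a tab and a tag appended to every non-blank line.

    tag_sequences[k] holds the tags for the k-th sentence. Blank lines
    separate sentences and are emitted as empty lines.
    """
    sentence = 0  # which sentence we are in
    index = 0     # position of the line inside that sentence
    for line in lines:
        if not line.strip():
            if index:  # advance only once per sentence, even if blank lines repeat
                sentence += 1
                index = 0
            yield "\n"
        else:
            tag = tag_sequences[sentence][index]
            yield line.rstrip("\n") + "\t" + tag + "\n"
            index += 1


# Usage (paths are placeholders, all_tags is the hypothetical list above):
# with open("PATH_TO_YOUR_FILE", encoding="utf-8") as src, \
#         open("PATH_TO_NEW_FILE", "w", encoding="utf-8") as dst:
#     dst.writelines(tag_lines(src, all_tags))
```

A sentence with more lines than tags raises an IndexError here rather than being checked explicitly.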