Can I convert 3-gram txt to IOB for CRFsuite?
The txt file is in a 3-gram format like this:
None,None,kgo,gop,ope,Test_Sepedi
None,kgo,gop,ope,pel,Test_Sepedi
kgo,gop,ope,pel,elo,Test_Sepedi
gop,ope,pel,elo,None,Test_Sepedi
ope,pel,elo,None,None,Test_Sepedi
None,None,gag,ago,None,Test_Sepedi
None,gag,ago,None,None,Test_Sepedi
None,None,gan,ann,nnw,Test_Sepedi
None,gan,ann,nnw,nwe,Test_Sepedi
gan,ann,nnw,nwe,None,Test_Sepedi
ann,nnw,nwe,None,None,Test_Sepedi
None,None,tla,None,None,Test_Sepedi
I would like it to be in the format that CRFsuite uses for training, for example:
London JJ B-NP
shares NNS I-NP
closed VBD B-VP
moderately RB B-ADVP
lower JJR I-ADVP
in IN B-PP
thin JJ B-NP
trading NN I-NP
I would appreciate it if I could do the conversion in Python.
I can't see exactly what you want to do, so I'm just sharing my idea:
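with open('./in') as in_file, open('./out', 'w') as out_file:
    for line in in_file:
        # do whatever you want with the input line here
        result = line.rstrip('\n')  # placeholder: passes the line through unchanged
        # and write the result to the output file
        out_file.write(result + '\n')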
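Hope this helps.
From the look of the question, I assume the input file is in CSV format, while the IOB2 format appears to use space- or tab-separated tokens. So the simplest way to get that format is to read each line and replace the comma separators with spaces.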
# fill in your own paths here; do not copy and paste as-is
INPUT_PATH = 'path/to/input.txt'    # placeholder
OUTFILE_PATH = 'path/to/output.txt'  # placeholder

with open(INPUT_PATH, 'r') as infile, open(OUTFILE_PATH, 'w') as outfile:
    for line in infile:
        output_line = line.rstrip('\n')
        # the input appears to be comma separated, so replace each comma
        # with a space; use '\t' instead if the format requires tabs
        output_line = output_line.replace(',', ' ')
        outfile.write(output_line + '\n')
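If you also want B-/I- prefixes rather than a plain delimiter swap, here is a rough sketch. It rests on guesses that the question does not confirm: that the middle column is the current trigram, that the last column is the label, and that a row whose left-context columns are all "None" starts a new word. The function name trigram_rows_to_iob is made up for illustration.

```python
import csv


def trigram_rows_to_iob(lines, sep=" "):
    """Turn comma-separated trigram-window rows into IOB-style lines.

    Assumptions (guesses, not confirmed by the question):
    - the last column is the label, the rest is a context window
    - the middle column of the window is the current trigram
    - a row whose left-context columns are all "None" starts a new word
    """
    out = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        *window, label = [field.strip() for field in row]
        half = len(window) // 2
        token = window[half]
        starts_word = all(field == "None" for field in window[:half])
        if starts_word and out:
            out.append("")  # blank line separates sequences
        prefix = "B-" if starts_word else "I-"
        out.append(sep.join([token, prefix + label]))
    return out


if __name__ == "__main__":
    sample = [
        "None,None,kgo,gop,ope,Test_Sepedi",
        "None,kgo,gop,ope,pel,Test_Sepedi",
        "None,None,gag,ago,None,Test_Sepedi",
    ]
    print("\n".join(trigram_rows_to_iob(sample)))
```

You can pass an open file object instead of a list, since csv.reader accepts any iterable of lines. If your trainer expects the label in the first column with tab-separated attributes, reorder the fields in the sep.join call and pass sep="\t".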
Hope this helps!