How to extract specific words from a string using regex in Python

I have two strings in which every word is followed by its type tag:

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

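I would like to extract every span that starts at a word tagged /IN and runs through the following words tagged /NNP or /CDP. This is my code so far (it still only works for the /NNP tag):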

import re

def entityExtractPreposition(text):
    # Match from a word tagged /IN through the last /NNP tag before the next /IN.
    return re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text)

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
prepo1 = entityExtractPreposition(text1)

text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
prepo2 = entityExtractPreposition(text2)

print(text1)
print(prepo1)
print('')
print(text2)
print(prepo2)

Output of the code so far:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

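As the first string (text1) shows, entityExtractPreposition still does not capture 33/CDP. How can I make entityExtractPreposition work both when the span ends with a /CDP tag, as in text1, and when it ends with a /NNP tag, as in text2?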

The expected result is:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP 33/CDP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

Thanks.

\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b
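
A minimal sketch of how the pattern above could be dropped into the question's code. The names SPAN_PATTERN and extract_preposition_spans are made up for illustration and are not part of the original post. The pattern starts at a token tagged /IN, consumes characters one at a time as long as it does not cross another /IN tag, and then backtracks to the last /NNP or /CDP tag it can end on.

```python
import re

# Pattern from the answer above. The tempered dot (?:(?!/IN\b).)* refuses to
# step over a second /IN tag, and /(?:NNP|CDP)\b accepts either closing tag.
SPAN_PATTERN = re.compile(r'\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b')


def extract_preposition_spans(text):
    """Return every span running from an /IN token to the last /NNP or /CDP tag."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return SPAN_PATTERN.findall(text)


text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

print(extract_preposition_spans(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(extract_preposition_spans(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```

Note that the match is greedy: any tokens with other tags that sit between the /IN word and a later /NNP or /CDP tag are included in the span, as long as no second /IN appears in between. If that is too permissive for your data, the middle part would need to be restricted to the tags you want to allow.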