Find index of word in sentence with information from phrase

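I need the index of word in sentence. But sometimes the word is repeated. The information in phrase would help here, or else the previous or next row in the word column.

Basically, I just need to identify the word in the utterance: for example, if word is 'seaside', I want to know which 'seaside' in the sentence it is. I have extra information from phrase that can help with the identification. The order in which the rows appear in the dataframe also helps.

What I have now is: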

file_id | phrase | word | sentence | word_indices
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9]
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5]
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6]
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7]
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9]
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9]
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16]
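
The word_indices values above are consistent with splitting each sentence on whitespace and comparing tokens case-insensitively with surrounding punctuation stripped. The post does not say how the column was actually built, so this is only a sketch under that assumption; normalize and word_indices are hypothetical helper names.

```python
import re

def normalize(token):
    # Lower-case and strip punctuation at the edges: "seaside." -> "seaside".
    # Inner characters are kept, so "it's" stays "it's".
    return re.sub(r"^\W+|\W+$", "", token.lower())

def word_indices(word, sentence):
    # Positions of every whitespace-separated token that equals the word
    # after normalization.
    target = normalize(word)
    return [i for i, tok in enumerate(sentence.split()) if normalize(tok) == target]

print(word_indices("seaside", "she is by the seaside. The seaside is packed."))  # [4, 6]
```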

But what I want to get is the content of the target column.

file_id | phrase | word | sentence | word_indices | target
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9] | [0]
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5] | [5]
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6] | [6]
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7] | [7]
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [0]
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [9]
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16] | [4]

I found a similar question here, but unfortunately its answer is in Java, and I need to use Python.

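Thank you very much!

I would split it into two steps. First find the number of words in the sentence that come before the phrase, then find the index of the word within the phrase, as follows: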

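def get_index_of_word_in_sentence(word, phrase, sentence):
    # Compare case-insensitively so "the seaside is" matches "The seaside is"
    sentence, phrase, word = sentence.lower(), phrase.lower(), word.lower()
    # Character offset of the first occurrence of the phrase
    index1 = sentence.index(phrase)
    # Number of words that come before the phrase
    word_num1 = len(sentence[:index1].split())
    # Position of the word inside the phrase
    word_num2 = phrase.split().index(word)
    return word_num1 + word_num2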

df["target"] = df.apply(lambda x: get_index_of_word_in_sentence(x["word"], x["phrase"], x["sentence"]), axis=1)
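
If character-level matching turns out to be too fragile (for example, a phrase that happens to start in the middle of a longer word), the same two-step idea can be applied to word tokens instead. The following is only a sketch, not part of the answer above: tokens and find_target are hypothetical names, and it assumes words are separated by whitespace and that punctuation at token edges can be ignored. One consequence of that assumption is that a phrase may match across a sentence boundary.

```python
import re

def tokens(text):
    # Split on whitespace, lower-case, and strip punctuation at token edges.
    return [re.sub(r"^\W+|\W+$", "", t.lower()) for t in text.split()]

def find_target(word, phrase, sentence):
    sent, phr = tokens(sentence), tokens(phrase)
    w = tokens(word)[0]          # assumes word is a single non-empty token
    if w not in phr:
        return None              # the word does not occur in the phrase
    n = len(phr)
    # Step 1: find the first position where the phrase's tokens occur.
    for start in range(len(sent) - n + 1):
        if sent[start:start + n] == phr:
            # Step 2: add the word's offset inside the phrase.
            return start + phr.index(w)
    return None                  # the phrase does not occur in the sentence

# The rows from the question as (word, phrase, sentence) tuples.
seaside = "she is by the seaside. The seaside is packed."
nobody = "nobody knows what is going on. She can find nobody"
sunny = ("it is such a sunny day ah I am so happy when it's sunny "
         "such a sunny day is the best")
rows = [
    ("I", "I am", "I am a happy bird. I sing every day. I eat worms."),
    ("the", "the seaside is", seaside),
    ("seaside", "the seaside is", seaside),
    ("is", "the seaside is", seaside),
    ("nobody", "nobody knows", nobody),
    ("nobody", "find nobody", nobody),
    ("sunny", "it is such a sunny day", sunny),
]
for word, phrase, sentence in rows:
    print(find_target(word, phrase, sentence))  # prints 0, 5, 6, 7, 0, 9, 4
```

It can be passed to df.apply in the same way as the function above. It returns a plain integer (or None when nothing matches), so wrap the result in a list if the target column should look like [5].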