Find the index of a word in a sentence using information from a phrase
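I need the index of `word` in `sentence`, but sometimes the word is repeated. The information in `phrase` would help disambiguate, as would the previous or next row in the `word` column.
Basically, I just need to identify the word within the utterance. For example, if `word` is 'seaside', I want to know which 'seaside' in the sentence it refers to. The extra information in `phrase` can help with that identification, and so can the order in which the rows appear in the dataframe.
What I have now: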
file_id | phrase | word | sentence | word_indices |
---|---|---|---|---|
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9] |
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5] |
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6] |
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7] |
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9] |
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9] |
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16] |
But what I want is the content of the `target` column:
file_id | phrase | word | sentence | word_indices | target |
---|---|---|---|---|---|
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9] | [0] |
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5] | [5] |
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6] | [6] |
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7] | [7] |
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [0] |
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [9] |
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16] | [4] |
I found a similar question, but unfortunately it is in Java and I need to use Python.
Thank you very much!
Answer:
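I would split this into two steps: first count the words in the sentence that come before the phrase, then find the index of the word within the phrase. The sum of the two is the word's index in the sentence. Something like this: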
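
    def get_index_of_word_in_sentence(word, phrase, sentence):
        # Lowercase everything so that "The seaside is" matches the phrase "the seaside is".
        sentence, phrase, word = sentence.lower(), phrase.lower(), word.lower()
        # Character offset at which the phrase starts in the sentence.
        index1 = sentence.index(phrase)
        # Number of words in the sentence before the phrase.
        word_num1 = len(sentence[:index1].split())
        # Position of the word within the phrase.
        word_num2 = phrase.split().index(word)
        return word_num1 + word_num2

    df["target"] = df.apply(lambda x: get_index_of_word_in_sentence(x["word"], x["phrase"], x["sentence"]), axis=1)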
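
The same two-step idea can also be sketched at the token level rather than the character level, which may be more robust when the phrase could match inside a longer word or when punctuation is attached to a word. This is only a sketch: the helper names `tokenize` and `find_target_indices` are hypothetical, and it assumes that `word_indices` were produced by splitting on whitespace, lowercasing, and stripping surrounding punctuation. That assumption is consistent with the sample rows but is not stated in the question. It returns a list, like the `target` column, with one entry per place where the whole phrase occurs.

```python
import string


def tokenize(text):
    """Split on whitespace, lowercase, and strip surrounding punctuation.

    Assumption: this mirrors how word_indices were computed in the question.
    Tokens are never dropped, so positions stay aligned with a plain split().
    """
    return [tok.strip(string.punctuation).lower() for tok in text.split()]


def find_target_indices(word, phrase, sentence):
    """Return the sentence index of `word` for each occurrence of `phrase`."""
    sent_toks = tokenize(sentence)
    phrase_toks = tokenize(phrase)
    # Position of the word inside the phrase (raises ValueError if absent).
    offset = phrase_toks.index(tokenize(word)[0])
    n = len(phrase_toks)
    # Slide a window over the sentence tokens and keep every full phrase match.
    return [
        start + offset
        for start in range(len(sent_toks) - n + 1)
        if sent_toks[start:start + n] == phrase_toks
    ]


sentence = "she is by the seaside. The seaside is packed."
print(find_target_indices("seaside", "the seaside is", sentence))
```

With a dataframe shaped like the one in the question, this could be applied row by row in the same way as above: `df["target"] = df.apply(lambda x: find_target_indices(x["word"], x["phrase"], x["sentence"]), axis=1)`.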