Get the indices of numbers in a string and extract words before and after the number (in different languages)
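I tried a regular expression and it does find the numbers, but I do not get the word-level index of each number. Instead I get the character index of the number's first character.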
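import re

text = "४०० pounds of wheat at $ 3 per pound"
numero = re.finditer(r"(\d+)", text)  # match objects, which carry positions
op = re.findall(r"(\d+)", text)  # matched strings only
indices = [m.start() for m in numero]  # character offsets, not word positions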
Output:
[0, 25]
Expected output:
[0, 6]
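Once the exact indices are found and stored in a list, extracting the words should be easier. At least that is what I believe. What do you think?
Also, I expect the words to appear at different positions, so the approach cannot be a static one.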
You can tokenize the text and build your logic on the tokens. Try this:
number_index = []
text = "४०० pounds of wheat at $ 3 per pound"
text_list = text.split(" ")
# Find which words are integers.
for index, word in enumerate(text_list):
    try:
        int(word)  # int() also accepts non-ASCII decimal digits such as "४००"
        number_index.append(index)
    except ValueError:
        pass
# Now perform operations on those integers
for i in number_index:
    word = text_list[i]
    # do operations and put it back in the list
# Re-build string afterwards
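To get the words before and after each number, the same token list can be indexed at `i - 1` and `i + 1`. The following is a minimal sketch under that assumption; the function name `words_around_numbers` is made up for illustration, and it treats only whitespace-separated tokens that `int()` accepts as numbers.

```python
def words_around_numbers(text):
    """Return (token_index, number_token, previous_word, next_word) tuples."""
    tokens = text.split()
    results = []
    for i, tok in enumerate(tokens):
        try:
            int(tok)  # accepts any Unicode decimal digits, e.g. Devanagari
        except ValueError:
            continue
        # Use None when the number is the first or last token.
        before = tokens[i - 1] if i > 0 else None
        after = tokens[i + 1] if i + 1 < len(tokens) else None
        results.append((i, tok, before, after))
    return results


text = "४०० pounds of wheat at $ 3 per pound"
for row in words_around_numbers(text):
    print(row)
```

Because the neighbours are looked up relative to each number's position, nothing here depends on the numbers sitting at fixed places in the sentence. Tokens such as "3," or "$3" would not be recognised, since punctuation stays attached after a plain whitespace split.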
You tagged the question with nlp and it is a Python question, so why not use spaCy?
See a Python demo with spaCy 3.0.1:
import spacy

nlp = spacy.load("en_core_web_trf")
text = "४०० pounds of wheat at $ 3 per pound"
doc = nlp(text)
print([(token.text, token.i) for token in doc if token.is_alpha])
## => [('pounds', 1), ('of', 2), ('wheat', 3), ('at', 4), ('per', 7), ('pound', 8)]
print([(token.text, token.i) for token in doc if token.like_num])
## => [('४००', 0), ('3', 6)]
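Here,
nlp is the object initialized with the English "big" model (en_core_web_trf)
doc is the spaCy document initialized with your text variable
[(token.text, token.i) for token in doc if token.is_alpha] gives you the list of alphabetic words, each with its text (token.text) and its position in the document (token.i)
[(token.text, token.i) for token in doc if token.like_num] gives you the list of numbers with their positions in the document.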
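If you would rather keep the regex from the question and not load a model, one option is to enumerate whitespace-delimited tokens with `re.finditer` and keep both the token index and the character offset. This is only a sketch: the helper name `number_token_indices` is invented here, and it assumes a "word" is any run of non-whitespace characters, which is simpler than spaCy's tokenizer.

```python
import re


def number_token_indices(text):
    """Return (token_index, char_offset, token) for every all-digit token."""
    found = []
    # Each match of \S+ is one whitespace-delimited token; enumerate() gives
    # its position in the token sequence, match.start() its character offset.
    for token_index, match in enumerate(re.finditer(r"\S+", text)):
        if re.fullmatch(r"\d+", match.group()):
            found.append((token_index, match.start(), match.group()))
    return found


text = "४०० pounds of wheat at $ 3 per pound"
print(number_token_indices(text))
```

For the example sentence this yields token indices 0 and 6 alongside character offsets 0 and 25, which shows where the two sets of numbers in the question come from. For `str` patterns `\d` matches any Unicode decimal digit, which is why the Devanagari number is picked up as well.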