How to count word indices while matching
Goal
I want to extract the words enclosed by a specific symbol, such as square brackets, together with their index numbers.
# input and symbol []
A key word is put in parentheses, like these: [keyword] or [key word]
# output
keyword (9, 9)
key word (11, 12)
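The index numbers refer to positions in the list obtained by splitting the input sentence.
Problem
The current output has two main problems:
The index count is not word-based.
The regex match does not behave as I expected.
Output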
['A', 'key', 'word', 'is', 'put', 'in', 'parentheses,', 'like', 'these:', '[keyword]', 'or', '[key', 'word]']
keyword] or [key word
(47, 68)
Code
import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"
splitted = sentence.split(' ')
matched = re.finditer(r"(?<=\[).*(?=\])", sentence)
print(splitted)
for w in matched:
    print(w.group())
    print(w.span())
How can I fix the current code to produce the target output?
See if this helps:
import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"

# Match lowercase letters and spaces between '[' and ']', excluding the brackets.
matched = re.finditer(r"(?<=\[)([a-z ]+)(?=\])", sentence)

for w in matched:
    # Words before the opening bracket give the index of the first matched word.
    start = len(sentence[:w.start() - 1].split())
    # Number of additional words inside the brackets.
    quantity = len(w.group().split()) - 1
    print(w.group(), (start, start + quantity))
My output:
keyword (9, 9)
key word (11, 12)
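As a side note on why the pattern in the question returned one long match: `.*` is greedy, so it runs from the first `[` to the last `]`, and `span()` reports character offsets rather than word positions. Here is a minimal sketch comparing the greedy pattern with a non-greedy `.*?` variant (the non-greedy form is only an illustration, it is not used in the code above):

```python
import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"

# Greedy: .* extends as far as possible, up to the last ']' in the sentence.
greedy = [m.group() for m in re.finditer(r"(?<=\[).*(?=\])", sentence)]

# Non-greedy: .*? stops at the first ']' that follows each '['.
lazy = [m.group() for m in re.finditer(r"(?<=\[).*?(?=\])", sentence)]

print(greedy)  # ['keyword] or [key word']
print(lazy)    # ['keyword', 'key word']
```

The character class `[a-z ]+` used above has a similar effect, because it cannot cross a `]`, but it only accepts lowercase letters and spaces.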
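Edit:
You can also add this:
sentence = sentence.replace('[', ' [')
sentence = sentence.replace(']', '] ')
This avoids errors that can occur when computing word positions with split() and len(), for example when a bracket is attached directly to a neighbouring word.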
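Putting the pieces together, the bracket padding and the word counting could be wrapped in one helper. This is only a sketch: the function name `bracketed_word_spans` is made up for illustration, and it assumes brackets are not nested and that there is no whitespace directly inside the brackets. The indices refer to the split list of the padded sentence.

```python
import re

def bracketed_word_spans(sentence):
    """Return (text, (first_index, last_index)) for each [bracketed] phrase.

    Indices are 0-based positions in the whitespace-split word list
    of the sentence after padding the brackets with spaces.
    """
    # Pad brackets so that each bracketed phrase starts and ends its own token.
    padded = sentence.replace('[', ' [').replace(']', '] ')
    results = []
    for m in re.finditer(r"\[([^\[\]]+)\]", padded):
        words = m.group(1).split()
        if not words:
            continue  # skip brackets that contain only whitespace
        # Words before the opening bracket give the index of the first word.
        start = len(padded[:m.start()].split())
        results.append((m.group(1), (start, start + len(words) - 1)))
    return results

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"
for text, span in bracketed_word_spans(sentence):
    print(text, span)
```

For the sentence in the question this prints the same two lines as above. With an input such as "see foo[bar baz]qux", the padding is what makes the result line up with the split list.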