How to count the index while matching

Goal

I want to extract the words enclosed in a specific symbol, such as square brackets, together with their word indices.

# input and symbol []
A key word is put in parentheses, like these: [keyword] or [key word] 

# output 
keyword (9, 9)
key word (11, 12)

The indices are the (0-based) positions of the words in the list obtained by splitting the input sentence.

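Problem

The current output has two main problems:

  1. The indices are counted in characters, not in words.

  2. The regex match is not what I expected.

Output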

['A', 'key', 'word', 'is', 'put', 'in', 'parentheses,', 'like', 'these:', '[keyword]', 'or', '[key', 'word]']

keyword] or [key word
(47, 68)

Code

import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"
splitted = sentence.split(' ')
matched = re.finditer("(?<=\[).*(?=\])", sentence)
print(matched)
for w in matched:
    print(w.group())
    print(w.span())

How can I fix the current code to get the target output?

See if this helps:

import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"
# [a-z ]+ cannot cross a bracket, so each bracketed phrase is matched on its own
matched = re.finditer(r"(?<=\[)([a-z ]+)(?=\])", sentence)
for w in matched:
    # words before the opening bracket = index of the first matched word
    start = len(sentence[:w.start() - 1].split())
    # number of additional words inside the brackets
    quantity = len(w.group().split()) - 1
    print(w.group(), (start, start + quantity))

My output:

keyword (9, 9)
key word (11, 12)
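
As for why the pattern in the question matched too much: `.*` is greedy, so it runs up to the last `]` in the sentence. Below is a minimal sketch comparing it with a bounded pattern. The negated class `[^\]]+` is my own suggestion here, not something from the question; unlike `[a-z ]+` it also accepts uppercase letters, digits and punctuation inside the brackets.

```python
import re

sentence = "A key word is put in parentheses, like these: [keyword] or [key word]"

# Greedy: .* extends to the last closing bracket in the sentence.
greedy = [m.group() for m in re.finditer(r"(?<=\[).*(?=\])", sentence)]

# Negated character class: the match stops at the first closing bracket.
bounded = [m.group() for m in re.finditer(r"(?<=\[)[^\]]+(?=\])", sentence)]

print(greedy)   # ['keyword] or [key word']
print(bounded)  # ['keyword', 'key word']
```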

Edit:

You can also add this:

sentence = sentence.replace('[', ' [')
sentence = sentence.replace(']', '] ')

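This avoids possible errors when split() and len() are used to compute the word positions, for example when a bracket is directly adjacent to other text.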
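
Putting the padding and the counting together, here is a rough sketch. The function name `bracketed_word_spans` and the second test sentence are made up for illustration. Note that the indices it returns refer to the padded sentence's word list, which may differ from `sentence.split(' ')` when a bracket touches other text.

```python
import re

def bracketed_word_spans(sentence):
    # Hypothetical helper: pad brackets so each one sits on a word boundary.
    padded = sentence.replace('[', ' [').replace(']', '] ')
    results = []
    for m in re.finditer(r"(?<=\[)([a-z ]+)(?=\])", padded):
        # Words before the opening bracket give the index of the first word.
        start = len(padded[:m.start() - 1].split())
        # Additional words inside the brackets.
        quantity = len(m.group().split()) - 1
        results.append((m.group(), (start, start + quantity)))
    return results

print(bracketed_word_spans(
    "A key word is put in parentheses, like these: [keyword] or [key word]"))
# [('keyword', (9, 9)), ('key word', (11, 12))]

print(bracketed_word_spans("see[keyword]here and [key word]"))
# [('keyword', (1, 1)), ('key word', (4, 5))]
```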