Getting the word of a character in a string based on its position

Question

I have a string, for example:

"This is my very boring string"

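I also have the position of a character in that string, counted with the spaces removed.

For example:

Position 13, which in this example corresponds to the o in the word boring.

What I need is to return the word (boring) based on the index I have (13).

This code returns the character (o):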

re.findall('[A-Za-z]', s)[13]

But for some reason I can't think of a good way to return the word boring.

Any help would be appreciated.

Answer 1

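You can use the regex \w+ to match words and keep accumulating the lengths of the matches until the total length exceeds the target position: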

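import re

def get_word_at(string, position):
    length = 0  # total length of the words matched so far
    for word in re.findall(r'\w+', string):
        length += len(word)
        # The first word that pushes the total past the position contains it
        if length > position:
            return word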

With this, get_word_at('This is my very boring string', 13) returns:

boring
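
To see how the running total maps each space-free index to a word, here is a minimal self-contained sketch of the same accumulation idea. The names `word_at` and `sample` are illustrative and not from the answer, and returning `None` for an out-of-range position is an assumption about the desired behavior.

```python
import re

def word_at(text, position):
    """Return the word holding the character at `position`, where
    `position` counts word characters only (spaces are skipped)."""
    consumed = 0  # word characters seen so far
    for match in re.finditer(r'\w+', text):
        consumed += len(match.group())
        if consumed > position:
            return match.group()
    return None  # assumption: out-of-range positions yield None

sample = 'This is my very boring string'
# Print the word found for a few positions around the word boundaries.
for position in (0, 3, 4, 11, 12, 13, 17, 18):
    print(position, word_at(sample, position))
```

Positions 12 through 17 all land in boring, because the total only passes them once that word's six characters have been added.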

Answer 2

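This function takes two arguments: a string and an index.

It converts the index into the corresponding index in the original string.

Then it returns the word that the character at the converted index belongs to.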

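def find(string, idx):
    # Find the index of the character relative to the original string
    # by counting non-space characters until idx of them have passed
    i1 = None
    seen = 0
    for pos, char in enumerate(string):
        if char == ' ':
            continue
        if seen == idx:
            i1 = pos
            break
        seen += 1
    if i1 is None:
        return None  # idx is past the last non-space character

    # Find which word the index belongs to in the original string
    i2 = 0  # index in the original string where the current word starts
    for word in string.split(' '):
        if i2 <= i1 < i2 + len(word):
            return word
        i2 += len(word) + 1  # skip the word and the space after it

print(find("This is my very boring string", 13))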

Output:

boring
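
The index-conversion step can also be looked at on its own. The sketch below is a hypothetical helper (`to_original_index` is not a name from the answer) that counts non-space characters instead of comparing characters, so repeated letters cannot cause an early match. It assumes that only the space character separates words.

```python
def to_original_index(text, stripped_index):
    """Map an index into text.replace(' ', '') back to an index into text."""
    seen = 0  # non-space characters passed so far
    for pos, char in enumerate(text):
        if char == ' ':
            continue
        if seen == stripped_index:
            return pos
        seen += 1
    return None  # assumption: out-of-range indexes yield None

print(to_original_index('This is my very boring string', 13))  # 17
print(to_original_index('a ba a', 3))                          # 5
```

In the second call the target is the last a, even though an a also sits at index 3 of the original string.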

Answer 3

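You can install and use the regex module, which supports variable-length lookbehind. That lets you use a pattern like the one below, which asserts that exactly the desired number of word characters, optionally surrounded by whitespace, precede the rest of the matching word: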

import regex
regex.search(r'\w*(?<=^\s*(\w\s*){13})\w+', 'This is my very boring string').group()

This returns:

boring
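
The standard-library re module does not accept variable-length lookbehind, so the pattern above needs the third-party module. As a rough sketch of a similar idea using only re, one could consume the required number of word characters from the start of the string and then read the word around the point where that prefix ends. The function name `word_at` is illustrative, and the sketch assumes the text contains only word characters and whitespace and that `position` is a non-negative integer.

```python
import re

def word_at(text, position):
    # Consume exactly `position` word characters, plus any whitespace
    # around them, starting from the beginning of the string.
    prefix = re.match(r'\s*(?:\w\s*){%d}' % position, text)
    if prefix is None:
        return None  # fewer than `position` word characters available
    cut = prefix.end()  # index of the target character in `text`
    # Part of the word before the target (may be empty).
    left = re.search(r'\w*\Z', text[:cut]).group()
    # The target character and the rest of its word.
    right = re.match(r'\w+', text[cut:])
    if right is None:
        return None  # nothing left at the target position
    return left + right.group()

print(word_at('This is my very boring string', 13))  # boring
```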

Answer 4

There is no need for a slow and ugly variable-length lookbehind.
A simple lookahead with a capture group will get the word.

This regex counts non-whitespace characters.

^(?:\s*(?=(?<!\S)(\S+))?\S){13}

With {13}, capture group 1 holds the word containing the 13th character.

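Use word characters instead if needed, but whatever class you count must be paired with its negated class. Otherwise the pattern fails outright, because every character up to the target has to be matched.

Examples:

\w with \W
\s with \S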
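
As a sketch of how the pattern might be driven from Python's built-in re module, the hypothetical helper below builds the repetition count from a zero-based index. The name `word_at` and the zero-based convention are assumptions, not part of the answer. It relies on group 1 keeping the value captured at the most recent word start, which is how re is expected to behave here.

```python
import re

def word_at(text, index):
    # index + 1 repetitions consume the non-whitespace characters at
    # positions 0..index. Group 1 is set whenever a repetition starts a
    # new word, so it ends up holding the word of the last character.
    pattern = r'^(?:\s*(?=(?<!\S)(\S+))?\S){%d}' % (index + 1)
    match = re.match(pattern, text)
    return match.group(1) if match else None

print(word_at('This is my very boring string', 13))  # boring
```

Since the answer's {13} counts from one, it corresponds to `index=12` in this sketch.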

Answer 5

A non-regex solution that aims for the elegance the OP was hoping for:

def word_out_of_string(string, character_index):
    words = string.split()

    while words and character_index >= len(words[0]):
        character_index -= len(words.pop(0))

    return words.pop(0) if words else None

print(word_out_of_string("This is my very boring string", 13))
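
If many positions need to be looked up in the same string, another non-regex option is to build a lookup table once and index into it. This is only a sketch: `build_word_lookup` is a hypothetical name, and the table costs one list entry per non-space character.

```python
def build_word_lookup(text):
    """Return a list whose i-th entry is the word that holds the i-th
    non-space character of `text`."""
    return [word for word in text.split() for _ in word]

lookup = build_word_lookup('This is my very boring string')
print(lookup[13])   # boring
print(len(lookup))  # 24
```

An out-of-range index raises IndexError here rather than returning None.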

Answer 6

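If you use the alternative regex engine for Python (the third-party regex module), you can replace the matches of the following regular expression with empty strings: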

r'^(?:\s*\S){0,13}\s|(?<=(?:\s*\S){13,})\s.*'

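For the example string, the 'o' in 'boring' is at index 13 once the spaces are removed. If both 13s in the regex are changed to any number in the range 12-17, 'boring' is returned. If they are changed to 11, 'very' is returned; if they are changed to 18, 'string' is returned.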

The regex engine performs the following operations.

^            : match beginning of string
(?:\s*\S)    : match 0+ ws chars, then 1 non-ws char, in a non-capture group
{0,13}       : execute the non-capture group 0-13 times 
\s           : match a ws char
|            : or
(?<=         : begin a positive lookbehind
  (?:\s*\S)  : match 0+ ws chars, then 1 non-ws char, in a non-capture group 
  {13,}      : execute the non-capture group at least 13 times
)            : end positive lookbehind
\s           : match 1 ws char
.*           : match 0+ chars

Tags: python, regex, string, python-re