Getting the index of the word 'print' in a multiline string

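I am trying to find the index of every occurrence of the word 'print' in a multiline text. There are some problems, though:

  1. If a line contains 'print' twice, the code returns the same index for both occurrences.
  2. It cannot find the index of the second 'print' on the same line; instead it prints the index of the first 'print' twice.

My code is:
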
text = '''print is print as
it is the function an
print is print and not print
'''

text_list = []

for line in text.splitlines():

    #'line' represents each line in the multiline string
    text_list.append([])

    for letter in line:
        #Append each letter of the line to the last list inside text_list
        text_list[len(text_list)-1].append(letter)

for line in text_list:
    for letter in line:

        #check that 'p' is followed by 'r', then 'i', then 'n', and finally 't'
        if letter == "p":
            num = 1

            if text_list[text_list.index(line)][line.index(letter)+num] == 'r':
                num += 1
                
                if text_list[text_list.index(line)][line.index(letter)+num] == 'i':
                    num += 1

                    if text_list[text_list.index(line)][line.index(letter)+num] == 'n':
                        num += 1

                        if text_list[text_list.index(line)][line.index(letter)+num] == 't':
                            num += 1
                            print(f'index (start,end) = {text_list.index(line)}.{line.index(letter)}, {text_list.index(line)}.{line.index(letter)+num}')
                        

When I run it, it prints:

index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line
index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the third print

As you can see, the indices in the output are repeated. This is text_list:

>>> text_list
[['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 's'],
['i', 't', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', ' ', 'a', 'n'],
['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 'n', 'd', ' ', 'n', 'o', 't', ' ', 'p', 'r', 'i', 'n', 't']]
>>>

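Each list in text_list is one line of text. There are three lines, so text_list contains 3 lists. How can I get the index of the second 'print' in the first line, and of the second and third 'print' in the third line? As you can see, it only returns the index of the first 'print' in the first and third lines.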

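Strings already have an index method for finding a substring. You can pass an extra argument (the position to start searching from) to find the next occurrence of the given substring, and then the one after that: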

>>> text = '''print is print as
it is the function an
print is print and not print
'''
>>> text.index("print")
0
>>> text.index("print",1)
9
>>> text.index("print",10)
40
>>> text.index("print",41)
49
>>> text.index("print",50)
63
>>> text.index("print",64)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    text.index("print",64)
ValueError: substring not found
>>> 
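
The manual calls above can be wrapped in a loop. The following is only a minimal sketch: the helper name find_all is made up for illustration, and it uses str.find (which returns -1 when nothing is found) instead of str.index so that no exception handling is needed.

```python
def find_all(text, word):
    """Return every start offset of `word` in `text` (non-overlapping)."""
    if not word:
        # An empty search string would never advance the loop below.
        return []
    positions = []
    start = text.find(word)  # -1 means "not found"
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we have just recorded.
        start = text.find(word, start + len(word))
    return positions


text = '''print is print as
it is the function an
print is print and not print
'''

print(find_all(text, "print"))  # [0, 9, 40, 49, 63]
```

Note that these are offsets into the whole string, newlines included, not per-line columns.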

You can use a regular expression:

import re

text = '''print is print as
it is the function an
print is print and not print
'''

for i in re.finditer("print", text):
    print(i.start())

# OR AS A LIST

[i.start() for i in re.finditer("print", text)]

To get the position within each line instead, search line by line:

import re

text = '''print is print as
it is the function an
print is print and not print
'''

for line_number, line in enumerate(text.split('\n')):
    occurrences = [m.start() for m in re.finditer('print', line)]

    if occurrences:
        for occurrence in occurrences:
            print('Found `print` at character %d on line %d' % (occurrence, line_number + 1))

Output:

Found `print` at character 0 on line 1
Found `print` at character 9 on line 1
Found `print` at character 0 on line 3
Found `print` at character 9 on line 3
Found `print` at character 23 on line 3
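
If the goal is the (start, end) pair in the line.column format used in the question, the match object's start() and end() give both ends. This is a sketch, not the only way to do it: the helper name find_spans is hypothetical, and wrapping the word in \b is an added assumption so that only whole words match (for example, 'printer' would be skipped).

```python
import re

text = '''print is print as
it is the function an
print is print and not print
'''


def find_spans(text, word):
    """Return (line_index, start_col, end_col) for each match, all zero-based."""
    # re.escape guards against regex metacharacters in `word`;
    # \b restricts matches to whole words (an assumption, see above).
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    spans = []
    for line_index, line in enumerate(text.splitlines()):
        for match in pattern.finditer(line):
            spans.append((line_index, match.start(), match.end()))
    return spans


for line_index, start, end in find_spans(text, 'print'):
    print(f'index (start,end) = {line_index}.{start}, {line_index}.{end}')
```

Here end is the offset just past the last character of the match, which is the same convention as the 0.0, 0.5 pairs in the question.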

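You were on the right track initially. You split the text into lines. The next step is to split each line into words with the split() method, rather than into letters. Then you can easily get the index of each 'print' string in each line.

The following code prints the required indices (word positions within each line, not character offsets) as a list of lists, with each inner list corresponding to one line: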

text = '''print is print as
it is the function an
print is print and not print
'''

index_list = []
for line in text.splitlines():
    index_list.append([])
    for idx, word in enumerate(line.split()):
        if word == 'print':
            index_list[-1].append(idx)

print(index_list)

#[[0, 2], [], [0, 2, 5]]
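
As for why the code in the question repeats itself: list.index(x) returns the position of the first element equal to x, so line.index(letter) is 0 for every 'p' in a line that starts with 'p'. Below is a minimal sketch of the same character-by-character idea that tracks the current position directly instead of looking it up again. The helper name find_in_lines is made up for illustration.

```python
def find_in_lines(text, word):
    """Return (line_index, start_col, end_col) wherever `word` starts in a line."""
    results = []
    for line_index, line in enumerate(text.splitlines()):
        for col in range(len(line)):
            # startswith(word, col) tests the characters beginning at col,
            # so the column is known without calling index() again.
            if line.startswith(word, col):
                results.append((line_index, col, col + len(word)))
    return results


text = '''print is print as
it is the function an
print is print and not print
'''

# The cause of the duplicates: index() always reports the first 'p'.
print(list('print is print as').index('p'))  # 0

for line_index, start, end in find_in_lines(text, 'print'):
    print(f'index (start,end) = {line_index}.{start}, {line_index}.{end}')
```

Unlike the split() version above, this reports character columns and also matches 'print' inside longer words.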