Python substring find

Question

I have a small request: I need help with this code:

def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    # Load the dictionary
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()


    # Search in the file
    with open(fichier, encoding="utf-8") as fic:
        ficlist = fic.read().splitlines()

    for line in ficlist:
        line_number += 1
        for patt in dicolist:
            line = line.lower()
            if re.search(r' + line + r'\b', patt):
                print(line.rstrip() + ', ' + patt + ', ' + nameFile + ', '
                      + str(line_number))

I'm having trouble with this line: if re.search(r' + line + r'\b', patt):

dico is a dictionary of first names, for example:

benoît
Nicolas
Stéphane
Sébastien
Alexandre

fichier is a file containing a lot of information, for example:

Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
   Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj

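etc.

I want to return every exact string (i.e. every first name) found in the file. But some names appear in a form like 1234Alexandre5678, and I can't find a way to return just Alexandre. The same goes for dfqklnflSébastiendsqjfldsjfldksj, from which I want to return Sébastien.

Can anyone help me? Thanks!

How I corrected my code using the answer: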

#!/usr/bin/env python3
import os
import re


def grepi(dico, fichier):
    line_number = 0  # lines are counted from 0
    nameFile = os.path.basename(fichier)
    result_final = []

    # Load the dictionary of names; "with" closes the file afterwards
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()
    print(dicolist)

    # Compile the pattern once instead of once per line
    ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*", flags=re.I)

    with open(fichier, encoding="utf-8") as ficlist:
        ficstring = ficlist.read().splitlines()
        for line in ficstring:
            ptrn_result = ptrn.findall(line)
            if ptrn_result:
                result_final = (nameFile, line_number, str(ptrn_result))
                print(result_final)
            line_number += 1

Output:

('prénom.xml', 4, "['Benoit']")
('prénom.xml', 6, "['Stéphane']")
('prénom.xml', 9, "['Alexandre']")
('prénom.xml', 10, "['Nicolas']")
('prénom.xml', 14, "['Sébastien']")

Answer 1

Try using the pattern '\w*(benoît|Nicolas|Stéphane|Sébastien|Alexandre)\w*'

For example:

import re

dicolist = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
s = """Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
   Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj"""

ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*", flags=re.I)
print(ptrn.findall(s))

Output:

['Nicolas', 'Benoît', 'Alexandre', 'Stéphane', 'Sébastien']
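
If you also need the line number for each hit, as in the question, something along these lines might work. It is only a sketch: find_names is a made-up helper name, and I am assuming the names may contain regex metacharacters, so each one goes through re.escape. I also dropped the surrounding \w*, since finditer already finds a name anywhere inside a longer token.

```python
import re


def find_names(names, text):
    """Return (line_number, matched_text) pairs, with line numbers starting at 1.

    find_names is a hypothetical helper, not part of the original answer.
    """
    # Skip empty entries (an empty alternative would match everywhere) and
    # escape each name so characters such as '.' or '+' are matched literally.
    alternatives = [re.escape(name) for name in names if name]
    pattern = re.compile("|".join(alternatives), flags=re.I)
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


names = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
text = """Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
   Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj"""

for hit in find_names(names, text):
    print(hit)
```

Note that the match is returned as it is spelled in the text (Benoît), not as it is spelled in the dictionary (benoît), and that this sketch assumes the list contains at least one non-empty name.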

Answer 2

Oh man, your first function grepi() needs some indentation. The rest of the question is also pretty complicated for me.
