Python substring find

I have a small request: I need help with this code:

def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    # Load the dictionary
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()


    # Search in the file
    with open(fichier, encoding="utf-8") as fic:
        ficlist = fic.read().splitlines()

    for line in ficlist:
        line_number += 1
        for patt in dicolist:
            line = line.lower()
            if re.search(r' + line + r'\b', patt):
                print(line.rstrip() + ', ' + patt + ', ' + nameFile + ', '
                      + str(line_number))

I am having trouble with this line: if re.search(r' + line + r'\b', patt):

dico is a dictionary of first names, for example:

benoît
Nicolas
Stéphane
Sébastien
Alexandre

fichier is a file containing a lot of information, for example:

Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
   Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj

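etc.

I want to return every exact string (i.e. first name) found in the file. But some names appear in a form like 1234Alexandre5678, and I can't find a way to return just Alexandre. The same goes for dfqklnflSébastiendsqjfldsjfldksj, where I want to return Sébastien ...

Can anyone help me? Thanks!

Here is how I corrected my code using the answer: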

#!/usr/bin/env python3
import os
import re


def grepi(dico, fichier):
    nameFile = os.path.basename(fichier)

    # Load the dictionary of names, one per line
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()
    print(dicolist)

    # Compile the pattern once, outside the loop
    ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*",
                      flags=re.I)

    with open(fichier, encoding="utf-8") as ficlist:
        ficstring = ficlist.read().splitlines()

    # Line numbers start at 0, as in the output below
    for line_number, line in enumerate(ficstring):
        ptrn_result = ptrn.findall(line)
        if ptrn_result:
            result_final = (nameFile, line_number, str(ptrn_result))
            print(result_final)

Output:

('prénom.xml', 4, "['Benoit']")
('prénom.xml', 6, "['Stéphane']")
('prénom.xml', 9, "['Alexandre']")
('prénom.xml', 10, "['Nicolas']")
('prénom.xml', 14, "['Sébastien']")
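The same loop could also collect its results instead of printing them, which makes them easier to reuse. This is only a minimal sketch: find_names is a hypothetical helper (not part of the code above) that takes the names and any iterable of lines, counts lines from 1 rather than 0, and uses finditer so that each match on a line becomes its own entry.

```python
import re


def find_names(names, lines):
    """Return (line_number, matched_name) pairs, counting lines from 1."""
    # Skip empty names and escape the rest so they are matched literally.
    alternatives = "|".join(re.escape(n) for n in names if n)
    ptrn = re.compile(r"\w*(" + alternatives + r")\w*", flags=re.I)

    results = []
    for line_number, line in enumerate(lines, start=1):
        # group(1) is the name itself, without the surrounding characters.
        for match in ptrn.finditer(line):
            results.append((line_number, match.group(1)))
    return results


sample = [
    "Is the first name of Nicolas",
    "nothing here",
    "Hey 1234Alexandre1234",
]
hits = find_names(["Nicolas", "Alexandre"], sample)
print(hits)
```

Because it accepts any iterable of lines, an open file object can be passed directly instead of a list.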

Try using the pattern '\w*(benoît|Nicolas|Stéphane|Sébastien|Alexandre)\w*'

For example:

import re

dicolist = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
s = """Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
   Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj"""

ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*", flags=re.I)
print(ptrn.findall(s))

Output:

['Nicolas', 'Benoît', 'Alexandre', 'Stéphane', 'Sébastien']
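Why this works: the \w* on each side absorbs any letters or digits stuck to the name, and findall returns only what the parenthesised group captured, so 1234Alexandre1234 comes back as Alexandre. If the name list comes from a file, it is probably safer to harden the pattern a little. The sketch below assumes a hypothetical helper build_name_pattern (my own name, not something from re): it drops empty lines, escapes each name with re.escape, and tries longer names first so that a name like Jean-Pierre is not cut short by Jean.

```python
import re


def build_name_pattern(names):
    """Compile one case-insensitive pattern matching any of the given names."""
    # An empty name would produce an empty alternative that matches anywhere.
    names = [name for name in names if name]
    # Longest names first, so "Jean-Pierre" is tried before "Jean".
    ordered = sorted(names, key=len, reverse=True)
    # re.escape keeps characters such as "." or "-" literal.
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\w*(" + alternatives + r")\w*", flags=re.I)


dicolist = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
ptrn = build_name_pattern(dicolist)
found = ptrn.findall("Hey 1234Alexandre1234 and dfqklnflSébastiendsqjfldsjfldksj")
print(found)
```

Note that the result keeps the spelling found in the text, not the spelling from the dictionary, which is why benoît in the list shows up as Benoît in the output above.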

Oh! Man, your first function grepi() needs some indentation. The rest of the question is also too complicated for me.