Python substring find
I have a small request: I need help with this code:
def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    # Load the dictionary
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()
    # Search in the file
    with open(fichier, encoding="utf-8") as fic:
        ficlist = fic.read().splitlines()
    for line in ficlist:
        line_number += 1
        for patt in dicolist:
            line = line.lower()
            if re.search(r' + line + r'\b', patt):
                print(line.rstrip() + ', ' + patt + ', ' + nameFile + ', '
                      + str(line_number))
I'm having trouble with this line: if re.search(r' + line + r'\b', patt):
dico is a dictionary of first names, for example:
benoît
Nicolas
Stéphane
Sébastien
Alexandre
fichier is a file containing a lot of information, for example:
Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj
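etc.
I want to return every exact string (that is, every first name) found in the file. But some names appear in a form like 1234Alexandre5678, and I can't find a way to return just Alexandre. The same goes for dfqklnflSébastiendsqjfldsjfldksj, where I want to return Sébastien.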
...
Can someone help me?
Thanks!
Here is how I corrected my code using the answer:
#!/usr/bin/env python3
import os
import re
def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    result_final = []
    dicolist = open(dico, encoding="utf-8").read().splitlines()
    print(dicolist)
    with open(fichier, encoding="utf-8") as ficlist:
        ficstring = ficlist.read().splitlines()
    for line in ficstring:
        ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*",
                          flags=re.I)
        ptrn_result = ptrn.findall(line)
        if ptrn_result:
            result_final = (nameFile, line_number, str(ptrn.findall(line)))
            print(result_final)
        line_number += 1
Output:
('prénom.xml', 4, "['Benoit']")
('prénom.xml', 6, "['Stéphane']")
('prénom.xml', 9, "['Alexandre']")
('prénom.xml', 10, "['Nicolas']")
('prénom.xml', 14, "['Sébastien']")
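As a possible tidy-up of the corrected function, the pattern could be compiled once outside the loop and the matches collected in a list instead of printed. This is only a sketch: the name grep_names and its parameters are hypothetical, it takes an iterable of lines rather than a file path so it is easy to try out, and it counts lines from 1 (the code above counts from 0).

```python
import re


def grep_names(names, lines, source_name):
    """Return (source_name, line_number, matches) for each line containing a name.

    `names`, `lines` and `source_name` are illustrative parameters, not part
    of the original code.
    """
    # Drop empty entries: an empty alternative would match on every line.
    names = [n for n in names if n]
    if not names:
        return []
    # Compile once; re.escape keeps any regex metacharacters in a name literal.
    pattern = re.compile(
        r"\w*(" + "|".join(re.escape(n) for n in names) + r")\w*",
        flags=re.IGNORECASE,
    )
    results = []
    # enumerate(..., start=1) gives 1-based line numbers.
    for line_number, line in enumerate(lines, start=1):
        matches = pattern.findall(line)
        if matches:
            results.append((source_name, line_number, matches))
    return results
```

With a real file, something like grep_names(dicolist, ficstring, nameFile) would be the equivalent call, reusing the lists already read in grepi().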
Try the pattern '\w*(benoît|Nicolas|Stéphane|Sébastien|Alexandre)\w*'
For example:
import re
dicolist = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
s = """Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj"""
ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*", flags=re.I)
print(ptrn.findall(s))
Output:
['Nicolas', 'Benoît', 'Alexandre', 'Stéphane', 'Sébastien']
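Why this seems to work: the \w* on each side absorbs any letters or digits glued to the name, and because only the name itself is inside the capturing group, findall() returns just the group. The re.I flag handles the case difference between benoît and Benoît. A slightly more defensive sketch follows, where the helper name build_name_pattern is an assumption for illustration; it adds re.escape so that a name containing a regex metacharacter would still be matched literally.

```python
import re


def build_name_pattern(names):
    # Hypothetical helper: join the escaped names into one alternation.
    alternation = "|".join(re.escape(n) for n in names)
    # \w* on both sides swallows surrounding word characters;
    # only the parenthesised name is captured and returned by findall().
    return re.compile(r"\w*(" + alternation + r")\w*", flags=re.IGNORECASE)


names = ['benoît', 'Nicolas', 'Stéphane', 'Sébastien', 'Alexandre']
pattern = build_name_pattern(names)

print(pattern.findall("Hey 1234Alexandre1234"))
print(pattern.findall("dfqklnflSébastiendsqjfldsjfldksj"))
```

One caveat: since the leading \w* is greedy, a word that contains a name twice (for example AlexandreAlexandre) yields only one match.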
Oh! Man, your first function grepi() needs some indentation. The rest of the question is also quite complicated for me.