Using re.findall when reading CSV
I am trying to read a CSV file and use re.findall to extract specific parts of it.
Here is a sample of the first few lines of my CSV file:
School: Johnson County Elementary School | Student First Name: John | Student Last Name: Doe, 1, Please leave yearbook with sister in office
School: Kirkwood Elementary School | Student First Name: Karen | Student Last Name: Rodgers, 3, Null
School: 2nd Street Elementary School | Student First Name: Joe | Student Last Name: Greene, 12, Give to mom at pickup
Here is the code I am using:
import csv
import re

def fileReader():
    while True:
        input_file = input('What file would you like to read from? (or stop) ')
        if input_file.upper() == 'STOP':
            break
        schools = input('What school would you like to generate reports for? ')
        file_contents = open(input_file, newline='', encoding='utf-8')
        for row in csv.reader(file_contents):
            schoolName = re.findall('(?<=Student First Name: ).+?(?= |)',row[0], re.DOTALL)
            print(schoolName)

fileReader()
When I run this code, the output is only the first character of the school name, like this:
['J']
['K']
['2']
Instead, I want the whole school name:
['Johnson County Elementary School']
['Kirkwood Elementary School']
['2nd Street Elementary School']
I am really confused about why re.findall returns only the first letter instead of the whole school name.
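First, look for School instead of Student First Name.
Second, | is special in regular expressions: it is the OR (alternation) operator. Unescaped, (?= |) means "a space or nothing", and because the empty alternative always succeeds, the lazy .+? stops after a single character. To match a literal pipe, escape it as \| (a raw string keeps the backslash intact):
schoolName = re.findall(r'(?<=School: ).+?(?= \|)', row[0], re.DOTALL)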
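To see the difference side by side, here is a minimal sketch that runs both patterns against one line from the sample data. The variable names `row`, `unescaped`, and `escaped` are only for illustration, and both patterns look behind for School: so that the escaping is the only thing that changes.

```python
import re

# One line taken from the sample data in the question.
row = ("School: Johnson County Elementary School | "
       "Student First Name: John | Student Last Name: Doe")

# Unescaped: (?= |) reads as "a space OR nothing". The empty branch always
# succeeds, so the lazy .+? is satisfied after one character.
unescaped = re.findall(r"(?<=School: ).+?(?= |)", row)

# Escaped: (?= \|) requires a literal " |" right after the match.
escaped = re.findall(r"(?<=School: ).+?(?= \|)", row)

print(unescaped)  # ['J']
print(escaped)    # ['Johnson County Elementary School']
```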
You don't really need the csv module or lookahead/lookbehind to find the school:
import re

with open('input.csv') as file:
    for row in file:
        # Capture everything between "School: " and the next " |"
        schoolName = re.search(r'School: (.+?) \|', row).group(1)
        print(schoolName)
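One caveat with calling .group(1) directly: re.search returns None when a line does not match (a blank line, for example), and that raises AttributeError. A possible guard is sketched below. The helper name `extract_school` and the in-memory `sample_lines` list are hypothetical, used here so the example runs without a file.

```python
import re

# Compiled once; captures the text between "School: " and the next " |".
SCHOOL_RE = re.compile(r"School: (.+?) \|")

def extract_school(line):
    """Return the school name from one line, or None if nothing matches."""
    match = SCHOOL_RE.search(line)
    return match.group(1) if match else None

# Hypothetical stand-in for the lines of the input file.
sample_lines = [
    "School: Kirkwood Elementary School | Student First Name: Karen | "
    "Student Last Name: Rodgers, 3, Null",
    "",  # a blank line, which would break a bare .group(1) call
]

results = [extract_school(line) for line in sample_lines]
print(results)  # ['Kirkwood Elementary School', None]
```

Whether to skip unmatched lines or report them depends on how clean the input file is expected to be.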