Function only prints values for four files from a directory of 5K files (Python)
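I am running code that takes each row of a CSV and looks for an exact match of the entity in every file of a directory. The problem is that the code terminates after printing matched values for four files, while the directory contains 5K files. I think the problem is with my break or continue statement. Can someone help me with this? The code so far: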
import csv
import os
import re

# Raw strings keep the backslashes in the Windows paths literal.
path = r'C:\Users\Lenovo\.spyder-py3\5KFILES'
with open(r'C:\Users\Lenovo\.spyder-py3\codes_file.csv', newline='', encoding='utf-8') as myFile:
    reader = csv.reader(myFile)
    for filenames in os.listdir(path):
        with open(os.path.join(path, filenames), encoding='utf-8') as my:
            content = my.read().lower()
            #print(content)
            for row in reader:
                if len(row[1]) >= 4:
                    #v = re.search(r'(?<!\w){}(?!\w)'.format(re.escape(row[1])), content, re.I)
                    v = re.search(r'\b' + re.escape(row[1]) + r'\b', content, re.IGNORECASE)
                    if v:
                        print(filenames, v.group(0))
                        break
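The reader object is created once, before the for loop over the files, and it is an iterator. Each time execution reaches the "for row in reader" line, iteration continues from where it previously stopped. Once the end of reader has been reached, every later "for row in reader" loop is empty, so no further files can produce a match.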
You can see what happens in this short example:
l = [0, 1, 2, 3, 4, 5]
iterator = iter(l)
for i in range(0, 16, 2):
    print('i:', i, "- starting the 'for j ...' loop")
    for j in iterator:
        print('iterator:', j)
        if j == i:
            break

Output:

i: 0 - starting the 'for j ...' loop
iterator: 0
i: 2 - starting the 'for j ...' loop
iterator: 1
iterator: 2
i: 4 - starting the 'for j ...' loop
iterator: 3
iterator: 4
i: 6 - starting the 'for j ...' loop
iterator: 5
i: 8 - starting the 'for j ...' loop
i: 10 - starting the 'for j ...' loop
i: 12 - starting the 'for j ...' loop
i: 14 - starting the 'for j ...' loop
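Each time the inner for loop runs, it continues iterating over iterator from where it previously stopped. Once the iterator is exhausted, the "for j ..." loop is empty.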
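You should restart it in each loop. Because a new csv.reader on the same open file continues from the file's current position, rewind the file first:

myFile.seek(0)
for row in csv.reader(myFile):
    ....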
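One caveat with recreating the reader is that it wraps the same open file object, so the file position matters. The sketch below uses an in-memory io.StringIO with made-up rows as a stand-in for the CSV file (an assumption for illustration, not the asker's data) to show that a second reader yields nothing until the file is rewound with seek(0).

```python
import csv
import io

# In-memory stand-in for the CSV file; the rows are made up for illustration.
csv_file = io.StringIO("a,1\nb,2\nc,3\n")

# The first reader consumes the file to the end.
first_pass = list(csv.reader(csv_file))

# A new reader on the same file object starts at the current position,
# which is now the end of the file, so it yields no rows.
second_pass = list(csv.reader(csv_file))

# Rewinding the file makes a fresh reader start from the first row again.
csv_file.seek(0)
third_pass = list(csv.reader(csv_file))

print(len(first_pass), len(second_pass), len(third_pass))
```

On a real file opened with open(...), the same seek(0) call applies before each new pass.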
Or build a list once, which can be iterated as many times as needed:

reader = list(csv.reader(myFile))
....
for row in reader:
    ....
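Putting the list approach together, a minimal sketch of the whole search could look like the following. The function names load_terms and first_match_per_file and the min_length parameter are hypothetical, introduced here only for illustration; the sketch assumes the search term is in the second CSV column, as in the question, and keeps the question's behaviour of stopping at the first matching term per file.

```python
import csv
import os
import re


def load_terms(csv_path, min_length=4):
    """Read the second CSV column once and keep the long-enough values in a list."""
    with open(csv_path, newline='', encoding='utf-8') as csv_file:
        return [row[1] for row in csv.reader(csv_file)
                if len(row) > 1 and len(row[1]) >= min_length]


def first_match_per_file(directory, terms):
    """Return (filename, matched_text) for the first term found in each file."""
    # Compile each pattern once; unlike an iterator, this list can be
    # walked again from the start for every file.
    patterns = [re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
                for term in terms]
    results = []
    for filename in sorted(os.listdir(directory)):
        with open(os.path.join(directory, filename), encoding='utf-8') as text_file:
            content = text_file.read()
        for pattern in patterns:
            match = pattern.search(content)
            if match:
                results.append((filename, match.group(0)))
                break  # stop at the first matching term, as in the question
    return results
```

Calling first_match_per_file(path, load_terms(csv_path)) with the question's directory and CSV paths would then visit every file, because the list of terms is never used up.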