Sorting files drawn from a folder by name for a graph
I have some code that counts strings in each file in a folder (each file is one month of a year, e.g. 2012 04, 2006 11, etc.) and adds them up:
from os import listdir
from os.path import isfile, join, splitext

mypath = r"C:\Users\Desktop\FILE"  # raw string; a string literal cannot end in a lone backslash
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
result = {}
for f in onlyfiles:  # every file in the CSV folder
    with open(join(mypath, f), 'r') as content_file:
        content = content_file.read()
    a1 = content.count('Bacon')
    a2 = content.count('Eggs')
    total = a1 + a2
    result[splitext(f)[0]] = total  # key is the file name without ".csv"
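Removing the ".csv" extension with str.strip() is a common trap: strip() treats its argument as a set of characters to remove from both ends, not as a suffix. It happens to work for names like "2012 04.csv" but can mangle other names, whereas os.path.splitext() removes only the extension. A small demonstration with made-up file names:

```python
import os

# str.strip(".csv") removes any of the characters ".", "c", "s", "v"
# from both ends of the string; it does not remove a ".csv" suffix.
month_name = "2012 04.csv".strip(".csv")
other_name = "cars.csv".strip(".csv")

# os.path.splitext() splits off only the final extension.
month_root = os.path.splitext("2012 04.csv")[0]
other_root = os.path.splitext("cars.csv")[0]

print(month_name, other_name)  # the second name is mangled
print(month_root, other_root)
```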
Then I put the values into a dictionary:
new_dictionary = {}
count = 0
for m, n in result.items():
    print('The number of bacon and eggs in {} was {}'.format(m, n))
    count += 1
    new_dictionary['month_{}'.format(count)] = n  # store this month's count, not the whole dict
Finally, I plot them on a graph:
import matplotlib.pyplot as plt

plt.plot(list(result.values()))
plt.ylabel('Bacon and eggs seen in this month')
plt.xlabel('Time')
plt.title('Amount of times bacon and eggs seen over time')
plt.xticks(range(len(result)), list(result.keys()))
plt.show()
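However, when the graph is drawn, the times (months, etc.) appear in a random order rather than in chronological order as they are in the folder.
How can I get the graph to plot them in a logical order?
I've tried using the list.sorted method, but it ends up printing strange things.
Note: the data is made up because the real data is sensitive, but the principle is the same.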
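When filling in new_dictionary, you should supply the values in order:
for m, n in sorted(result.items()):
    print('The number of bacon and eggs in {} was {}'.format(m, n))
    count += 1
    new_dictionary['month_{}'.format(count)] = n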
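The same idea applies to the plot itself. A minimal sketch, assuming the keys in result are names like "2006 11" (year first, zero-padded month), so that sorting the strings also sorts them chronologically; the helper name sorted_series is hypothetical:

```python
def sorted_series(result):
    """Return (labels, values) from a {name: count} dict, ordered by name.

    Assumes names such as "2006 11": year first and zero-padded month,
    so plain string sorting matches chronological order.
    """
    items = sorted(result.items())  # tuples sort by key first, i.e. by file name
    labels = [name for name, _ in items]
    values = [count for _, count in items]
    return labels, values


# Made-up counts, in the arbitrary order a dict built from listdir() may have.
result = {"2012 04": 5, "2006 11": 3, "2012 01": 7}
labels, values = sorted_series(result)
print(labels)  # ['2006 11', '2012 01', '2012 04']
print(values)  # [3, 7, 5]
```

The two lists can then stand in for result.values() and result.keys() in the plotting code, e.g. plt.plot(values) and plt.xticks(range(len(labels)), labels), so the line and the tick labels share one order.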
You may want to look at https://docs.python.org/2/library/os.path.html, as it may help you.
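You can use os.path.split() to split the file path into a (head, tail) tuple:
('root path', 'file.csv')
Then you can use os.path.splitext() to split the file name into another tuple, whose second item is the extension including its dot:
('file', '.csv')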
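A quick check of what the two functions return, using the example path "root path/file.csv" from above. Both return tuples, and the extension keeps its leading dot:

```python
import os

path = os.path.join("root path", "file.csv")

head, tail = os.path.split(path)    # ('root path', 'file.csv')
root, ext = os.path.splitext(tail)  # ('file', '.csv')

# os.path.basename(path) is shorthand for os.path.split(path)[1]
assert os.path.basename(path) == tail
print(head, tail, root, ext)
```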
If you have 2015-03.csv, you can do:
import os

filename = os.path.splitext(os.path.split(f)[1])[0]
# take item 1 (the file name) from os.path.split(), pass it to
# os.path.splitext(), and keep item 0 (the name without the extension)
You can then add that to your dictionary, or use a nested dictionary, for example:
import os

mypath = r"C:\Users\Desktop\FILE"
result = {}
for f in os.listdir(mypath):
    path = os.path.join(mypath, f)  # listdir returns bare names, so join them with mypath
    if not os.path.isfile(path):
        continue
    with open(path, "r") as content_file:
        content = content_file.read()
    a1 = content.count('Bacon')
    a2 = content.count('Eggs')
    total = a1 + a2
    result[os.path.splitext(os.path.split(f)[1])[0]] = {"Bacon": a1, "Eggs": a2, "Total": total}
for filename in sorted(result):  # sorted keys, so files are listed in name order
    print("File: {0}; Bacon: {1}; Eggs: {2}; Total: {3}".format(
        filename, result[filename]["Bacon"], result[filename]["Eggs"], result[filename]["Total"]))
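Sorting the keys as strings is only chronological while every name has a zero-padded month. If some files might be named like "2012 4", a safer sketch parses each name into a date and sorts on that. This assumes the "YYYY MM" naming from the question; the helper month_key and its default format string are assumptions (use "%Y-%m" for names such as "2015-03"):

```python
from datetime import datetime


def month_key(name, fmt="%Y %m"):
    """Parse a name such as '2012 04' (or '2012 4') into a datetime for sorting.

    fmt is an assumption about how the files are named.
    """
    return datetime.strptime(name, fmt)


names = ["2012 4", "2006 11", "2012 10"]

print(sorted(names))                 # string order puts '2012 10' before '2012 4'
print(sorted(names, key=month_key))  # date order puts '2012 4' before '2012 10'
```

With the nested dictionary above, the printing loop would become for filename in sorted(result, key=month_key):.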
Have you considered regular expressions? re.findall() returns a list of matches:
import re

bacon = re.findall(r"Bacon", content)  # same capitalisation as the question's counts
eggs = re.findall(r"Eggs", content)
print("Total bacon: {0}".format(len(bacon)))
print("Total eggs: {0}".format(len(eggs)))
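re.findall() can also count both words in a single pass over the text when combined with collections.Counter. A minimal sketch with made-up text; like str.count(), this pattern is case-sensitive and also matches inside longer words (add \b word boundaries if that matters):

```python
import re
from collections import Counter

content = "Bacon and Eggs on Monday, Bacon again on Tuesday"

# One pass over the text: findall returns every "Bacon" or "Eggs" match,
# and Counter tallies how often each one occurred.
counts = Counter(re.findall(r"Bacon|Eggs", content))
total = counts["Bacon"] + counts["Eggs"]

print(counts)
print("Total:", total)
```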
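If you are working with a large file, you may want to consider using mmap instead of reading the entire contents into memory. See https://docs.python.org/2/library/re.html for more information.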
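A minimal sketch of the mmap approach, assuming the files can be searched as raw bytes, which means the pattern must be a bytes pattern (rb"..."). The helper count_matches and the temporary demo file are illustrative, not part of the original answer:

```python
import mmap
import os
import re
import tempfile


def count_matches(path, pattern=rb"Bacon|Eggs"):
    """Count regex matches in a file without reading it into a string first."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re searches the memory-mapped buffer directly.
            return sum(1 for _ in re.finditer(pattern, mm))


# Demo with a throwaway file (made-up contents).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"Bacon,Eggs\nBacon,Toast\n")
demo_count = count_matches(tmp.name)
os.remove(tmp.name)
print(demo_count)
```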