Sorting and Matching a Python list
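I asked a similar question recently, but I need to go a little deeper.
Essentially, I am reading a directory of files and appending everything to a list named filelistname.
I am trying to sort this list by disk count (-#disk-) and then run a function against that sorted list.
Thanks for your help.
Here is an example: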
In []: filelistname
Out []: ['C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx',
'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx',
'C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx',
'C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']
The output would look something like this.
One group
C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx
C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx
C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx
Another group
C:\Test4\ARRAY05-4NODE-RAID1-25disk-128k-0-segmented.xlsx
Another group
C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx
C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx
Another group
C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx
Another group
C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx
I am currently playing around with this, but I can't work out the right key.
import os
from itertools import groupby
from collections import defaultdict

# Key: everything before the disk-count field (the directory is included)
key_fn = lambda s: s.rsplit('-', 4)[0]
filelistname = sorted(filelistname, key=key_fn)
for key, grouped_file_names in groupby(filelistname, key=key_fn):
    print(key)
    print('\n'.join(grouped_file_names))
    print("")
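You seem to be grouping by the \d+k-\d+ part, so split the name and use those fields as the key:
from collections import defaultdict

d = defaultdict(list)
for sub in filelistname:
    # Split off the last three fields: block size, next field, 'segmented.xlsx'
    spl = sub.rsplit("-", 3)
    k = spl[-3], spl[-2]
    d[k].append(sub)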
Output:
from pprint import pprint as pp
pp(d)
{('128k', '0'): ['C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
                 'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
                 'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
                 'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx'],
 ('32k', '0'): ['C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
                'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx'],
 ('64k', '0'): ['C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx'],
 ('64k', '100'): ['C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']}
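Since the question started from itertools.groupby, here is a rough sketch of the same grouping done that way. It is only an illustration: the regular expression, the name KEY_RE and the helper block_key are my own assumptions about the file-name layout, and the list is a shortened copy of the sample paths. The point to notice is that groupby only merges adjacent items, so the list has to be sorted with the same key first.

```python
import re
from itertools import groupby
from ntpath import basename

# Shortened copy of the sample paths (raw strings keep the backslashes literal).
filelistname = [
    r'C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
    r'C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx',
]

# Assumed pattern: the block size (e.g. 128k) followed by the next numeric field.
KEY_RE = re.compile(r'-(\d+)k-(\d+)-')


def block_key(path):
    """Return (block_size, next_field) as ints, so 32k sorts before 128k."""
    m = KEY_RE.search(basename(path))
    if m is None:
        raise ValueError('unexpected file name: %r' % path)
    return int(m.group(1)), int(m.group(2))


# Sort by the grouping key first; groupby only merges neighbouring items.
ordered = sorted(filelistname, key=block_key)
groups = {key: list(names) for key, names in groupby(ordered, key=block_key)}

for key, names in groups.items():
    print(key)
    print('\n'.join(names))
    print()
```

Converting the fields to int is a design choice: with string keys, '128k' would sort before '32k'.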
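If you want everything except the disk part:
from collections import defaultdict
from ntpath import basename

d = defaultdict(list)
for sub in filelistname:
    # Fields: 'ARRAY05-2NODE', 'RAID1', '12disk', '128k', '0', 'segmented.xlsx'
    spl = basename(sub).rsplit("-", 5)
    # Keep the array/node prefix and the two fields after the disk count
    # (the RAID field is dropped as well)
    k = spl[0] + "-" + "-".join(spl[3:5])
    d[k].append(sub)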
Output:
{'ARRAY05-2NODE-128k-0': ['C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
                          'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
                          'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
                          'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx'],
 'ARRAY05-2NODE-32k-0': ['C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
                         'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx'],
 'ARRAY05-2NODE-64k-0': ['C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx'],
 'ARRAY05-2NODE-64k-100': ['C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']}
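The question also asks for the files ordered by disk count. One possible way to get that, sketched under the assumption that the disk count always appears as -<digits>disk- in the file name, is to sort each group numerically after grouping. The names DISK_RE, disk_count and group_key are illustrative helpers of mine, and the list is a shuffled subset of the sample paths.

```python
import re
from collections import defaultdict
from ntpath import basename

# Shuffled subset of the sample paths (raw strings keep the backslashes literal).
filelistname = [
    r'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
    r'C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
    r'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx',
    r'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
]

# Assumed layout: the disk count is written as "-<digits>disk-".
DISK_RE = re.compile(r'-(\d+)disk-')


def disk_count(path):
    """Extract the disk count as an int so the sort is numeric, not textual."""
    m = DISK_RE.search(basename(path))
    if m is None:
        raise ValueError('no disk count in %r' % path)
    return int(m.group(1))


def group_key(path):
    """Same key as above: array/node prefix plus the two fields after the disk count."""
    spl = basename(path).rsplit('-', 5)
    return spl[0] + '-' + '-'.join(spl[3:5])


grouped = defaultdict(list)
for path in filelistname:
    grouped[group_key(path)].append(path)

# Order every group by its disk count, in place.
for names in grouped.values():
    names.sort(key=disk_count)

for key, names in grouped.items():
    print(key)
    print('\n'.join(names))
    print()
```

Each sorted list in grouped can then be passed to whatever function needs to run on it.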