How to filter from a list of dictionaries and write to a file?
I'd like to make some improvements to my previous question:
print(tag_list)
[[{'script': [{'domain': 'random.com', 'path': 'js/custom.js'}]},
{'script': [{'domain': 'cdnjs.cloudflare.com',
'path': '/ajax/libs/fancybox/2.1.5/jquery.fancybox.min.js'}]},
{'link': [{'domain': 'random.com', 'path': 'css/bootstrap.min.css'}]},
{'link': [{'domain': 'random.com', 'path': 'css/style.css'}]},
{'link': [{'domain': 'random.com', 'path': 'css/responsive.css'}]},
{'link': [{'domain': 'random.com',
'path': 'css/jquery.mCustomScrollbar.min.css'}]},
{'link': [{'domain': 'netdna.bootstrapcdn.com',
'path': '/font-awesome/4.0.3/css/font-awesome.css'}]}]]
I want to get all the values of the 'domain' key and store them one per line in a new file, domain.txt.
domain.txt
random.com
cdnjs.cloudflare.com
netdna.bootstrapcdn.com
Duplicates should be avoided.
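If the domains are first collected into a flat list, one way to drop duplicates while keeping the first-seen order shown in the expected domain.txt is `dict.fromkeys`: dict keys are unique and, from Python 3.7 on, keep insertion order. This is a minimal sketch; the `domains` list below is a hand-written stand-in for already-extracted values, not something taken from the question.

```python
# Hand-written stand-in for domain values already pulled out of tag_list
domains = [
    "random.com",
    "cdnjs.cloudflare.com",
    "random.com",
    "random.com",
    "netdna.bootstrapcdn.com",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so repeats are dropped and each domain stays at its first position
unique_domains = list(dict.fromkeys(domains))
print(unique_domains)
# ['random.com', 'cdnjs.cloudflare.com', 'netdna.bootstrapcdn.com']
```

A plain `set(domains)` would also remove duplicates, but its iteration order is arbitrary.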
One approach:
data = [[{'script': [{'domain': 'random.com', 'path': 'js/custom.js'}]},
         {'script': [{'domain': 'cdnjs.cloudflare.com',
                      'path': '/ajax/libs/fancybox/2.1.5/jquery.fancybox.min.js'}]},
         {'link': [{'domain': 'random.com', 'path': 'css/bootstrap.min.css'}]},
         {'link': [{'domain': 'random.com', 'path': 'css/style.css'}]},
         {'link': [{'domain': 'random.com', 'path': 'css/responsive.css'}]},
         {'link': [{'domain': 'random.com',
                    'path': 'css/jquery.mCustomScrollbar.min.css'}]},
         {'link': [{'domain': 'netdna.bootstrapcdn.com',
                    'path': '/font-awesome/4.0.3/css/font-awesome.css'}]}]]
# open file for writing
with open("domain.txt", "w") as outfile:
    # create a set to check for duplicates
    seen = set()
    for top in data:
        for e in top:
            # get domain data either from script or link
            se = e.get("script") or e.get("link")
            # fetch the domain name
            domain = se[0]["domain"]
            # write if not previously seen
            if domain not in seen:
                seen.add(domain)
                outfile.write(f"{domain}\n")
Output (contents of domain.txt)
random.com
cdnjs.cloudflare.com
netdna.bootstrapcdn.com
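The loop above only looks at the `'script'` and `'link'` keys and only at the first entry of each list (`se[0]`); an element with neither key makes `se` `None` and raises a `TypeError`. A minimal sketch of a more general variant, assuming every value in each inner dict is a list of dicts: it walks `element.values()` so any tag name is covered, visits every entry, and skips entries without a `'domain'` key. The helper name `extract_domains` and the `'img'` entry in the sample data are hypothetical, added only for illustration.

```python
def extract_domains(tag_list):
    """Return each 'domain' value once, in first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for group in tag_list:                    # outer list(s)
        for element in group:                 # e.g. {'script': [...]} or {'link': [...]}
            for entries in element.values():  # any tag name, not only script/link
                for entry in entries:         # every entry, not only entries[0]
                    domain = entry.get("domain")
                    if domain and domain not in seen:
                        seen.add(domain)
                        result.append(domain)
    return result


tag_list = [[{'script': [{'domain': 'random.com', 'path': 'js/custom.js'}]},
             {'script': [{'domain': 'cdnjs.cloudflare.com',
                          'path': '/ajax/libs/fancybox/2.1.5/jquery.fancybox.min.js'}]},
             {'link': [{'domain': 'random.com', 'path': 'css/style.css'}]},
             # hypothetical entry without a 'domain' key; it is skipped
             {'img': [{'path': 'img/logo.png'}]},
             {'link': [{'domain': 'netdna.bootstrapcdn.com',
                        'path': '/font-awesome/4.0.3/css/font-awesome.css'}]}]]

with open("domain.txt", "w") as outfile:
    for domain in extract_domains(tag_list):
        outfile.write(f"{domain}\n")
```

Returning a list from the helper and writing the file separately also makes the extraction easy to check on its own.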
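It doesn't look like you need all those inner lists, but if you really do, this version handles the cases that can occur: several entries in one list, elements with neither a 'script' nor a 'link' key, and entries without a 'domain':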
taglist = [[{'script': [{'domain': 'random.com', 'path': 'js/custom.js'}]},
            {'script': [{'domain': 'cdnjs.cloudflare.com',
                         'path': '/ajax/libs/fancybox/2.1.5/jquery.fancybox.min.js'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/bootstrap.min.css'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/style.css'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/responsive.css'}]},
            {'link': [{'domain': 'random.com',
                       'path': 'css/jquery.mCustomScrollbar.min.css'}]},
            {'link': [{'domain': 'netdna.bootstrapcdn.com',
                       'path': '/font-awesome/4.0.3/css/font-awesome.css'}]}]]
D = set()
with open('domain.txt', 'w') as dfile:
    for tag in taglist:
        for subtag in tag:
            # try 'script' first, then 'link'; skip elements that have neither
            if (d := subtag.get('script', None)) is None:
                if (d := subtag.get('link', None)) is None:
                    continue
            for e in d:
                # keep only entries that actually carry a non-empty domain
                if (domain := e.get('domain', None)):
                    D.add(domain)
    # note: set iteration order is arbitrary, so lines may not follow first-seen order
    for domain in D:
        print(domain, file=dfile)
[Note: the := assignment expressions require Python 3.8+]
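On Python versions older than 3.8, the same logic can be written without assignment expressions. This sketch keeps the 'script'-then-'link' lookup of the version above, and, as a choice of this example rather than of the original answer, writes the domains with `sorted()` so the file contents are deterministic instead of depending on set iteration order.

```python
taglist = [[{'script': [{'domain': 'random.com', 'path': 'js/custom.js'}]},
            {'script': [{'domain': 'cdnjs.cloudflare.com',
                         'path': '/ajax/libs/fancybox/2.1.5/jquery.fancybox.min.js'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/bootstrap.min.css'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/style.css'}]},
            {'link': [{'domain': 'random.com', 'path': 'css/responsive.css'}]},
            {'link': [{'domain': 'random.com',
                       'path': 'css/jquery.mCustomScrollbar.min.css'}]},
            {'link': [{'domain': 'netdna.bootstrapcdn.com',
                       'path': '/font-awesome/4.0.3/css/font-awesome.css'}]}]]

domains = set()
for tag in taglist:
    for subtag in tag:
        # same lookup order as above: 'script' first, then 'link'
        d = subtag.get('script')
        if d is None:
            d = subtag.get('link')
        if d is None:
            continue  # neither key present
        for e in d:
            domain = e.get('domain')
            if domain:
                domains.add(domain)

with open('domain.txt', 'w') as dfile:
    # sorted() gives a stable alphabetical order; plain set order is arbitrary
    for domain in sorted(domains):
        print(domain, file=dfile)
```

With the sample data, domain.txt then lists cdnjs.cloudflare.com, netdna.bootstrapcdn.com and random.com, in that order.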