Python: extract items from different lists and put them in one set
I have a file like this:
93.93.203.11|["['vmit.it', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'maurominnella.com']"]
168.144.9.16|["['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']"]
195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]
129.35.210.118|["['israelinnovation.co.il', 'watec-peru.com', 'bsacimeeting.org', 'wsava2015.com', 'picsmeeting.com']"]
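I want to extract the domains from all the lists and add them to one set. In the end, I would like to have each unique domain on its own line. Here is the code I wrote:
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,list = line.split('|')
    l = json.loads(list)
    for e in l:
        domain = e.split(',')
        set_d.add(domain)
print set_d
But I get the following error: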
set_d.add(domain)
TypeError: unhashable type: 'list'
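Can anyone help me?
The result of the split function is a list (domain = e.split(',')), and lists are unhashable, so you cannot add them to a set. Instead, you can add those elements to your set with set.update(). You also don't need json here, because it will not separate your domains and will not give you the result you want. You can use ast.literal_eval to parse your list: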
import ast
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,li = line.split('|')
    l = ast.literal_eval(ast.literal_eval(li)[0])
    for e in l:
        domain = e.split(',')
        set_d.update(domain)
print set_d
Note: do not use the names of Python built-in functions or types as variable names!
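The two-step parse can be sketched as a small function. This is a minimal sketch written for Python 3, not the answer's exact code: the names parse_domains, sample_lines and skipped are made up for illustration. It also assumes that lines which are not valid Python literals should be set aside rather than abort the run; the sample line with the stray quote after 'ellen-siemer.nl' is such a line, because ast.literal_eval raises SyntaxError on it.

```python
import ast

def parse_domains(line):
    """Return the list of domains in one 'ip|[...]' line (hypothetical helper)."""
    _ip, raw = line.strip().split("|", 1)
    outer = ast.literal_eval(raw)        # a list holding a single string
    return ast.literal_eval(outer[0])    # that string is the repr of a list

sample_lines = [
    """168.144.9.16|["['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']"]""",
    """195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]""",
]

domains = set()
skipped = []  # lines that could not be parsed as Python literals
for line in sample_lines:
    try:
        domains.update(parse_domains(line))
    except (ValueError, SyntaxError):
        skipped.append(line)

print(sorted(domains))
print(len(skipped), "line(s) skipped")
```

Since literal_eval already returns a list of separate domains, no further split(',') is needed before calling update.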
As a more efficient approach, you can use a regular expression to grab your domains:
import re
f = open(file,'r').read()
print set(re.findall(r'[a-zA-Z\-]+\.[a-zA-Z]+',f))
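Result (note that this pattern yields 'israelinnovation.co' instead of 'israelinnovation.co.il' and misses 'wsava2015.com', because it allows neither digits nor more than one dot):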
set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'israelinnovation.co', 'bsacimeeting.org', 'webdesignhostingindia.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'watec-peru.com', 'iipmalumni.com', 'iipmclubs.in'])
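A pattern that also accepts digits and multi-label names such as .co.il might look like the following. This is only a sketch for Python 3: DOMAIN_RE and the shortened sample text are assumptions for illustration, and the pattern is a loose approximation of domain syntax, not a validator.

```python
import re

# Hypothetical pattern: dot-separated labels of letters, digits or hyphens,
# ending in an alphabetic label of at least two letters (so IPs do not match).
DOMAIN_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

text = """\
93.93.203.11|["['vmit.it', 'umbertominnella.it']"]
195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]
129.35.210.118|["['israelinnovation.co.il', 'wsava2015.com']"]
"""

found = set(DOMAIN_RE.findall(text))
print(sorted(found))
```

Because the final label must be alphabetic, the IP addresses at the start of each line are not picked up.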
You should call update instead of add:
set_d.update(domain)
Example:
>>> set_d = {'a', 'b', 'c'}
>>> set_d.update(['c', 'd', 'e'])
>>> print(set_d)
{'a', 'b', 'c', 'd', 'e'}
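The difference between the two methods can be seen in a short sketch (Python 3; the variable names are made up for illustration). add() stores its argument as a single element, which must be hashable, while update() iterates over its argument.

```python
domains = set()

# add() stores its argument as ONE element, so the argument must be hashable.
try:
    domains.add(["a.com", "b.org"])
    add_error = None
except TypeError as exc:
    add_error = str(exc)  # the message mentions "unhashable type: 'list'"

# update() iterates over its argument and adds each item separately.
domains.update(["a.com", "b.org"])

# Pitfall: a string is iterable too, so update() adds its characters.
chars = set()
chars.update("ab")

print(add_error)
print(sorted(domains), sorted(chars))
```

The last step is worth keeping in mind: passing a single domain string to update() would add its individual characters, so a lone string should go through add().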
Use str.translate to clean up the text and update to add the pieces to the set:
set_d = set()
with open(file,'r') as f:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
This outputs a set of unique, individual domains:
set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'watec-peru.com', 'bsacimeeting.org', 'webdesignhostingindia.com', 'wsava2015.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'iipmalumni.com', 'iipmclubs.in', 'israelinnovation.co.il'])
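The translate(None, ...) form above is Python 2 only. In Python 3, str.translate takes a table, which can be built with str.maketrans. A minimal sketch of the same cleanup, where STRIP_TABLE and domains_in are hypothetical names:

```python
# Translation table that deletes double quotes, single quotes and brackets.
STRIP_TABLE = str.maketrans("", "", "\"'[]")

def domains_in(line):
    """Return the set of domains in one 'ip|[...]' line (hypothetical helper)."""
    payload = line.split("|", 1)[1]
    cleaned = payload.translate(STRIP_TABLE)
    # strip() removes the padding spaces and the trailing newline
    return {part.strip() for part in cleaned.split(",") if part.strip()}

line = "195.211.72.88|[\"['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']\"]\n"
result = domains_in(line)
print(sorted(result))
```

Since every quote character is deleted before splitting, the stray quote in this sample line causes no trouble here.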
You can write them to a new file:
set_d = set()
with open(file,'r') as f,open("out.txt","w") as out:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
    for line in set_d:
        out.write("{}\n".format(line))
Output:
$ cat out.txt
vmit.it
tcmpraktijk-jingshen.nl
umbertominnella.it
studioguizzardi.it
telestreet.it
watec-peru.com
bsacimeeting.org
webdesignhostingindia.com
wsava2015.com
iipmstudents.in
maurominnella.com
ellen-siemer.nl
picsmeeting.com
iipmalumni.com
iipmclubs.in
israelinnovation.co.il
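Set iteration order is arbitrary, so the order of lines in out.txt can differ between runs. If a stable order is wanted, the domains can be sorted before writing. A small sketch for Python 3, using an in-memory buffer in place of a real file; write_domains is a hypothetical helper:

```python
import io

def write_domains(domains, out):
    """Write one domain per line, in sorted order, to a file-like object."""
    for domain in sorted(domains):
        out.write(domain + "\n")

buffer = io.StringIO()  # stands in for open("out.txt", "w")
write_domains({"vmit.it", "iipmclubs.in", "wsava2015.com"}, buffer)
print(buffer.getvalue(), end="")
```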
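Your code does not split the data into individual domains, and your json call does not actually help. Simply changing add to update would output something like this: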
{" 'maurominnella.com']", " 'wsava2015.com'", "'webdesignhostingindia.com'", " 'iipmclubs.in']", " 'ellen-siemer.nl'']", " 'umbertominnella.it'", " 'picsmeeting.com']", "['israelinnovation.co.il'", "['vmit.it'", " 'iipmstudents.in'", "['tcmpraktijk-jingshen.nl'", " 'studioguizzardi.it'", "['iipmalumni.com'", " 'watec-peru.com'", " 'bsacimeeting.org'", " 'telestreet.it'"}
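Why the json call does not help can be checked on a shortened sample (a sketch for Python 3; the raw string is abbreviated from the data above). json.loads only decodes the outer list, whose single element is still one string, so splitting on commas leaves brackets and quotes attached.

```python
import json

raw = "[\"['vmit.it', 'telestreet.it']\"]"

parsed = json.loads(raw)         # a list with ONE element, a str
pieces = parsed[0].split(",")    # fragments still carry brackets and quotes

print(len(parsed), type(parsed[0]).__name__)
print(pieces)
```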
Also, don't use list as a variable name, because it shadows Python's built-in list.
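What shadowing means in practice can be sketched as follows (Python 3; the function name shadowed is made up for illustration). Once the name is rebound, calling list(...) in that scope no longer reaches the built-in type.

```python
def shadowed():
    list = "['a.com']"        # rebinding the name hides the built-in type
    try:
        return list("abc")    # now tries to call a str, not the list type
    except TypeError as exc:
        return str(exc)

message = shadowed()
print(message)
print(list("abc"))  # outside that function the built-in is unaffected
```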