Removing duplicate users from a list using set()
I'm trying to remove duplicate users from a list in Python using set(). The problem is that it doesn't remove the duplicate users:
with open('live.txt') as file:
    for line in file.readlines():
        word = line.split()
        users = (word[word.index('user')+1])
        l = users.split()
        l = set(l)
        l = sorted(l)
        print " ".join(l)
Here are the contents of live.txt:
Sep 15 04:34:24 li146-252 sshd[13320]: Failed password for invalid user ronda from 212.58.111.170 port 42201 ssh2
Sep 15 04:34:26 li146-252 sshd[13322]: Failed password for invalid user ronda from 212.58.111.170 port 42330 ssh2
Sep 15 04:34:28 li146-252 sshd[13324]: Failed password for invalid user ronda from 212.58.111.170 port 42454 ssh2
Sep 15 04:34:31 li146-252 sshd[13326]: Failed password for invalid user ronda from 212.58.111.170 port 42579 ssh2
Sep 15 04:34:33 li146-252 sshd[13328]: Failed password for invalid user romero from 212.58.111.170 port 42715 ssh2
Sep 15 04:34:36 li146-252 sshd[13330]: Failed password for invalid user romero from 212.58.111.170 port 42838 ssh2
You could try a simpler approach:
list(set(<Your user list>))
This returns the list without duplicates. Python's set data type is a collection of unique elements, so simply casting the list to a set automatically removes the duplicates.
Example:
>>> users = ['john', 'mike', 'ross', 'john','obama','mike']
>>> list(set(users))
['mike', 'john', 'obama', 'ross']
>>>
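Note that a set does not preserve order, which is why the result above comes out shuffled. If the original first-seen order matters, one common alternative (a sketch, not part of the answer above; it assumes Python 3.7+, where dict keys keep insertion order) is dict.fromkeys():

```python
# Deduplicate while preserving first-seen order (Python 3.7+,
# where dict keys keep insertion order).
users = ['john', 'mike', 'ross', 'john', 'obama', 'mike']
unique_users = list(dict.fromkeys(users))
print(unique_users)  # ['john', 'mike', 'ross', 'obama']
```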
Hope this solves your problem:
import re

def remove_me():
    all_users = []
    with open('live.txt') as file:
        for line in file.readlines():
            # Raw string avoids an invalid-escape warning for \s
            pattern = re.compile(r'(.*user\s*)([a-zA-Z0-9]*)')
            stmt = pattern.match(line)
            all_users.append(stmt.groups()[1])
    unique_users = list(set(all_users))
    print(unique_users)

if __name__ == "__main__":
    remove_me()
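To see what that pattern actually captures, here is a quick check on one of the sample log lines from the question (same regex as above): the greedy group 1 swallows everything up to and including "user ", and group 2 is the user name.

```python
import re

line = ("Sep 15 04:34:24 li146-252 sshd[13320]: Failed password "
        "for invalid user ronda from 212.58.111.170 port 42201 ssh2")
pattern = re.compile(r'(.*user\s*)([a-zA-Z0-9]*)')
stmt = pattern.match(line)
# Group 1: "... invalid user ", group 2: the user name itself.
print(stmt.group(2))  # ronda
```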
Here is the code you want:
with open('live.txt') as file:
    users = []
    for line in file.readlines():
        word = line.split()
        users.append(word[word.index('user') + 1])
    unique_users = list(set(users))
    print(" ".join(unique_users))
Output:
romero ronda
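One caveat the answer above doesn't mention: set iteration order is arbitrary, so the joined output may come out in either order. Wrapping the set in sorted() (an addition, not in the original answer) makes the output deterministic:

```python
# Sample user list extracted from the log lines in the question.
users = ['ronda', 'ronda', 'ronda', 'ronda', 'romero', 'romero']
# sorted() turns the unordered set into a stable, alphabetical list.
print(" ".join(sorted(set(users))))  # romero ronda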
If the duplicate user lines are consecutive, you can use itertools.groupby() to remove the duplicates:
#!/usr/bin/env python
from itertools import groupby
from operator import itemgetter

def extract_user(line):
    return line.partition('user')[2].partition('from')[0].strip()

with open('live.txt') as file:
    print(" ".join(map(itemgetter(0), groupby(file, key=extract_user))))
    # -> ronda romero
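The "consecutive" qualifier matters: groupby() only merges runs of adjacent equal keys, so a user that reappears later in the file starts a new group. A minimal illustration:

```python
from itertools import groupby

# groupby() only collapses runs of equal adjacent values; 'ronda'
# reappearing after 'romero' starts a fresh group.
names = ['ronda', 'ronda', 'romero', 'ronda']
print([key for key, _ in groupby(names)])  # ['ronda', 'romero', 'ronda']
```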