Python to sort and count the unique names from a file
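I am trying to read the Linux log file /var/log/messages to find the lines that contain the string pattern shown below. Each such line holds the user's e-mail address, for example rajeshm@noi-rajeshm.fox.com. I partition the line at the '@' with str.partition(), take the first part, and split it into a list so that the last element is the user ID. That part works fine.
I can already get the list of users and the total count, but I need to count the occurrences of each user and print them as user_name: Count, that is, as key and value.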
Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm
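As a quick illustration of the extraction described above, here is a minimal sketch of what str.partition() and str.split() return for the sample line. The variable names before_at, sep and after_at are only for illustration and do not appear in the question's script.

```python
# Sample line copied from the question.
line = ("Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): "
        "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm")

# partition('@') returns a 3-tuple: (text before '@', '@', text after '@').
before_at, sep, after_at = line.partition('@')

# Splitting the first part on whitespace leaves the user ID as the last item.
user_id = before_at.split()[-1]

print(after_at)  # noi-rajeshm.fox.com as rajeshm
print(user_id)   # rajeshm
```

If a line contains no '@' at all, partition() returns the whole line as the first element, so the "allowed access" check is what keeps unrelated lines out.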
#!/usr/bin/python3
f = open("/var/log/messages")
count = 0
for line in f:
    if "allowed access" in line:
        count += 1
        user_id = line.partition('@')[0]
        user_id = user_id.split()[-1]
        print(user_id)
f.close()
print("--------------------")
print("Total Count :", count)
The current output looks like this:
bash-4.1$ ./log.py | tail
navit
akaul
akaul
pankaja
vishalm
vishalm
rajeshm
rajeshm
--------------------
Total Count : 790
While googling, I came across the idea of using a dictionary for this purpose, and it works as expected:
#!/usr/bin/python3
from collections import Counter
f = open("/var/log/messages")
count = 0
dictionary = {}
for line in f:
    if "allowed access" in line:
        user_id = line.partition('@')[0]
        user_count = user_id.split()[-1]
        if user_count in dictionary:
            dictionary[user_count] += 1
        else:
            dictionary[user_count] = 1
for user_count, occurrences in dictionary.items():
    print(user_count, ':', occurrences)
and my output is as required:
bash-4.1$ ./log2.py
rajeshm : 5
navit : 780
akaul : 2
pankaja : 1
vishalm : 2
I just want to know whether there is a better way to do this exercise.
For counting, it is easier to use the collections.Counter() class. Here I wrap the line parsing in a generator:
def users_accessed(fileobj):
    for line in fileobj:
        if 'allowed access' in line:
            yield line.partition('@')[0].rsplit(None, 1)[-1]
and pass it to a Counter() object:
from collections import Counter
with open("/var/log/messages") as f:
    access_counts = Counter(users_accessed(f))
for userid, count in access_counts.most_common():
    print(userid, count, sep=':')
This uses the Counter.most_common() method to produce sorted output (most common to least common).
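To try the generator and Counter approach without touching /var/log/messages, here is a small self-contained sketch that feeds it an in-memory file. The sample data is made up: only the rajeshm line shape comes from the question, and the other entries and the unrelated kernel line are hypothetical. It also shows sorting by name with sorted(), in case alphabetical order is what is wanted.

```python
import io
from collections import Counter

def users_accessed(fileobj):
    """Yield the user ID from every 'allowed access' line."""
    for line in fileobj:
        if 'allowed access' in line:
            yield line.partition('@')[0].rsplit(None, 1)[-1]

# Hypothetical sample data standing in for /var/log/messages.
template = ("Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): "
            "allowed access to {0}@noi-{0}.fox.com as {0}\n")
users = ["navit", "akaul", "navit", "rajeshm", "akaul", "navit"]
sample_log = io.StringIO(
    "".join(template.format(u) for u in users)
    + "Nov 28 09:00:09 foxopt210 kernel: some unrelated message\n"
)

access_counts = Counter(users_accessed(sample_log))

by_count = access_counts.most_common()   # highest count first
by_name = sorted(access_counts.items())  # alphabetical by user ID

print(by_count)  # [('navit', 3), ('akaul', 2), ('rajeshm', 1)]
print(by_name)   # [('akaul', 2), ('navit', 3), ('rajeshm', 1)]
```

Because the generator yields one user ID at a time, the file is never loaded into memory as a whole, which matters for a large log.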
You can also try a regular expression, like this:
import re
pattern = r'(?<=as\s)\w.+'
occurrence = {}
with open("/var/log/messages") as f:
    for line in f:
        match = re.search(pattern, line)
        if match is None:
            # re.search() returns None when the line does not match
            continue
        user = match.group()
        occurrence[user] = occurrence.get(user, 0) + 1
print(occurrence)
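One caveat with the lookbehind pattern: it matches after any "as " in a line, including the end of words such as "was", so unrelated log lines can be counted as users. A possible stricter variant is sketched below. The pattern in ACCESS_RE and the helper count_users are assumptions for illustration, not part of the answer above, and \w+ assumes user IDs contain only letters, digits and underscores.

```python
import re
from collections import Counter

# Assumed pattern: require the full "allowed access to user@host as user" shape
# and capture only the user ID after "as".
ACCESS_RE = re.compile(r'allowed access to \S+@\S+ as (\w+)')

def count_users(lines):
    """Count user IDs on lines that match ACCESS_RE; skip everything else."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample_lines = [
    # Line copied from the question.
    "Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): "
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm\n",
    # Hypothetical unrelated line containing the word "was".
    "Nov 28 09:00:09 foxopt210 kernel: device was reset\n",
]

# The lookbehind pattern also matches the unrelated line.
loose = re.search(r'(?<=as\s)\w.+', sample_lines[1])
print(loose.group())  # reset

counts = count_users(sample_lines)
print(counts)  # Counter({'rajeshm': 1})
```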
Just for fun, the same logic as a one-liner:
import re
pattern=r'(?<=as\s)\w.+'
new={}
[new.__setitem__(re.search(pattern, line).group(), 1) if re.search(pattern, line).group() not in new else new.__setitem__(re.search(pattern, line).group(), new.get(re.search(pattern, line).group()) + 1) for line in open('legend.txt','r')]
print(new)