Python: find the last occurrence in a file
I have a file containing different IPs.
192.168.11.2
192.1268.11.3
192.168.11.3
192.168.11.3
192.168.11.2
192.168.11.5
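This is my code so far. It prints each IP and how many times it occurs, but how can I find the line number of the last occurrence of each IP? Is there a simple way to do this?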
liste = []
dit = {}
file = open('ip.txt','r')
file = file.readlines()
for line in file:
    liste.append(line.strip())
for element in liste:
    if element in dit:
        dit[element] +=1
    else:
        dit[element] = 1
for key,value in dit.items():
    print("%s occurs %s times, last occurrence at line" %(key,value))
Output:
192.1268.11.3 occurs 1 times, last occurrence at line
192.168.11.3 occurs 2 times, last occurrence at line
192.168.11.2 occurs 2 times, last occurrence at line
192.168.11.5 occurs 1 times, last occurrence at line
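You can use another dictionary. In this dictionary you store, for each line (here, each IP), the line number of its last occurrence, overwriting the value every time you find another occurrence. At the end, the dictionary holds the line number of the last occurrence of every line.
Obviously, you need to increment a counter for each line you read, so that you know which line you are currently reading.
Try this: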
liste = []
dit = {}
file = open('ip.txt','r')
file = file.readlines()
for line in file:
    liste.append(line.strip())
# dit maps each IP to [count, line number of last occurrence]
for i, element in enumerate(liste, 1):
    if element in dit:
        dit[element][0] += 1
        dit[element][1] = i
    else:
        dit[element] = [1,i]
for key,value in dit.items():
    print("%s occurs %d times, last occurrence at line %d" % (key, value[0], value[1]))
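# Alternative: keep the counts in dit and the last line numbers in a second dictionary.
# Assumes liste has already been read from ip.txt as in the code above.
dit = {}
last_line_occurrence = {}
for element, line_number in zip(liste, range(1, len(liste)+1)):
    if element in dit:
        dit[element] +=1
    else:
        dit[element] = 1
    # Overwritten on every occurrence, so the last one wins
    last_line_occurrence[element] = line_number
for key,value in dit.items():
    print("%s occurs %s times, last occurrence at line %s" %(key,value, last_line_occurrence[key]))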
Here is a solution:
from collections import Counter

with open('ip.txt') as input_file:
    lines = input_file.read().splitlines()

# Find last occurrence, count
last_line = dict((ip, line_number) for line_number, ip in enumerate(lines, 1))
ip_count = Counter(lines)

# Print the stat, sorted by last occurrence
for ip in sorted(last_line, key=lambda k: last_line[k]):
    print('{} occurs {} times, last occurrence at line {}'.format(
        ip, ip_count[ip], last_line[ip]))
Discussion
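- I use the enumerate function to generate the line numbers (starting at line 1)
- With the sequence of (ip, line_number) pairs, it is easy to build the dictionary last_line, whose keys are the IP addresses and whose values are the last line on which each one appears (a later pair overwrites an earlier one with the same key)
- To count the occurrences, I use the Counter class, which is very simple
- If you want the report sorted by IP address, use sorted(last_line)
- This solution has a performance cost: it scans the list of IPs twice, once to compute last_line and once to compute ip_count. This means the solution may not be ideal if the file is large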
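One caveat about the sorted(last_line) tip above: it sorts the addresses as plain strings, so the order is lexicographic rather than numeric. If numeric order matters, a key function that splits each address into integers is one option. The following is only a sketch; ip_sort_key is a hypothetical helper name, and the sample last_line dictionary is written out by hand from the question's data.

```python
def ip_sort_key(ip):
    """Hypothetical helper: turn '192.168.11.2' into (192, 168, 11, 2)."""
    return tuple(int(part) for part in ip.split("."))


# Hand-written sample based on the question's file. One entry is not a
# valid IPv4 address, but it is still made of dot-separated integers.
last_line = {
    "192.168.11.2": 5,
    "192.1268.11.3": 2,
    "192.168.11.3": 4,
    "192.168.11.5": 6,
}

by_text = sorted(last_line)                     # string comparison
by_number = sorted(last_line, key=ip_sort_key)  # integer comparison

print(by_text)
print(by_number)
```

With this data the two orders differ: as a string, 192.1268.11.3 sorts first, while as a tuple of integers it sorts last.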
You can easily do this without reading the whole file into memory:
from collections import defaultdict

d = defaultdict(lambda: {"ind":0,"count":0})
with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        d[ip]["ind"] = ind
        d[ip]["count"] += 1
for ip, v in d.items():
    print("IP {} appears {} time(s) and the last occurrence is at line {}".format(ip,v["count"],v["ind"]))
Output:
IP 192.1268.11.3 appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3 appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.2 appears 2 time(s) and the last occurrence is at line 5
IP 192.168.11.5 appears 1 time(s) and the last occurrence is at line 6
If you want the IPs in the order in which they were first encountered, use an OrderedDict:
from collections import OrderedDict

od = OrderedDict()
with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        od.setdefault(ip, {"ind": 0,"count":0})
        od[ip]["ind"] = ind
        od[ip]["count"] += 1
for ip, v in od.items():
    print("IP {} appears {} time(s) and the last occurrence is at line {}".format(ip,v["count"],v["ind"]))
Output:
IP 192.168.11.2 appears 2 time(s) and the last occurrence is at line 5
IP 192.1268.11.3 appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3 appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.5 appears 1 time(s) and the last occurrence is at line 6
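On Python 3.7 and later, a plain dict also preserves insertion order, so the same first-seen ordering should be achievable without OrderedDict. The sketch below assumes Python 3.7+. The function name last_occurrences and the use of io.StringIO in place of a real file are my own choices for illustration, not part of the answers above.

```python
import io


def last_occurrences(lines):
    """Return {ip: [count, last_line_number]} in first-seen order.

    `lines` can be any iterable of strings, e.g. an open file.
    """
    stats = {}
    for line_number, line in enumerate(lines, 1):
        ip = line.strip()
        entry = stats.setdefault(ip, [0, 0])
        entry[0] += 1            # one more occurrence
        entry[1] = line_number   # overwritten each time, so the last one wins
    return stats


# In-memory stand-in for the question's file (an assumption for the demo).
sample = io.StringIO(
    "192.168.11.2\n"
    "192.1268.11.3\n"
    "192.168.11.3\n"
    "192.168.11.3\n"
    "192.168.11.2\n"
    "192.168.11.5\n"
)

stats = last_occurrences(sample)
for ip, (count, last) in stats.items():
    print("IP {} appears {} time(s) and the last occurrence is at line {}".format(ip, count, last))
```

Because the function only iterates over its argument, passing an open file object reads it line by line in a single pass, just like the defaultdict version.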