Given a list of urls, print out the top 3 frequent filenames
Given a list of URLs, print the 3 most frequent filenames.
url = [
"http://www.google.com/a.txt",
"http://www.google.com.tw/a.txt",
"http://www.google.com/download/c.jpg",
"http://www.google.co.jp/a.txt",
"http://www.google.com/b.txt",
"http://facebook.com/movie/b.txt",
"http://yahoo.com/123/000/c.jpg",
"http://gliacloud.com/haha.png",
]
The program should print:
a.txt 3
b.txt 2
c.jpg 2
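How about using re together with collections? Counter and its most_common method extract your top n matches.
import re
from collections import Counter

# Match a trailing "name.ext" at the end of each URL
pattern = re.compile(r"\w+\.\w+$")
print(Counter(pattern.findall(u)[0] for u in url).most_common(3))
Output: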
[('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
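One caveat with the regex approach: the pattern only accepts word characters around the dot, and indexing with [0] raises IndexError when a URL has no match at all. The following is a minimal sketch of a more forgiving variant, not the answerer's code. The sample URLs, the pattern, and the helper name `filenames` are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data: the second URL has no filename,
# and the third contains a hyphen that \w would not match.
urls = [
    "http://www.google.com/a.txt",
    "http://www.google.com/download/",
    "http://example.com/my-file.txt",
    "http://example.com/a.txt",
]

# Treat the text after the last "/" as a filename only if it contains a dot.
pattern = re.compile(r"/([^/]+\.[^/]+)$")

def filenames(url_list):
    """Yield the trailing filename of each URL, skipping URLs without one."""
    for u in url_list:
        match = pattern.search(u)
        if match:
            yield match.group(1)

top = Counter(filenames(urls)).most_common(3)
print(top)
```

This is still a heuristic: a bare host such as http://example.com ends in "example.com" and would be counted as a filename.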
You can use Counter from collections:
from collections import Counter

# Keep only the text after the last "/" of each URL
res = [a.rsplit('/', 1)[-1] for a in url]
print(Counter(res))
Output:
Counter({'a.txt': 3, 'c.jpg': 2, 'b.txt': 2, 'haha.png': 1})
Link:
https://docs.python.org/3.1/library/collections.html
Update:
The OP asked for the top 3:
import collections

kk = [a.rsplit('/', 1)[-1] for a in url]
print(collections.Counter(kk).most_common(3))
# [('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
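Note that most_common(3) lists c.jpg before b.txt, while the question expects b.txt first. Counter keeps elements with equal counts in the order they were first encountered, and c.jpg appears earlier in the list. If the expected order matters, one option is to break ties alphabetically. This is a sketch of that idea; the variable names `counts` and `top3` are only illustrative.

```python
from collections import Counter

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]

counts = Counter(u.rsplit('/', 1)[-1] for u in url)

# Sort by descending count, then alphabetically by filename to break ties.
top3 = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:3]

for name, count in top3:
    print(name, count)
```

With the question's list this prints a.txt 3, b.txt 2, c.jpg 2, in that order.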
How about collections.Counter, with counter.most_common(3) for the top 3?
import collections
url = [
"http://www.google.com/a.txt",
"http://www.google.com.tw/a.txt",
"http://www.google.com/download/c.jpg",
"http://www.google.co.jp/a.txt",
"http://www.google.com/b.txt",
"http://facebook.com/movie/b.txt",
"http://yahoo.com/123/000/c.jpg",
"http://gliacloud.com/haha.png",
]
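# The filename is the last "/"-separated piece of each URL
filenames = [i.split('/')[-1] for i in url]
counter = collections.Counter(filenames)
# Print each of the three most common filenames with its count
for name, count in counter.most_common(3):
    print('{} {}'.format(name, count))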
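Splitting on '/' works for the URLs in the question, but it would keep any query string or fragment as part of the filename. If real-world URLs are expected, a possible variant parses the URL first with the standard library. This is only a sketch: the sample URLs and the helper name `filename` are assumptions, not part of the question.

```python
from collections import Counter
from posixpath import basename
from urllib.parse import urlsplit

# Hypothetical URLs carrying a query string and a fragment.
urls = [
    "http://www.google.com/a.txt?download=1",
    "http://www.google.com.tw/a.txt#top",
    "http://yahoo.com/123/000/c.jpg",
]

def filename(u):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return basename(urlsplit(u).path)

counter = Counter(filename(u) for u in urls)
for name, count in counter.most_common(3):
    print('{} {}'.format(name, count))
```

Here both a.txt URLs count as the same file, which plain split('/') would treat as 'a.txt?download=1' and 'a.txt#top'.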