Dictionary within a list using Counter
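I want to write a function that returns a list of Counters holding only the dictionary items whose key appears in at least df of the dictionaries in the list.
Example:
    prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
    [Counter({'a': 1}), Counter({'a': 1})]
    prune([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
    [Counter({'a': 1}), Counter({'a': 2})]
As we can see, 'a' appears in two of the dictionaries, so it is listed in the output.
My approach: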
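    from collections import Counter

    def prune(dicto, df=2):
        # Count in how many dictionaries each key appears
        new = Counter()
        for d in dicto:
            new += Counter(d.keys())
        # Keep only the keys that reach the df threshold
        x = {}
        for key, value in new.items():
            if value >= df:
                x[key] = value
        print(Counter(x))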
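Output:
    Counter({'a': 2})
This is the combined Counter. As we can see, the key 'a' occurs 2 times overall, so it satisfies the df condition and is listed in the output. Can anyone correct my approach so that I get the desired output?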
This prints all the values of every key that appears in at least df dictionaries.
    def prune(dicts, df):
        counts = {}
        for d in dicts:  # for each dictionary
            for k, v in d.items():  # for each key, value pair in the dictionary
                if k not in counts:  # if we haven't seen this key before
                    counts[k] = []
                counts[k].append(v)  # append this value to this key
        for k, vals in counts.items():
            # take only the keys that have at least `df` values
            # (that is, keys that appear in at least `df` dictionaries)
            if len(vals) < df:
                continue
            for val in vals:
                print(k, ":", val)
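If you would rather get the grouped values back than print them, the same idea can be sketched with collections.defaultdict. This is only an illustration under my own assumptions: the name values_by_frequent_key is made up here and is not part of the answer above.

```python
from collections import defaultdict

def values_by_frequent_key(dicts, df):
    """Hypothetical helper: map each key found in at least `df` dicts to its values."""
    grouped = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            grouped[k].append(v)  # one value per dictionary that contains k
    # len(vals) equals the number of dictionaries in which k occurs
    return {k: vals for k, vals in grouped.items() if len(vals) >= df}

result = values_by_frequent_key([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], 2)
print(result)  # prints {'a': [1, 2]}
```

Returning a plain dict keeps the printing decision with the caller, which tends to be easier to test.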
I suggest:
    from collections import Counter

    def prune(dicto, min_df=2):
        # Create all counters (one count per key per dictionary)
        counters = [Counter(d.keys()) for d in dicto]
        # Sum all counters
        total = sum(counters, Counter())
        # Create set with keys of high frequency
        keys = set(k for k, v in total.items() if v >= min_df)
        # Reconstruct counters using high frequency keys
        counters = (Counter({k: v for k, v in d.items() if k in keys}) for d in dicto)
        # With filter(None, ...) we take only the non empty counters;
        # list(...) makes the result a list on Python 3 as well.
        return list(filter(None, counters))
Result:
    >>> prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
    [Counter({'a': 1}), Counter({'a': 1})]
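As a check against the second example in the question, where the two 'a' values differ, here is a self-contained Python 3 sketch of the same set-based idea. The name prune_keep_values is made up for this illustration, and it counts document frequency in a single pass instead of summing one Counter per dictionary, so treat it as a variant rather than the code above.

```python
from collections import Counter

def prune_keep_values(dicts, min_df=2):
    # Document frequency: in how many dictionaries does each key occur?
    doc_freq = Counter(k for d in dicts for k in d)
    frequent = {k for k, n in doc_freq.items() if n >= min_df}
    # Rebuild each dictionary as a Counter, keeping the original values
    pruned = (Counter({k: v for k, v in d.items() if k in frequent}) for d in dicts)
    return [c for c in pruned if c]  # an empty Counter is falsy, so it is dropped

out = prune_keep_values([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
print(out)  # prints [Counter({'a': 1}), Counter({'a': 2})]
```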
Chain the keys together to count them, then keep, in each dictionary, only the keys that meet the condition.
    from collections import Counter
    from itertools import chain

    def prune(l, min_df=0):
        # count how many times every key appears
        count = Counter(chain.from_iterable(l))
        # create Counter dicts using keys that appear at least min_df times,
        # keeping each key's original value
        return list(filter(None, (Counter({k: v for k, v in d.items() if count[k] >= min_df}) for d in l)))

    In [14]: prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
    Out[14]: [Counter({'a': 1}), Counter({'a': 1})]
You can avoid the filter, but I am not sure it will be any more efficient:
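    def prune(l, min_df=0):
        # count how many times every key appears
        count = Counter(chain.from_iterable(l))
        res = []
        for d in l:
            # keep the original values of the keys that are frequent enough
            cn = Counter({k: v for k, v in d.items() if count[k] >= min_df})
            if cn:  # skip dictionaries that end up empty
                res.append(cn)
        return res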
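One detail worth keeping in mind with these variants: a Counter built from an iterable of keys counts each key once, while a Counter built from a mapping takes the mapping's values as the counts. A small sketch of the difference, using a made-up dictionary:

```python
from collections import Counter

d = {'a': 2, 'b': 10}

from_keys = Counter(k for k in d)  # iterating a dict yields its keys, each seen once
from_mapping = Counter(d)          # a mapping supplies the counts directly

print(from_keys == Counter({'a': 1, 'b': 1}))      # True
print(from_mapping == Counter({'a': 2, 'b': 10}))  # True
```

So if the pruned output has to carry the original values, as in the question's second example, the Counter needs to be built from key/value pairs rather than from the keys alone.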
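Timings from an IPython session on 1000 randomly chosen dictionaries (choice comes from the random module). Judging by the names, chain_prune_loop is the loop version, prune the filter version and set_prune the set-based answer; the loop version comes out fastest in this run:
    In [31]: d = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]
    In [32]: d = [choice(d) for _ in range(1000)]
    In [33]: timeit chain_prune_loop(d, min_df=2)
    100 loops, best of 3: 5.49 ms per loop
    In [34]: timeit prune(d, min_df=2)
    100 loops, best of 3: 11.5 ms per loop
    In [35]: timeit set_prune(d, min_df=2)
    100 loops, best of 3: 13.5 ms per loop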
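To repeat a comparison like this outside IPython, something along these lines should work with the standard timeit module. It is a rough sketch, not the session above: the function names prune_filter and prune_loop, the fixed seed and the repeat count are my own choices, both functions keep the original values, and the numbers will differ by machine and Python version.

```python
import random
import timeit
from collections import Counter
from itertools import chain

def prune_filter(dicts, min_df=0):
    # filter-based variant (hypothetical name)
    count = Counter(chain.from_iterable(dicts))
    pruned = (Counter({k: v for k, v in d.items() if count[k] >= min_df}) for d in dicts)
    return list(filter(None, pruned))

def prune_loop(dicts, min_df=0):
    # explicit-loop variant (hypothetical name)
    count = Counter(chain.from_iterable(dicts))
    res = []
    for d in dicts:
        cn = Counter({k: v for k, v in d.items() if count[k] >= min_df})
        if cn:
            res.append(cn)
    return res

random.seed(0)  # fixed seed so both variants see the same sample on every run
base = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]
sample = [random.choice(base) for _ in range(1000)]

# Both variants should agree before their speed is compared
assert prune_filter(sample, min_df=2) == prune_loop(sample, min_df=2)

for fn in (prune_filter, prune_loop):
    seconds = timeit.timeit(lambda: fn(sample, min_df=2), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```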