Lottery analysis for learning
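I am trying to learn how to use the pandas library.
As a data source, I use the lottery combinations drawn so far.
One of the many tasks I am trying to solve is counting how often pairs of numbers occur across the combinations.
I create a DataFrame from a list like this: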
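import pandas as pd

# Each inner list is one draw of seven numbers.
# Named `draws` rather than `list` to avoid shadowing the built-in.
draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    ...,
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]
df = pd.DataFrame(draws)
print(df.head())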
Output:
    0   1   2   3   4   5   6
0   9  11  12  18  20  26  35
1  10  13  15  20  21  25  35
2   1   8  17  21  22  27  34
3  10  13  17  18  21  29  37
4   5   8  12  17  19  21  37
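As a result, I would like to get the total number of times each tuple of two or three numbers appears across the combinations, for example:
Pair : Found n times in all combinations
9,23 : 33
11,32 : 26
Can you give me some guidance or an example of how to solve this task?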
Here is a simple solution that uses only modules from the standard library:
from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

duos = Counter()
trios = Counter()
for draw in draws:
    # Count every pair and every triple of numbers in this draw
    duos.update(combinations(draw, 2))
    trios.update(combinations(draw, 3))

print('Top 5 duos')
for x in duos.most_common(5):
    print(f'{x[0]}: {x[1]}')
print()
print('Top 5 trios')
for x in trios.most_common(5):
    print(f'{x[0]}: {x[1]}')
The snippet above produces the following output:
Top 5 duos
(31, 39): 2
(7, 33): 2
(13, 14): 1
(13, 28): 1
(13, 30): 1
Top 5 trios
(13, 14, 28): 1
(13, 14, 30): 1
(13, 14, 31): 1
(13, 14, 37): 1
(13, 14, 39): 1
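One detail worth noting: `combinations` emits tuples in the order the numbers appear in each draw, so `(31, 39)` and `(39, 31)` would be counted as different pairs if a draw were not listed in ascending order. The sample draws are already sorted, so the code above works as is. A minimal sketch, assuming the input might be unsorted (the helper name `count_combos` is hypothetical, not part of the answer above):

```python
from collections import Counter
from itertools import combinations


def count_combos(draws, r):
    """Count r-number combinations across draws, ignoring order within a draw."""
    counter = Counter()
    for draw in draws:
        # Sorting first makes (39, 31) and (31, 39) land on the same key.
        counter.update(combinations(sorted(draw), r))
    return counter


# Two made-up, deliberately unsorted draws for illustration.
unsorted_draws = [
    [39, 13, 31, 14],
    [3, 31, 6, 39],
]
pair_counts = count_combos(unsorted_draws, 2)
print(pair_counts[(31, 39)])
```

Sorting costs little for draws of six or seven numbers and keeps the keys canonical, which also makes later lookups of a specific pair predictable.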
Here is a slightly more elegant version:
from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

# One counter each for combinations of 2, 3 and 4 numbers
counters = [Counter() for _ in range(3)]
for n, counter in enumerate(counters, 2):
    for draw in draws:
        counter.update(combinations(draw, n))
    print(f'Top 10 combos of {n} numbers')
    for combo, count in counter.most_common(10):
        # Right-align each number to width 2 so the columns line up
        print(' '.join(f'{num:2d}' for num in combo), count, sep=': ')
    print()
This gives us the following output:
Top 10 combos of 2 numbers
31 39: 2
 7 33: 2
13 14: 1
13 28: 1
13 30: 1
13 31: 1
13 37: 1
13 39: 1
14 28: 1
14 30: 1
Top 10 combos of 3 numbers
13 14 28: 1
13 14 30: 1
13 14 31: 1
13 14 37: 1
13 14 39: 1
13 28 30: 1
13 28 31: 1
13 28 37: 1
13 28 39: 1
13 30 31: 1
Top 10 combos of 4 numbers
13 14 28 30: 1
13 14 28 31: 1
13 14 28 37: 1
13 14 28 39: 1
13 14 30 31: 1
13 14 30 37: 1
13 14 30 39: 1
13 14 31 37: 1
13 14 31 39: 1
13 14 37 39: 1
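With only four sample draws, almost every combination occurs exactly once, so `most_common` mostly returns ties in insertion order. A small sketch of one way to keep only the combinations that repeat, printed roughly in the format requested in the question (the function name `repeated_combos` and the `min_count` threshold are assumptions for illustration):

```python
from collections import Counter
from itertools import combinations

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39],
]


def repeated_combos(draws, r, min_count=2):
    """Return (combo, count) pairs seen at least min_count times, most frequent first."""
    counter = Counter()
    for draw in draws:
        counter.update(combinations(draw, r))
    return [(combo, n) for combo, n in counter.most_common() if n >= min_count]


for combo, n in repeated_combos(draws, 2):
    # Prints lines such as "31,39 : 2"
    print(f"{','.join(map(str, combo))} : {n}")
```

For the four draws shown, only the pairs 31,39 and 7,33 pass the filter, and no triple occurs more than once.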
IIUC, you can build all combinations for each row (for example, combinations of two values) and then simply count them:
from itertools import combinations

(df.apply(lambda x: tuple(combinations(x, r=2)), axis=1)
   .explode()
   .value_counts()
   .sort_values(ascending=False))
The result is a pandas Series like this:
(31, 39) 2
(7, 33) 2
(13, 28) 1
(37, 39) 1
(13, 30) 1
..
Change the r=2 parameter to count combinations of 3 values, and so on.
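Keep in mind that `explode` produces one row per combination, so the intermediate Series grows quickly as `r` increases. A rough back-of-the-envelope sketch using only the standard library, assuming every draw has the same length (the function name `exploded_rows` is hypothetical):

```python
from math import comb


def exploded_rows(n_draws, numbers_per_draw, r):
    """Number of rows after exploding all r-number combinations of every draw."""
    return n_draws * comb(numbers_per_draw, r)


# Seven numbers per draw, as in the question's data; 1000 draws is an assumed size.
for r in (2, 3, 4):
    print(r, exploded_rows(1000, 7, r))
```

Each draw of seven numbers contributes 21 pairs or 35 triples, which is manageable for a typical lottery history.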
Here is a one-liner:
import numpy as np
from itertools import chain, combinations
from collections import Counter

# 1000 random draws of 6 numbers each, values from 1 to 99
lottery = [np.random.randint(1, 100, size=6) for _ in range(1000)]

def common_combs(matrix, n_common, combs_r):
    # Chain the combinations of every row of `matrix` and count them in one pass
    return Counter(chain.from_iterable(combinations(row, combs_r) for row in matrix)).most_common(n_common)

common_combs(lottery, 5, 2)
Output (it varies between runs because the draws are random):
[((78, 21), 36),
((13, 67), 35),
((22, 86), 34),
((29, 61), 34),
((19, 99), 34)]
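`np.random.randint` samples with replacement, so a simulated draw can contain the same number twice, which a real lottery draw normally would not. A hedged sketch of generating test data without repeats using only the standard library (the function `simulate_draws` and its parameters are assumptions for illustration, not part of the answer above):

```python
import random


def simulate_draws(n_draws, pool_size=99, picks=6, seed=None):
    """Return n_draws sorted draws of distinct numbers from 1..pool_size."""
    rng = random.Random(seed)
    numbers = range(1, pool_size + 1)
    # random.sample picks without replacement, so numbers never repeat in a draw.
    return [sorted(rng.sample(numbers, picks)) for _ in range(n_draws)]


lottery = simulate_draws(1000, seed=0)
print(lottery[0])
```

Passing a fixed `seed` makes the simulated data reproducible, which helps when comparing the different counting approaches in this thread.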