Lottery analysis for learning

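I am trying to learn how to use the pandas library.

As a data source, I am using the lottery combinations drawn so far.

One of the many tasks I want to solve is counting how often pairs of numbers occur in the combinations.

I create a data frame from a list, like this: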

import pandas as pd

# Each inner list is one draw; the ... stands for the rows omitted here
draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    ...,
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

df = pd.DataFrame(draws)
print(df.head())

Output:

    0   1   2   3   4   5   6
0   9  11  12  18  20  26  35
1  10  13  15  20  21  25  35
2   1   8  17  21  22  27  34
3  10  13  17  18  21  29  37
4   5   8  12  17  19  21  37

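For example, as a result I would like to get the total number of times each tuple of two or three numbers appears across the combinations: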

Pair  : Found n times in all combinations
9,23  : 33
11,32 : 26

Can you give me some guidance or an example of how to solve this task?

Here is a simple solution that uses only modules from the standard library:

from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

duos = Counter()
trios = Counter()

for draw in draws:
    duos.update(combinations(draw, 2))
    trios.update(combinations(draw, 3))

print('Top 5 duos')
for x in duos.most_common(5):
    print(f'{x[0]}: {x[1]}')

print()

print('Top 5 trios')
for x in trios.most_common(5):
    print(f'{x[0]}: {x[1]}')

The snippet above produces the following output:

Top 5 duos
(31, 39): 2
(7, 33): 2
(13, 14): 1
(13, 28): 1
(13, 30): 1

Top 5 trios
(13, 14, 28): 1
(13, 14, 30): 1
(13, 14, 31): 1
(13, 14, 37): 1
(13, 14, 39): 1
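
Since the question asks for the count of particular pairs, here is a minimal sketch of how one might look a pair up in the counter. It assumes every draw is sorted in ascending order, as in the sample data, so each key is an ascending tuple. The helper name `count_of` is hypothetical and not part of the answer above.

```python
from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

duos = Counter()
for draw in draws:
    duos.update(combinations(draw, 2))

def count_of(counter, *numbers):
    # Hypothetical helper: sort the numbers so that (39, 31) and (31, 39)
    # resolve to the same key. Assumes the draws themselves were sorted.
    return counter[tuple(sorted(numbers))]

print(f'31,39 : {count_of(duos, 39, 31)}')  # 31,39 : 2
print(f'9,23  : {count_of(duos, 9, 23)}')   # 9,23  : 0
```

A `Counter` returns 0 for a key it has never seen, so asking about a pair that was never drawn does not raise a `KeyError`.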

Here is a slightly more elegant version:

from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

counters = [Counter() for _ in range(3)]

for n, counter in enumerate(counters, 2):
    for draw in draws:
        counter.update(combinations(draw, n))

    print(f'Top 10 combos of {n} numbers')

    for combo, count in counter.most_common(10):
        print(' '.join(f'{num:2d}' for num in combo), count, sep=': ')

    print()

This gives us the following output:

Top 10 combos of 2 numbers
31 39: 2
 7 33: 2
13 14: 1
13 28: 1
13 30: 1
13 31: 1
13 37: 1
13 39: 1
14 28: 1
14 30: 1

Top 10 combos of 3 numbers
13 14 28: 1
13 14 30: 1
13 14 31: 1
13 14 37: 1
13 14 39: 1
13 28 30: 1
13 28 31: 1
13 28 37: 1
13 28 39: 1
13 30 31: 1

Top 10 combos of 4 numbers
13 14 28 30: 1
13 14 28 31: 1
13 14 28 37: 1
13 14 28 39: 1
13 14 30 31: 1
13 14 30 37: 1
13 14 30 39: 1
13 14 31 37: 1
13 14 31 39: 1
13 14 37 39: 1
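
With only a few draws, most combinations occur exactly once, so a top-10 list is mostly noise. One possible refinement, sketched here under the same sample data, is to keep only the combinations that occur at least twice. The helper name `repeated` and its `min_count` parameter are hypothetical.

```python
from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

def repeated(counter, min_count=2):
    # Hypothetical helper: keep only combos seen at least min_count times,
    # in the same order that most_common() reports them.
    return [(combo, n) for combo, n in counter.most_common() if n >= min_count]

pairs = Counter()
trios = Counter()
for draw in draws:
    pairs.update(combinations(draw, 2))
    trios.update(combinations(draw, 3))

print(repeated(pairs))  # [((31, 39), 2), ((7, 33), 2)]
print(repeated(trios))  # []
```

In the sample data no two draws share three numbers, which is why the list of repeated trios is empty.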

IIUC, you can find all the combinations in each row (for example, combinations of two values) and then simply count them:

from itertools import combinations

(df.apply(lambda x: tuple(combinations(x, r=2)), axis=1)
   .explode()
   .value_counts()
   .sort_values(ascending=False))

The result is a pandas Series like this:

(31, 39)    2
(7, 33)     2
(13, 28)    1
(37, 39)    1
(13, 30)    1
           ..

Change the r=2 argument to combine three values, and so on.

Here is a one-liner:

from itertools import chain, combinations
from collections import Counter

import numpy as np

# 1000 random rows of 6 numbers each, used only as sample data
lottery = [np.random.randint(1, 100, size=6) for _ in range(1000)]

def common_combs(matrix, n_common, combs_r):
    # Count every combs_r-sized combination across all rows of matrix
    return Counter(chain.from_iterable(combinations(row, combs_r) for row in matrix)).most_common(n_common)

common_combs(lottery, 5, 2)

Output:
[((78, 21), 36),
 ((13, 67), 35),
 ((22, 86), 34),
 ((29, 61), 34),
 ((19, 99), 34)]
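
Note that pairs such as (78, 21) in the output above are not in ascending order, because the random rows are not sorted. `combinations` keeps the input order, so (78, 21) and (21, 78) would be counted as two different pairs. A minimal sketch of one way around this, assuming the order within a draw does not matter, is to sort each row first. The function name `common_combs_sorted` and the tiny `draws` list are made up for illustration.

```python
from itertools import chain, combinations
from collections import Counter

def common_combs_sorted(matrix, n_common, combs_r):
    # Sort each row first so every combination is emitted in ascending order
    # and the same numbers always map to the same key.
    combos = chain.from_iterable(
        combinations(sorted(row), combs_r) for row in matrix
    )
    return Counter(combos).most_common(n_common)

# Two made-up draws that share 31 and 39, listed in a different order
draws = [[39, 31, 5], [31, 39, 7]]
print(common_combs_sorted(draws, 1, 2))  # [((31, 39), 2)]
```

Without the `sorted` call, these two rows would yield (39, 31) and (31, 39) as separate keys, each with a count of 1.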