Trying to take a subset from an Excel file

Question

I'm trying to write a simple program, working through it as I learn Python.

I have an xlsx file. Its format is:

Team, Player

What I want to do is apply a filter on the Team field, then take a random sample of 10 players from each team.

I started like this:

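import xlrd

# First open the workbook. The raw string (r'...') keeps the backslashes literal;
# without it, '\U' is an invalid escape sequence in Python 3.
# Note: xlrd 2.0+ reads only .xls files; for .xlsx use an older xlrd or openpyxl.
wb = xlrd.open_workbook(r'C:\Users\ADMIN\Desktop.xlsx')

# Then select the sheet.
sheet = wb.sheet_by_name('Sheet_1')

# Then get the values of each column. The first item is the header, so skip it.
team = sheet.col_values(0)[1:]
players = sheet.col_values(1)[1:]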

But I'm a bit confused about how to proceed from here.

Can anyone offer feedback/advice?

Answer 1

You can use the filter function -

filtered_teams = filter(lambda x: x[0] > 2, zip(team, players))

Replace lambda x: x[0] > 2 with your own filter; this one keeps only the pairs whose x[0] (the team value) is greater than 2.
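As a minimal sketch of that filtering step, assuming the Team column holds names rather than numbers, the pairs can be kept only for a chosen set of teams. The sample data and the `wanted` set below are made up for illustration and are not part of the original spreadsheet.

```python
# Hypothetical sample data standing in for the two spreadsheet columns
team = ["Red", "Red", "Blue", "Green", "Blue"]
players = ["Ann", "Bob", "Cid", "Dee", "Eve"]

# Hypothetical filter: keep only rows whose team is in this set
wanted = {"Red", "Blue"}

# filter() is lazy in Python 3, so wrap it in list() to reuse the result
filtered_rows = list(filter(lambda row: row[0] in wanted, zip(team, players)))

print(filtered_rows)
```

Each element of `filtered_rows` is a (team, player) tuple, so the rows still need to be grouped by team before sampling.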

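Now, assuming that players here is itself a list (one list of players per team), you can iterate over filtered_teams

import random
# Print one line per team: 10 randomly chosen players, comma-separated
for _, players in filtered_teams:
    print(', '.join(map(str, random.sample(players, 10))))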

This uses no external libraries, though pandas would likely be more convenient and faster on large sheets.
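For comparison, here is a rough pandas sketch, offered as one possible approach rather than the answer's own method. It assumes the columns are named Team and Player as in the question; the helper name `sample_per_team` and the in-memory data are hypothetical. Shuffling every row and then taking the first n rows of each group yields a random sample without replacement per team, and teams with fewer than n players are returned whole.

```python
import pandas as pd

def sample_per_team(df, n=10, seed=None):
    """Return up to n randomly chosen rows for each value of the Team column."""
    # Shuffle all rows, then keep the first n rows seen for each team
    shuffled = df.sample(frac=1, random_state=seed)
    return shuffled.groupby("Team").head(n)

# Hypothetical in-memory data; with a real file this could instead be
# df = pd.read_excel("your_file.xlsx"), which needs openpyxl installed
df = pd.DataFrame({
    "Team": ["A"] * 12 + ["B"] * 3,
    "Player": ["player%d" % i for i in range(15)],
})

picked = sample_per_team(df, n=10, seed=0)
print(picked.sort_values("Team"))
```

Filtering on the Team column before sampling can be done with a boolean mask, for example `df[df["Team"].isin(["A"])]`.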

Answer 2

You can build a dictionary keyed by team, whose values are the lists of each team's players, and then sample from those lists:

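import random

# Group players by team: {team: [player, ...]}
teams = {}
for t, p in zip(team, players):
    if t in teams:
        teams[t].append(p)
    else:
        teams[t] = [p]

# One list of 10 random players per team.
# random.sample raises ValueError if a team has fewer than 10 players.
samples = [random.sample(teams[t], 10) for t in teams]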
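The same idea can be wrapped in a small helper that also copes with teams that have fewer than 10 players. This is only a sketch under stated assumptions: the function name `sample_by_team`, the `seed` parameter, and the sample data are all hypothetical, and `team` and `players` are taken to be parallel lists like the two columns read in the question.

```python
import random
from collections import defaultdict

def sample_by_team(team, players, n=10, seed=None):
    """Group players by team and draw up to n players from each team."""
    rng = random.Random(seed)  # local generator, so a seed gives repeatable draws
    grouped = defaultdict(list)
    for t, p in zip(team, players):
        grouped[t].append(p)
    # min() avoids the ValueError random.sample raises for small teams
    return {t: rng.sample(ps, min(n, len(ps))) for t, ps in grouped.items()}

# Hypothetical data: 12 players on Red, 3 on Blue
team = ["Red"] * 12 + ["Blue"] * 3
players = ["player%d" % i for i in range(15)]

samples = sample_by_team(team, players, n=10, seed=42)
for t, chosen in samples.items():
    print(t, len(chosen), chosen)
```

Returning a dictionary rather than a list of lists keeps each sample attached to its team name, which makes it easy to apply a team filter afterwards.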

python

excel

xls

xlrd

python-3.x