Pandas: count how many users connected to each computer

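I have a dataset of logons. I want to count how many users connected to each computer, using only pandas built-in functions. The result needs to be the same size as the original dataset, so every time a computer appears in the original table it also appears in the result table, carrying that computer's user count each time: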

So if this is the original table:

Computer User
computer1 user1
computer1 user2
computer1 user3
computer2 user1
computer2 user1
computer3 user1
computer3 user2
computer3 user2

I want the result table to look like this:

Computer User_Count
computer1 3
computer1 3
computer1 3
computer2 1
computer2 1
computer3 2
computer3 2
computer3 2

A version with plain lists works for me:

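# logons is a list of (computer, user) tuples, one per row of the table
result = []
num_of_users = {}
for computer in {logon[0] for logon in logons}:
    users = set()  # distinct users seen on this computer
    for logon in logons:
        if computer == logon[0]:
            users.add(logon[1])
    num_of_users[computer] = len(users)
# one output value per input row, so the sizes match
for logon in logons:
    result.append(num_of_users[logon[0]])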

In addition, I tried adding a condition on a third column (Fail or Success) so that only successful logons are counted:

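# logons is a list of (computer, user, type) tuples, one per row of the table
result = []
num_of_users = {}
for computer in {logon[0] for logon in logons}:
    users = set()  # distinct users with a successful logon on this computer
    for logon in logons:
        if computer == logon[0] and logon[2] == 'Success':
            users.add(logon[1])
    num_of_users[computer] = len(users)
# one output value per input row, so the sizes match
for logon in logons:
    result.append(num_of_users[logon[0]])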

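In this case the result table is still the same size as the original table, and only successful logons are counted. If every logon to a computer failed, the result table still shows that computer each time it appears in the original table.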

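One more thing: I am new to pandas, DataFrames and tables, and I would like to know how you would describe a task like this without using an example, that is, how I should title my question to make it more general.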

Use GroupBy.transform with SeriesGroupBy.nunique. To count only Success rows, first replace the User values of the non-matching rows with missing values using Series.where:

print (df)
    Computer   User     Type
0  computer1  user1     Fail
1  computer1  user2  Success
2  computer1  user3     Fail
3  computer2  user1  Success
4  computer2  user1     Fail
5  computer3  user1  Success
6  computer3  user2     Fail
7  computer3  user2  Success


df['User_Count'] = df.groupby('Computer')['User'].transform('nunique')

df['User_Count_Success'] = (df['User'].where(df['Type'].eq('Success'))
                                      .groupby(df['Computer'])
                                      .transform('nunique'))
print (df)
    Computer   User     Type  User_Count  User_Count_Success
0  computer1  user1     Fail           3                   1
1  computer1  user2  Success           3                   1
2  computer1  user3     Fail           3                   1
3  computer2  user1  Success           1                   1
4  computer2  user1     Fail           1                   1
5  computer3  user1  Success           2                   2
6  computer3  user2     Fail           2                   2
7  computer3  user2  Success           2                   2
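To make it clearer what transform does here, the following is a minimal sketch that rebuilds the sample data and compares transform('nunique') with a two-step version: aggregate per computer, then look the value up for each row with Series.map. The variable names per_computer, mapped and transformed are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above.
df = pd.DataFrame({
    "Computer": ["computer1"] * 3 + ["computer2"] * 2 + ["computer3"] * 3,
    "User": ["user1", "user2", "user3", "user1",
             "user1", "user1", "user2", "user2"],
    "Type": ["Fail", "Success", "Fail", "Success",
             "Fail", "Success", "Fail", "Success"],
})

# Step 1: aggregate to one number per computer (index is Computer).
per_computer = df.groupby("Computer")["User"].nunique()

# Step 2: look that number up for every original row.
mapped = df["Computer"].map(per_computer)

# transform does both steps at once and keeps the original row index.
transformed = df.groupby("Computer")["User"].transform("nunique")

print(per_computer)
print(mapped.tolist())
print(transformed.tolist())
```

Both routes give one value per original row; transform is simply the shorter way to broadcast a per-group aggregate back onto the rows it came from.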

Details:

print (df['User'].where(df['Type'].eq('Success')))
0      NaN
1    user2
2      NaN
3    user1
4      NaN
5    user1
6      NaN
7    user2
Name: User, dtype: object
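The question also asks about a computer whose logons all failed. The sample data has no such computer, so the sketch below adds a hypothetical computer4 (an assumption for illustration only). Because nunique skips missing values by default, every row of that computer is kept and its count should come out as 0.

```python
import pandas as pd

# computer4 is a hypothetical machine whose logons all failed.
df = pd.DataFrame({
    "Computer": ["computer1", "computer1", "computer4", "computer4"],
    "User": ["user1", "user2", "user1", "user2"],
    "Type": ["Success", "Fail", "Fail", "Fail"],
})

# Users on non-Success rows become NaN, so nunique ignores them.
masked = df["User"].where(df["Type"].eq("Success"))

# Group the masked users by the original Computer column and broadcast.
df["User_Count_Success"] = masked.groupby(df["Computer"]).transform("nunique")

print(df)
```

The result has the same number of rows as the input, which matches the requirement in the question that the table size must not change.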