Insert values into cells from the result of a pandas group by
I have a huge Excel file like this:
Table 1
My desired table looks like this:
my desired table
I use group by, count, and sum, for example:
import pandas as pd
import openpyxl as op
from openpyxl import load_workbook
from openpyxl import Workbook
import numpy as np
path1 = r"users.xlsx"
data = pd.read_excel(path1, engine='openpyxl')
df = pd.DataFrame(data)
NumberOfChild = df.groupby('Parent ID')['Parent ID'].count().to_frame('Employees Number')
NumberOfBooking = df.groupby('Parent ID')['Reservations Count'].transform('sum')
This gives me the correct number of bookings and children, but I can't get these values into the numberOfChild and numberOfBooking columns.
Suppose you have the following dataframe:
>>> df
id parent_id reservations
0 1 NaN 1
1 2 1.0 3
2 3 1.0 5
3 4 NaN 2
4 5 4.0 6
5 6 NaN 7
First, count the number of children for each parent:
>>> children = df.groupby("parent_id").id.count().rename("children")
>>> children
parent_id
1.0 2
4.0 1
Name: children, dtype: int64
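If the counts need to end up as a column on the original rows rather than in a separate Series, one option is to map them back by id. The sketch below rebuilds the example dataframe from the values shown above; the `map` step and the integer cast of the index are my own additions, not part of the approach described here.

```python
import numpy as np
import pandas as pd

# Example data copied from the dataframe shown above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "parent_id": [np.nan, 1, 1, np.nan, 4, np.nan],
    "reservations": [1, 3, 5, 2, 6, 7],
})

# groupby drops NaN keys by default, so top-level rows form no group.
children = df.groupby("parent_id")["id"].count().rename("children")

# parent_id is float because of the NaNs; cast the index so it matches id.
children.index = children.index.astype(int)

# Look up each row's id in the counts; rows with no children get NaN.
df["children"] = df["id"].map(children)
print(df)
```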
Then create a new aggregation key column that holds the row's own id if it has no parent_id, and its parent_id otherwise:
>>> df["book_key"] = df.parent_id.fillna(df.id).astype(int)
>>> df
id parent_id reservations book_key
0 1 NaN 1 1
1 2 1.0 3 1
2 3 1.0 5 1
3 4 NaN 2 4
4 5 4.0 6 4
5 6 NaN 7 6
Use this new key to compute the total number of reservations:
>>> reservations = df.groupby("book_key").reservations.sum().rename("total")
>>> reservations
book_key
1 9
4 8
6 7
Name: total, dtype: int64
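The question used `transform('sum')`, which returns one value per row instead of one per group. A minimal sketch of how that could be combined with `book_key`, assuming the total should appear only on top-level rows (the `where` mask is my assumption about the desired layout):

```python
import numpy as np
import pandas as pd

# Example data copied from the dataframe shown above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "parent_id": [np.nan, 1, 1, np.nan, 4, np.nan],
    "reservations": [1, 3, 5, 2, 6, 7],
})

# Same key as above: a row's own id if it has no parent, else its parent_id.
df["book_key"] = df["parent_id"].fillna(df["id"]).astype(int)

# transform keeps the original row order and length, repeating each group sum.
group_total = df.groupby("book_key")["reservations"].transform("sum")

# Keep the total only on top-level rows; child rows become NaN.
df["total"] = group_total.where(df["parent_id"].isna())
print(df)
```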
Finally, join the results onto the dataframe, drop the book_key column, and optionally replace NaN with "":
>>> df = df.set_index("id").join(children).join(reservations).drop(columns="book_key").fillna("")
>>> df
   parent_id  reservations children total
id
1                        1      2.0   9.0
2        1.0             3
3        1.0             5
4                        2      1.0   8.0
5        4.0             6
6                        7            7.0
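Putting the steps together, here is a hedged end-to-end sketch. The function name `add_family_totals` and the output filename in the comment are hypothetical. It uses pandas' nullable `Int64` dtype in place of `fillna("")` so the counts stay integers, which is a substitution of mine rather than the method shown above.

```python
import numpy as np
import pandas as pd


def add_family_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Return df indexed by id with 'children' and 'total' columns added."""
    # Number of rows pointing at each parent (NaN parents are skipped).
    children = df.groupby("parent_id")["id"].count().rename("children")
    children.index = children.index.astype(int)

    # Grouping key: own id for top-level rows, parent_id for children.
    book_key = df["parent_id"].fillna(df["id"]).astype(int)
    total = df["reservations"].groupby(book_key).sum().rename("total")

    # Both Series are indexed by the parent's id, so joining on id
    # fills values only on the top-level rows.
    out = df.set_index("id").join(children).join(total)
    return out.astype({"children": "Int64", "total": "Int64"})


df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "parent_id": [np.nan, 1, 1, np.nan, 4, np.nan],
    "reservations": [1, 3, 5, 2, 6, 7],
})
out = add_family_totals(df)
print(out)
# out.to_excel("users_out.xlsx")  # hypothetical filename; needs openpyxl
```

Missing values print as `<NA>` with this dtype and are written to Excel as empty cells, so the explicit replacement with "" may not be needed.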