Koalas DataFrame pivot_table with row and column totals
I previously used the following code on a pandas DataFrame to produce a pivot table with column and row totals, as in Excel:
df_check=df.pivot_table(index=['month_yr'], columns=['product'], margins=True, aggfunc=np.sum, values='weight', margins_name='Total')
I am now working with the same data as a Koalas DataFrame in Databricks. However, this code does not run there: I cannot use margins=True or margins_name='Total', because both raise errors.
I modified the code above so that it runs, but it no longer produces column/row totals:
kdf_check=kdf.pivot_table(index=['month_yr'], columns='product', aggfunc='sum',values='weight')
How can I modify this code to add row and column totals?
Output of kdf.head().to_dict():
{'city_nm': {0: 'HAMILTON',
1: 'MADAWASKA',
2: 'PALMER RAPIDS',
3: 'OXFORD MILLS',
4: 'MADAWASKA'},
'prov': {0: 'ON', 1: 'ON', 2: 'ON', 3: 'ON', 4: 'ON'},
'dest_country': {0: 'United States',
1: 'United States',
2: 'United States',
3: 'United States',
4: 'United States'},
'commodity_descrp': {0: 'BIOLOGICAL SUBSTANCE',
1: 'VME CARD CAGE/HEAD. SENSOR SENT',
2: 'DENTAL IMPRESSION TRAY',
3: 'LAPTOP',
4: 'VME CARD CAGE/HEAD. SENSOR SENT'},
'harmonized_code': {0: '3002.90.5010', 1: '', 2: '', 3: '', 4: ''},
'acct_nbr': {0: '', 1: '', 2: '', 3: '', 4: ''},
'bus_nm': {0: None, 1: None, 2: None, 3: None, 4: None},
'naic_4_cd': {0: '', 1: '', 2: '', 3: '', 4: ''},
'product': {0: 'FAST', 1: 'FAST', 2: 'SLOW', 3: 'FAST', 4: 'FAST'},
'weight': {0: 11.0, 1: 85.0, 2: 0.7, 3: 7.0, 4: 85.0},
'dest_indicator': {0: 'US', 1: 'US', 2: 'US', 3: 'US', 4: 'US'},
'mth_yr': {0: 'July2020',
1: 'May2020',
2: 'June2020',
3: 'December2020',
4: 'May2020'},
'hc_dig': {0: '30', 1: '', 2: '', 3: '', 4: ''},
'hc_dig_clean': {0: '30', 1: '', 2: '30', 3: '', 4: ''}}
Since the Koalas pivot_table() does not support margins, I would add the totals manually:
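# Append a 'Total' row holding the sum of each column
kdf_check = kdf_check.append(kdf_check.sum().rename('Total'))
# Add a 'Total' column holding the sum of each row (including the new 'Total' row)
kdf_check['Total'] = kdf_check.sum(axis=1)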
# product        FAST  SLOW  Total
# mth_yr
# December2020    7.0   NaN    7.0
# July2020       11.0   NaN   11.0
# June2020        NaN   0.7    0.7
# May2020       170.0   NaN  170.0
# Total         188.0   0.7  188.7
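For reference, the same manual-totals idea can be tried locally in plain pandas. This is only a sketch under a few assumptions: it uses the five sample rows from the question, the helper name pivot_with_totals is hypothetical, and it uses .loc to add the total row because DataFrame.append was removed in pandas 2.0. Whether the .loc assignment behaves the same way on a Koalas DataFrame has not been verified here.

```python
import pandas as pd

# Sample rows copied from the kdf.head() output in the question.
df = pd.DataFrame({
    "mth_yr": ["July2020", "May2020", "June2020", "December2020", "May2020"],
    "product": ["FAST", "FAST", "SLOW", "FAST", "FAST"],
    "weight": [11.0, 85.0, 0.7, 7.0, 85.0],
})


def pivot_with_totals(frame, index, columns, values, name="Total"):
    """Pivot without margins, then add a total row and a total column.

    Hypothetical helper for illustration; not part of pandas or Koalas.
    """
    pivot = frame.pivot_table(
        index=index, columns=columns, values=values, aggfunc="sum"
    )
    # Total row: per-column sums (NaN cells are skipped by default).
    pivot.loc[name] = pivot.sum()
    # Total column: per-row sums. The total row is summed too,
    # so its cell in this column becomes the grand total.
    pivot[name] = pivot.sum(axis=1)
    return pivot


result = pivot_with_totals(df, "mth_yr", "product", "weight")
print(result)
```

The order matters: the total row is added first, so the row-wise sum that follows also covers it and fills in the grand total in the bottom-right cell.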