Initializing new columns in pd.pivot_table using existing columns within pd.pivot_table

Question

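I imported a table into Python using pd.read_csv, as shown below.

I need to perform the following activities on this table:

1. Calculate the number of free apps and paid apps in each genre using group by.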

I wrote the following code to get the desired output:

df.groupby(['prime_genre','Subscription'])[['id']].count()

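Output:

2. Convert the result of 1 into a dataframe with the columns prime_genre, free, paid, and the rows holding the corresponding counts.

I wrote the following code to get the desired output:

df1 = df.groupby(['prime_genre','Subscription'])['id'].count().reset_index()
df1.pivot_table(index='prime_genre', columns='Subscription', values='id', aggfunc='sum')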

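Output:

3. Now I need to initialize a column 'Total' that captures the sum of 'free app' and 'paid app' within the pivot table itself.
4. I also need to initialize two more columns, perc_free and perc_paid, that show the percentage of free apps and paid apps within the pivot table itself.

How do I go about 3 and 4?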

Answer 1

Assuming the following pivot table is named df2:

subscription  free app  paid app
prime_genre                     
book                66        46
business            20        37
catalogs             9         1
education          132       321
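
To follow along without the original CSV, df2 can be rebuilt by hand. This is only a sketch: the counts are copied from the table above, and the axis names (prime_genre, subscription) mirror that printout rather than the asker's actual data.

```python
import pandas as pd

# Counts copied from the pivot table shown above; the real df2 would come
# from the groupby / pivot_table steps in the question.
df2 = pd.DataFrame(
    {"free app": [66, 20, 9, 132], "paid app": [46, 37, 1, 321]},
    index=pd.Index(["book", "business", "catalogs", "education"], name="prime_genre"),
)
df2.columns.name = "subscription"  # label of the column axis, as in the printout
print(df2)
```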

You can compute the total using pandas.DataFrame.sum on the columns (axis=1). Then divide df2 by this total and multiply by 100 to get the percentages. You can add a suffix to the column names with pandas.DataFrame.add_suffix. Finally, combine everything with pandas.concat:

total = df2.sum(axis=1)
percent = df2.div(total, axis=0).mul(100).add_suffix(' percent')
df2['Total'] = total
pd.concat([df2, percent], axis=1)

Output:

subscription  free app  paid app  Total  free app percent  paid app percent
prime_genre                                                                
book                66        46    112         58.928571         41.071429
business            20        37     57         35.087719         64.912281
catalogs             9         1     10         90.000000         10.000000
education          132       321    453         29.139073         70.860927
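
One thing to watch: df2['Total'] = total changes df2 in place, so running the snippet a second time would fold Total into the row sums and skew the percentages. Here is a minimal sketch that sidesteps this with DataFrame.assign, which returns a new frame. df2 is rebuilt from the sample table, and the name result is mine, not part of the answer above.

```python
import pandas as pd

# Sample counts copied from the table above (stand-in for the real pivot table).
df2 = pd.DataFrame(
    {"free app": [66, 20, 9, 132], "paid app": [46, 37, 1, 321]},
    index=pd.Index(["book", "business", "catalogs", "education"], name="prime_genre"),
)

total = df2.sum(axis=1)  # row-wise sum: free app + paid app per genre

# axis=0 aligns `total` with the row index, so each row is divided by its own total.
percent = df2.div(total, axis=0).mul(100).add_suffix(" percent")

# assign returns a copy, so df2 keeps only its two original columns
# and the whole snippet can be re-run safely.
result = pd.concat([df2.assign(Total=total), percent], axis=1)
print(result)
```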

Here is a variant that produces the perc_free / perc_paid names:

import re

total = df2.sum(axis=1)
percent = (df2.div(total, axis=0)
              .mul(100)
              .rename(columns=lambda x: re.sub('(.*)( app)', r'perc_\1', x))
          )
df2['Total'] = total
pd.concat([df2, percent], axis=1)

subscription  free app  paid app  Total  perc_free  perc_paid
prime_genre                                                  
book                66        46    112  58.928571  41.071429
business            20        37     57  35.087719  64.912281
catalogs             9         1     10  90.000000  10.000000
education          132       321    453  29.139073  70.860927
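
If a regular expression feels heavy for two columns, the new names can also be spelled out explicitly. The sketch below starts from raw rows, so it also suggests that pivot_table can do the counting itself with aggfunc='count', without the intermediate groupby. The sample rows are invented for illustration; the column names id, prime_genre and Subscription come from the question's code, and the 'free app' / 'paid app' labels are assumed from the table above.

```python
import pandas as pd

# Hypothetical raw data standing in for the CSV from the question.
df = pd.DataFrame({
    "id": range(1, 8),
    "prime_genre": ["book", "book", "book", "business", "business",
                    "catalogs", "catalogs"],
    "Subscription": ["free app", "free app", "paid app", "free app",
                     "paid app", "free app", "free app"],
})

# Count ids per genre and subscription type in one step.
# fill_value=0 covers genres that have no apps of one type.
counts = df.pivot_table(index="prime_genre", columns="Subscription",
                        values="id", aggfunc="count", fill_value=0)

total = counts.sum(axis=1)

# Name each derived column explicitly instead of renaming with a regex.
result = counts.assign(
    Total=total,
    perc_free=counts["free app"].div(total).mul(100),
    perc_paid=counts["paid app"].div(total).mul(100),
)
print(result)
```

Spelling the columns out is more verbose than the rename approach, but it fails loudly with a KeyError if one of the expected labels is missing from the data.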
