Inserting a Ratio field into a Pandas Series

Question

I have a Pandas Series:

 countrypat = asiaselect.groupby('Country')['Pattern'].value_counts().groupby(level=0).head(3)

The output looks like this:

China      abc                1055
           def                 778
           ghi                 612
Malaysia   def                 554
           abc                 441
           ghi                 178
[...]

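How can I insert a new column (I assume I have to turn this into a DataFrame) that contains the ratio of the count column to the sum of the counts for that country? For China, for example, the first row of the new column would contain (1055/(1055+778+612)). I tried unstack() and to_df() but am not sure of the next steps.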

Answer 1

I recreated the data on my side, but left the .head(3) out of your assignment:

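# group_keys=False keeps the original (Country, Pattern) index when apply() is called on this object; newer pandas versions otherwise prepend the group key
countrypat = asiaselect.groupby('Country')['Pattern'].value_counts().groupby(level=0, group_keys=False)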

The following can then be applied to your groupby object to give you the proportions:

countrypat.apply(lambda x: x / float(x.sum()))
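As an alternative to apply with a lambda, the same proportions can be sketched with transform("sum"), which broadcasts each country's total back onto its rows and leaves the index untouched. This is useful because on recent pandas versions groupby(...).apply may prepend the group key to the index unless group_keys=False is passed. The Series below is a hypothetical stand-in for the value_counts() result, built from the counts shown in the question; the names counts and ratio are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the value_counts() result in the question;
# the counts are the ones displayed there.
counts = pd.Series(
    [1055, 778, 612, 554, 441, 178],
    index=pd.MultiIndex.from_tuples(
        [("China", "abc"), ("China", "def"), ("China", "ghi"),
         ("Malaysia", "def"), ("Malaysia", "abc"), ("Malaysia", "ghi")],
        names=["Country", "Pattern"],
    ),
    name="count",
)

# transform("sum") returns a Series with the same index as `counts`,
# holding each country's total on every one of its rows.
country_total = counts.groupby(level="Country").transform("sum")

# Element-wise division is aligned row by row on (Country, Pattern).
ratio = counts / country_total
print(ratio.round(6))
```

With these counts, the first value is 1055 / (1055 + 778 + 612), which is the ratio the question asks for.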

The only 'problem' is that this returns a Series, so I would store the intermediate results in two separate Series and combine them at the end:

series1 = asiaselect.groupby('Country')['Pattern'].value_counts()
# reuse series1; group_keys=False keeps the (Country, Pattern) index so both Series line up
series2 = series1.groupby(level=0, group_keys=False).apply(lambda x: x / float(x.sum()))
pd.DataFrame([series1, series2]).T

China    abc       1055.0  0.431493
         def        778.0  0.318200
         ghi        612.0  0.250307
Malaysia def        554.0  0.472293
         abc        441.0  0.375959
         ghi        178.0  0.151748
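In the output above the counts come out as floats (1055.0) and the columns have no descriptive names, because building the frame from a list of Series and transposing mixes integers and floats in the same columns. A minimal sketch of an alternative, assuming the same counts as in the question, is to join the two Series column-wise with pd.concat and name the columns explicitly. The column names "count" and "ratio" are illustrative choices, not something from the original post.

```python
import pandas as pd

# Hypothetical stand-in for series1 (the value_counts() result).
counts = pd.Series(
    [1055, 778, 612, 554, 441, 178],
    index=pd.MultiIndex.from_tuples(
        [("China", "abc"), ("China", "def"), ("China", "ghi"),
         ("Malaysia", "def"), ("Malaysia", "abc"), ("Malaysia", "ghi")],
        names=["Country", "Pattern"],
    ),
)

# Stand-in for series2: each count divided by its country's total.
ratio = counts / counts.groupby(level="Country").transform("sum")

# concat with axis=1 aligns on the shared (Country, Pattern) index and
# keeps each column's own dtype, so the counts stay integers.
result = pd.concat([counts.rename("count"), ratio.rename("ratio")], axis=1)
print(result)
```

If a flat table is preferred, result.reset_index() turns Country and Pattern into ordinary columns.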

To get only the top three rows per country, simply add .groupby(level=0).head(3) to each of series1 and series2:

series1_top = series1.groupby(level=0).head(3)
series2_top = series2.groupby(level=0).head(3)
pd.DataFrame([series1_top, series2_top]).T

I tested this with data containing more than three rows per country and it seems to work. Starting with the following df:

China     abc        1055
          def         778
          ghi         612
          yyy           5
          xxx           3
          zzz           3
Malaysia  def         554
          abc         441
          ghi         178
          yyy           5
          xxx           3
          zzz           3

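And ending with the following. Note that the ratios are taken against each country's full total (for China, 1055/2456), not only against the three rows shown: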

China    abc       1055.0  0.429560
         def        778.0  0.316775
         ghi        612.0  0.249186
Malaysia def        554.0  0.467905
         abc        441.0  0.372466
         ghi        178.0  0.150338
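The ratios in the table above use each country's total over all patterns. The question's own example, 1055/(1055+778+612), divides by the sum of the top three only, so the two readings differ as soon as a country has more than three patterns. The sketch below computes both on the six-row test data from this answer, so either one can be picked. The variable and column names (rows, ratio_of_all, ratio_of_top3) are assumptions for illustration.

```python
import pandas as pd

# Test data from the answer: six patterns per country, already sorted
# in descending order of count, as value_counts() would return them.
rows = {
    "China": [("abc", 1055), ("def", 778), ("ghi", 612),
              ("yyy", 5), ("xxx", 3), ("zzz", 3)],
    "Malaysia": [("def", 554), ("abc", 441), ("ghi", 178),
                 ("yyy", 5), ("xxx", 3), ("zzz", 3)],
}
index = pd.MultiIndex.from_tuples(
    [(country, pattern) for country, pairs in rows.items() for pattern, _ in pairs],
    names=["Country", "Pattern"],
)
counts = pd.Series(
    [n for pairs in rows.values() for _, n in pairs], index=index, name="count"
)

by_country = counts.groupby(level="Country")
top3 = by_country.head(3)

# Reading 1: divide by the country's full total, then keep the top three.
ratio_of_all = (counts / by_country.transform("sum")).groupby(level="Country").head(3)

# Reading 2: divide by the sum of the top three rows only.
ratio_of_top3 = top3 / top3.groupby(level="Country").transform("sum")

result = pd.concat(
    [top3, ratio_of_all.rename("ratio_of_all"), ratio_of_top3.rename("ratio_of_top3")],
    axis=1,
)
print(result)
```

Computing the ratio before calling head(3) matters for the first reading: dividing the truncated Series by the full totals directly would align on the union of both indexes and introduce NaN rows.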
