Adding a new column in python based on the value of another column pandas python

Question

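I am trying to do some simple operations on this dataset.

I am trying to:

Calculate the total of counts attributed to each cluster. For example, for cluster 0, I have to sum 7+4+61+7+12 = 91.
Add a new column 'total of counts' in which the total is paired with the corresponding cluster (i.e. rows with the value "0" in the 'clusters' column get 91 in the 'total of counts' column).
Divide the 'counts' column by 'total of counts' and multiply by 100 (to compute the percentage of counts). The result should be added as a new column.

Can someone help me write the code?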

Answer 1

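You can use this line of code. It gives you a new column named total that holds, for each row, the mean of columns 0 through 11. Note the axis=1 argument: without it, mean() averages down each column instead of across each row. You can replace the mean with any other operation you need.

 df['total'] = df.iloc[:, :12].mean(axis=1)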
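
As a minimal sketch of what a row-wise mean does, here is a small made-up frame (the column names a, b and c are assumptions for illustration, not taken from the question's dataset). With axis=1, each row gets the average of its own values:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 9]})

# axis=1 averages across the selected columns within each row.
df["total"] = df.iloc[:, :3].mean(axis=1)
print(df)
```

Swapping mean for sum, max or another reduction works the same way, as long as axis=1 is kept.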

Answer 2

To calculate the total of counts attributed to each cluster, use this code:

total = df.groupby('clusters')['count'].sum().rename('total of counts')

To add a new column 'total of counts' in which the total is paired with the corresponding cluster, use this code:

df = df.join(total, on='clusters')

To divide the 'count' column by 'total of counts' and multiply by 100, use this code:

df['counts by total of counts'] = df['count'] / df['total of counts'] * 100
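
The three steps can be tried end to end on a small sample. This is only a sketch: the cluster 0 counts are the ones quoted in the question, while the cluster 1 rows and the exact column names ('clusters', 'count') are assumptions, since the original dataset is not shown.

```python
import pandas as pd

# Cluster 0 uses the counts quoted in the question; cluster 1 is made up.
df = pd.DataFrame({
    "clusters": [0, 0, 0, 0, 0, 1, 1],
    "count": [7, 4, 61, 7, 12, 5, 15],
})

# Step 1: one total per cluster, as a Series indexed by cluster value.
total = df.groupby("clusters")["count"].sum().rename("total of counts")

# Step 2: attach each row's cluster total by matching 'clusters' to that index.
df = df.join(total, on="clusters")

# Step 3: share of each row within its cluster, in percent.
df["counts by total of counts"] = df["count"] / df["total of counts"] * 100
print(df)
```

Every cluster 0 row ends up with 91 in 'total of counts', matching the sum worked out in the question.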

Answer 3

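Assuming you have called your dataframe df, you can do the following:

Point 1: Use the groupby() method on the clusters column and the sum() aggregation method to calculate the totals, for example: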

df_grouped = df.groupby('clusters')[['count']].sum()  # sum only the count column

Once that is done, you may want to rename the column in that dataframe to something more useful, for example:

df_grouped = df_grouped.rename(columns={'count': 'cluster_count'})

Point 2: To get the totals back into your dataframe, you can merge df_grouped with the original dataframe, for example:

df_merged = pd.merge(left=df, 
                     right=df_grouped, 
                     left_on='clusters', 
                     right_index=True)

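This uses the 'clusters' column as the key for the left dataframe and the index of the df_grouped dataframe as the key on the right (the cluster values end up in the index after the groupby() operation in Point 1).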

Point 3: The last step is now simple. Just take your final dataframe and add a new column containing the result of the desired calculation:

df_merged['count_pct_cluster'] = df_merged['count'] / df_merged['cluster_count'] * 100
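
A shorter variant, which is not what this answer uses, is groupby(...).transform('sum'). It returns each group's total already aligned with the original rows, so the separate rename and merge are not needed. The sketch below assumes the columns are named 'clusters' and 'count'; cluster 0 reuses the counts from the question and cluster 1 is made up.

```python
import pandas as pd

# Cluster 0 uses the counts quoted in the question; cluster 1 is made up.
df = pd.DataFrame({
    "clusters": [0, 0, 0, 0, 0, 1, 1],
    "count": [7, 4, 61, 7, 12, 5, 15],
})

# transform('sum') broadcasts each cluster's total back onto its own rows.
df["cluster_count"] = df.groupby("clusters")["count"].transform("sum")

# Percentage of each row's count within its cluster.
df["count_pct_cluster"] = df["count"] / df["cluster_count"] * 100
print(df)
```

The merge approach is still handy when the grouped totals are needed as a table of their own.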
