Count occurrences in DataFrame
I have a DataFrame in this format:
| Department | Person | Power | ... |
|------------|--------|--------|-----|
| ABC | 1234 | 75 | ... |
| ABC | 1235 | 25 | ... |
| DEF | 1236 | 50 | ... |
| DEF | 1237 | 100 | ... |
| DEF | 1238 | 25 | ... |
| DEF | 1239 | 50 | ... |
What I want now is a summary of how many times each value occurs in the Power column. How can I get this from my DataFrame?
| Department | 100 | 75 | 50 | 25 |
|------------|-----|-----|-----|-----|
| ABC | 0 | 1 | 0 | 1 |
| DEF | 1 | 0 | 2 | 1 |
You can use `value_counts` with `sort_index`, then build a DataFrame with `to_frame`, and finally transpose it with `T`:
print(df.Power.value_counts().sort_index(ascending=False).to_frame().T)
       100  75   50   25
Power    1    1    2    2
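For anyone who wants to reproduce this, here is a minimal sketch of the same chain split into steps. The sample frame is rebuilt from the question's table; the elided `...` columns are left out, which is an assumption about what matters here.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

# Count how often each distinct Power value occurs, then order the
# values from largest to smallest.
counts = df["Power"].value_counts().sort_index(ascending=False)

# Turn the Series into a one-column DataFrame and transpose it so the
# Power values become the column headers.
one_row = counts.to_frame().T
print(one_row)
```

Note that this counts over the whole frame, not per department. The label of the single row can also differ from the output above: recent pandas releases name the counts `count` rather than `Power`.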
EDIT by comment:
You need `crosstab`:
print(pd.crosstab(df.Department, df.Power).sort_index(axis=1, ascending=False))
Power       100  75   50   25
Department
ABC           0    1    0    1
DEF           1    0    2    1
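A self-contained sketch of the `crosstab` call, again on a frame rebuilt from the question's table (the `...` columns are omitted by assumption). It also shows how to read a single count out of the result.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

# Rows are departments, columns are the distinct Power values, and each
# cell holds the number of rows with that (Department, Power) pair.
table = pd.crosstab(df["Department"], df["Power"])

# Order the Power columns from largest to smallest, as in the question.
table = table.sort_index(axis=1, ascending=False)
print(table)

# Look up a single cell: how many people in DEF have Power 50?
def_50 = table.loc["DEF", 50]
print(def_50)
```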
Another, faster solution uses `groupby` and `unstack`:
print(df.groupby(['Department', 'Power'])
        .size()
        .unstack(fill_value=0)
        .sort_index(axis=1, ascending=False))
Power       100  75   50   25
Department
ABC           0    1    0    1
DEF           1    0    2    1
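To make the chain easier to follow, this sketch keeps the intermediate result. The sample frame is rebuilt from the question's table with the `...` columns omitted, and the names `sizes` and `wide` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

# Step 1: one count per (Department, Power) pair that actually occurs.
# The result is a Series with a two-level index.
sizes = df.groupby(["Department", "Power"]).size()
print(sizes)

# Step 2: move the Power level into the columns. Pairs that never occur
# would become NaN, so fill_value=0 fills them with 0 instead.
wide = sizes.unstack(fill_value=0)

# Step 3: order the Power columns from largest to smallest.
wide = wide.sort_index(axis=1, ascending=False)
print(wide)
```

Passing `fill_value=0` to `unstack` avoids the NaN values that would otherwise turn the counts into floats.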
If you need to group by the columns `Department` and `Person`, add the column `Person` to `groupby` in the second position:
print(df.groupby(['Department', 'Person', 'Power'])
        .size()
        .unstack(fill_value=0)
        .sort_index(axis=1, ascending=False))
Power              100  75   50   25
Department Person
ABC        1234      0    1    0    0
           1235      0    0    0    1
DEF        1236      0    0    1    0
           1237      1    0    0    0
           1238      0    0    0    1
           1239      0    0    1    0
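The per-person table has a two-level row index. As a sketch, the snippet below builds it and then sums over the `Person` level to get back to the per-department counts. The sample frame is rebuilt from the question's table with the `...` columns omitted, and the variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

# unstack() moves the last grouping key (Power) into the columns, so
# Department and Person remain as a two-level row index.
per_person = (df.groupby(["Department", "Person", "Power"])
                .size()
                .unstack(fill_value=0)
                .sort_index(axis=1, ascending=False))
print(per_person)

# Summing within each Department collapses the Person level again and
# yields the same numbers as the per-department table.
per_dept = per_person.groupby(level="Department").sum()
print(per_dept)
```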
EDIT1 by comment:
If you need to add columns for other values that are missing from the data, use `reindex`:
print(df.groupby(['Department', 'Power'])
        .size()
        .unstack(fill_value=0)
        .reindex(columns=[100, 75, 50, 25, 0], fill_value=0))
Power       100  75   50   25   0
Department
ABC           0    1    0    1    0
DEF           1    0    2    1    0
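A runnable sketch of the `reindex` step, using a frame rebuilt from the question's table (the `...` columns are omitted). The list `wanted` is a hypothetical set of Power values chosen for illustration; it includes `0`, which never occurs in the data.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

wide = df.groupby(["Department", "Power"]).size().unstack(fill_value=0)

# Hypothetical list of every Power value the report should show,
# in display order. 0 does not occur in the data.
wanted = [100, 75, 50, 25, 0]

# reindex keeps exactly the listed columns in the listed order; columns
# that did not exist before are created and filled with 0.
full = wide.reindex(columns=wanted, fill_value=0)
print(full)
```

Because `reindex` keeps only the labels it is given, any Power value left out of the list is dropped from the result, so the list also takes care of the column order.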
Alternatively, you can do this:
>>> df.groupby(['Department','Power']).count().unstack().fillna(0)
           Person
Power         25   50   75   100
Department
ABC           1.0  0.0  1.0  0.0
DEF           1.0  2.0  0.0  1.0
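The output above has float counts and an extra `Person` header level, because `count()` is applied to every remaining column and `fillna(0)` runs after NaN values have already been introduced. The sketch below reproduces that and shows one possible variant that keeps integer counts. The sample frame is rebuilt from the question's table with the `...` columns omitted, and the variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (the "..." columns are omitted).
df = pd.DataFrame({
    "Department": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "Person": [1234, 1235, 1236, 1237, 1238, 1239],
    "Power": [75, 25, 50, 100, 25, 50],
})

# As written in the answer: count() counts the non-null entries of each
# remaining column, so the columns get two levels (column name, Power),
# and the NaN values created by unstack() force a float dtype.
via_count = df.groupby(["Department", "Power"]).count().unstack().fillna(0)
print(via_count)

# Variant: count a single column and fill the gaps during unstack(),
# which keeps plain Power columns and integer counts.
compact = (df.groupby(["Department", "Power"])["Person"]
             .count()
             .unstack(fill_value=0))
print(compact)
```

Keep in mind that `count()` skips missing values in the counted column, whereas `size()` counts rows regardless, so the two can differ when `Person` contains NaN.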