How to categorize values of a data frame in pandas?

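I am trying to write code with the Python pandas library that categorizes a data set (read from a CSV) by value ranges. An aggregate function could be used for this, but I am struggling to get the aggregation right.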

    +-------------+-------------+-------------+-------------+-------------+
    |Name         | Age         |Region       |Telephone    |Address      |
    +-------------+-------------+-------------+-------------+-------------+
    |             |             |             |             |             |

I was able to come up with the following code.

import pandas as pd

data_frame = pd.read_csv('5000 Records.csv')

data_frame['age_range'] = pd.cut(data_frame['Age in Yrs.'],
                             bins=[-float('inf'),30,50,float('inf')],
                             labels=['above', 'in between', 'below'])

data_frame = data_frame.groupby(['Region','age_range']).agg(
    {
        'age_range': "count"
    }
)

print(data_frame)

But the result looks like this:

                      age_range
Region    age_range            
Midwest   above             312
          in between        695
          below             390
Northeast above             201
          in between        421
          below             219
South     above             435
          in between        983
          below             452
West      above             211
          in between        443
          below             238

The requirement, however, is to get output like the following:

+-------------+-------------+-------------+-------------+
|Region       |above        |in between   |below        |
+-------------+-------------+-------------+-------------+
|             |             |             |             | 

Can someone help me with this? Thanks in advance!

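Try the DataFrame.pivot method. It needs Region and age_range as ordinary columns plus a separate value column, so on the grouped result from the question, rename the count column and reset the index first:

data_frame.rename(columns={'age_range': 'count'}).reset_index().pivot(index='Region', columns='age_range', values='count')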

Use Series.unstack with a simplified groupby solution: agg is removed and GroupBy.size is used instead.

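GroupBy.count counts values while excluding missing ones, whereas GroupBy.size counts every row in the group. Here both solutions give the same result, because age_range is used in the by parameter of groupby: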

df = data_frame.groupby(['Region','age_range']).size().unstack(fill_value=0)
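To illustrate the size versus count distinction, here is a minimal sketch on a small made-up frame. The rows and the 'Score' column are invented for illustration and are not taken from the asker's CSV; observed=False is passed explicitly only so that every label appears regardless of the pandas default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV
sample = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'South', 'South'],
    'Age in Yrs.': [25, 40, 61, 28, 35],
    'Score': [1.0, np.nan, 3.0, 4.0, np.nan],  # invented column with gaps
})

# Labels are assigned to bins in ascending order, so with the question's
# labels 'above' covers ages up to 30 and 'below' covers ages over 50.
sample['age_range'] = pd.cut(sample['Age in Yrs.'],
                             bins=[-float('inf'), 30, 50, float('inf')],
                             labels=['above', 'in between', 'below'])

grouped = sample.groupby(['Region', 'age_range'], observed=False)

# size counts every row in each group
wide_size = grouped.size().unstack(fill_value=0)

# count skips missing values in the selected column
wide_count = grouped['Score'].count().unstack(fill_value=0)

print(wide_size)
print(wide_count)
```

The two tables differ only where 'Score' is missing, which is why counting the grouping column itself, as in the question, gives the same numbers as size.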

Or use crosstab:

df = pd.crosstab(data_frame['Region'], data_frame['age_range'])
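As a rough check that crosstab and the groupby approach agree, the sketch below runs both on a tiny invented frame (the rows are made up, and age_range is stored as plain strings here, so the columns come out in alphabetical order). The last step, which turns Region back into an ordinary column to match the requested layout, is an optional addition and not part of the answers above.

```python
import pandas as pd

# Hypothetical data; age_range is already computed and stored as strings
sample = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'South', 'South', 'Midwest'],
    'age_range': ['above', 'below', 'in between', 'above', 'above', 'below'],
})

by_crosstab = pd.crosstab(sample['Region'], sample['age_range'])
by_groupby = sample.groupby(['Region', 'age_range']).size().unstack(fill_value=0)

# Both tables hold the same counts in the same row and column order
same = bool((by_crosstab.values == by_groupby.values).all())

# Move Region from the index into a column and drop the 'age_range'
# name from the column axis, to resemble the requested output
flat = by_crosstab.reset_index().rename_axis(None, axis=1)

print(same)
print(flat)
```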