How to categorize values of a data frame in pandas?
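I am trying to use the Python pandas library to categorize a dataset (loaded from a CSV) by value ranges. An aggregate function can be used for this, but I am struggling to use it correctly.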
+-------------+-------------+-------------+-------------+-------------+
|Name | Age |Region |Telephone |Address |
+-------------+-------------+-------------+-------------+-------------+
| | | | | |
I was able to write the following code.
import pandas as pd

data_frame = pd.read_csv('5000 Records.csv')
data_frame['age_range'] = pd.cut(data_frame['Age in Yrs.'],
                                 bins=[-float('inf'), 30, 50, float('inf')],
                                 labels=['above', 'in between', 'below'])
data_frame = data_frame.groupby(['Region', 'age_range']).agg(
    {
        'age_range': "count"
    }
)
print(data_frame)
But the result is as follows:
                       age_range
Region     age_range
Midwest    above             312
           in between        695
           below             390
Northeast  above             201
           in between        421
           below             219
South      above             435
           in between        983
           below             452
West       above             211
           in between        443
           below             238
But the required output looks like this:
+-------------+-------------+-------------+-------------+
|Region | above |in between |below |
+-------------+-------------+-------------+-------------+
| | | | |
Can someone help me with this? Thanks in advance!
Try the DataFrame.pivot method:
data_frame.pivot(index='Region', columns='age_range', values='count')
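As a rough sketch of how pivot could be applied: pivot expects index, columns and values to be ordinary columns, so the counts are first flattened with reset_index. The toy data and the column name count are assumptions for illustration, not taken from the question's CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV in the question.
toy = pd.DataFrame({
    "Region": ["West", "West", "South", "South", "South"],
    "age_range": ["above", "below", "above", "above", "in between"],
})

# Count rows per (Region, age_range) and keep the result as a flat frame
# with an explicit "count" column (the column name is an assumption).
counts = (
    toy.groupby(["Region", "age_range"])
       .size()
       .reset_index(name="count")
)

# Reshape from long to wide: one row per Region, one column per age_range.
wide = counts.pivot(index="Region", columns="age_range", values="count")
print(wide)
```

Combinations that never occur in the data come out as NaN here; chaining .fillna(0) is one way to turn them into zeros.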
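Use Series.unstack with a simplified groupby solution: remove agg and use GroupBy.size instead. GroupBy.count counts values while excluding missing ones, whereas GroupBy.size counts all rows; here both solutions work the same because age_range is passed to the by parameter of groupby: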
df = data_frame.groupby(['Region','age_range']).size().unstack(fill_value=0)
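To illustrate the difference between GroupBy.size and GroupBy.count mentioned above, here is a small sketch. The toy frame and its missing Telephone value are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Telephone value.
toy = pd.DataFrame({
    "Region": ["West", "West", "South"],
    "Telephone": ["555-0100", np.nan, "555-0101"],
})

grouped = toy.groupby("Region")

# size counts every row in each group, missing values included.
sizes = grouped.size()

# count only counts the non-missing values of the selected column.
counts = grouped["Telephone"].count()

print(sizes)
print(counts)
```

The two differ only for West, where one Telephone value is missing.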
Or use crosstab:
df = pd.crosstab(data_frame['Region'], data_frame['age_range'])
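A self-contained sketch that runs both approaches on a few made-up rows, so the wide output can be checked without the original CSV. The ages and regions are assumptions; the bins and labels are copied from the question. observed=False is passed explicitly so that empty categories are kept regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question's code.
toy = pd.DataFrame({
    "Region": ["West", "West", "South", "South", "South"],
    "Age in Yrs.": [25, 61, 28, 22, 40],
})

# Same bins and labels as in the question.
toy["age_range"] = pd.cut(
    toy["Age in Yrs."],
    bins=[-float("inf"), 30, 50, float("inf")],
    labels=["above", "in between", "below"],
)

# groupby + size + unstack: one row per Region, one column per age_range.
via_unstack = (
    toy.groupby(["Region", "age_range"], observed=False)
       .size()
       .unstack(fill_value=0)
)

# crosstab builds the same frequency table in one call.
via_crosstab = pd.crosstab(toy["Region"], toy["age_range"])

print(via_unstack)
print(via_crosstab)
```

Both tables hold the same counts for this data, so the choice is mostly a matter of taste; crosstab is shorter, while the groupby form is easier to extend with other aggregations.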