How to find shared characteristics in pandas?

Question

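I have a dataset.

I want to understand our customers by looking at typical shared characteristics (for example, "married customers in their 40s like wine"). This would correspond to the itemset {Married, 40s, Wine}.

How can I create a new data frame called customer_data_onehot in which the rows correspond to customers (as in the original dataset) and the columns correspond to the categories of each of the ten categorical attributes in the data? The new data frame should contain only boolean values (True/False or 0/1), such that the value at a given row and column is True (or 1) if and only if the attribute value corresponding to the column applies to the customer corresponding to the row. Display the data frame.

I was given this hint: "Hint: For example, for the attribute 'Education' there are 5 possible categories: 'Graduation', 'PhD', 'Master', 'Basic', '2n Cycle'. Therefore, the new data frame must contain one column for each attribute value." But I don't understand how to implement this.

Can someone guide me toward a correct solution?

I have this code, which imports the csv file and selects 90% of the data from the original dataset.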

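import pandas as pd

# Load the full dataset.
pre_process = pd.read_csv('customer_data.csv')
# Keep a reproducible 90% sample. to_csv returns None, so its result is not assigned.
pre_process = pre_process.sample(frac=0.9, random_state=413808)
pre_process.to_csv('customer_data_2.csv', index=False)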

Answer 1

Use get_dummies:

Set up an MRE (minimal reproducible example):

import pandas as pd

data = {'Customer': ['A', 'B', 'C'],
        'Marital_Status': ['Together', 'Married', 'Single'],
        'Age_Group': ['40s', '60s', '20s']}
df = pd.DataFrame(data)
print(df)

# Output
  Customer Marital_Status Age_Group
0        A       Together       40s
1        B        Married       60s
2        C         Single       20s

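# Index by Customer so that only the attribute columns are one-hot encoded.
# dtype=int gives the 0/1 output below; pandas >= 2.0 returns True/False by default.
out = pd.get_dummies(df.set_index('Customer'), dtype=int).reset_index()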
print(out)

# Output
  Customer  Marital_Status_Married  Marital_Status_Single  Marital_Status_Together  Age_Group_20s  Age_Group_40s  Age_Group_60s
0        A                       0                      0                        1              0              1              0
1        B                       1                      0                        0              0              0              1
2        C                       0                      1                        0              1              0              0
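
To tie this back to the question, here is a minimal sketch of how the same call could build customer_data_onehot with boolean values and then measure how common an itemset is. The small customer_data frame, its column names, and the chosen itemset are assumptions made up for illustration; the real file has ten categorical attributes that are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for customer_data.csv; columns and values are assumptions.
customer_data = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Master", "Basic", "2n Cycle", "PhD"],
    "Marital_Status": ["Married", "Married", "Single", "Together", "Married", "Single"],
    "Age_Group": ["40s", "40s", "20s", "60s", "40s", "30s"],
    "Income": [58000, 46000, 71000, 26000, 58000, 33000],  # numeric, left out of the encoding
})

# List the categorical attributes explicitly so numeric columns are not encoded.
categorical_cols = ["Education", "Marital_Status", "Age_Group"]

# One boolean column per attribute value, one row per customer.
customer_data_onehot = pd.get_dummies(customer_data[categorical_cols], dtype=bool)
print(customer_data_onehot)

# Share of customers for whom every item in the itemset is True (its "support").
itemset = ["Marital_Status_Married", "Age_Group_40s"]
support = customer_data_onehot[itemset].all(axis=1).mean()
print(f"Support of {itemset}: {support:.2f}")
```

Passing dtype=bool keeps the result as True/False regardless of the pandas version. The support calculation is only one simple way to check a single candidate itemset; it does not search for frequent itemsets on its own.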


python

boolean

pandas