Multi-category one-hot encoding to pivot-table

Question

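I have some data that consists mostly of one-hot encoded categorical data. I would like to show the co-occurrence of categories, but I'm not sure how to reshape the data or compute the counts. My main problem is that although each case is unique, the categories are non-exclusive: in this example a single case can be tagged with several countries and several issues.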

case_id  country.france  country.germany  issue.water  issue.health  issue.poverty
0        0               1                1            0             1
1        1               1                0            1             1
2        1               1                0            1             1
3        1               0                1            1             1

Desired output: a pivot table showing the co-occurrence counts between the country categories and the issue categories:

                    issue.water issue.health    issue.poverty
country.france          1             3               3
country.germany         1             2               3

I tried reshaping my data so that it looks more like this:

case_id   country   issue
0        germany    water
0        germany    poverty
1        france     health
1        france     poverty
2        france     health
2        france     poverty
2        germany    health
2        germany    poverty
3        france     water
3        france     health
3        france     poverty

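But I'm not sure how to get from there to the desired output, or whether this is even the right way to handle cases with multiple classifications. I have code, but so far it is all about reshaping, and I'm not sure whether reshaping should be my goal before I know whether this is the right approach to handling multiple categories per case.

Any help with this would be much appreciated!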

Answer 1

On your reshaped DataFrame (say it is called df), you could try something like this:

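import pandas as pd

# count the case_ids for every country/issue pair
p = pd.pivot_table(df, index='country', columns='issue', aggfunc='count')
# flatten the ('case_id', issue) column MultiIndex into names like 'issue.water'
p.columns = [c.replace('case_id', 'issue.') for c in map(''.join, p.columns)]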
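A minimal runnable sketch of the same idea, assuming df holds the long-format rows listed in the question. Passing values='case_id' and fill_value=0 is an addition here, not part of the snippet above: the first makes the columns come back flat, the second shows 0 instead of NaN for pairs that never co-occur.

```python
import pandas as pd

# Long-format rows copied from the question (one row per case/country/issue).
df = pd.DataFrame({
    "case_id": [0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "country": ["germany", "germany", "france", "france", "france", "france",
                "germany", "germany", "france", "france", "france"],
    "issue": ["water", "poverty", "health", "poverty", "health", "poverty",
              "health", "poverty", "water", "health", "poverty"],
})

# Count case_id entries per country/issue pair.
p = pd.pivot_table(df, index="country", columns="issue",
                   values="case_id", aggfunc="count", fill_value=0)

# Prefix the flat column names so they read like the original one-hot columns.
p.columns = ["issue." + c for c in p.columns]
print(p)
```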

Answer 2

After reshaping df:

pd.crosstab(df.country, df.issue)
Out[306]: 
issue    health  poverty  water
country                        
france        3        3      1
germany       1        2      1

Or, a more involved approach using wide_to_long, which gets the result directly from your original df1:

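# sum(level=...) was removed in pandas 2.0; groupby(level=...).sum() replaces it (column order may differ)
newdf = (pd.wide_to_long(df1, ['issue'], i='case_id', j='issueid', suffix=r'\w+', sep='.')
           .set_index('issue', append=True)
           .groupby(level=[1, 2]).sum()
           .query('issue==1'))
newdf.reset_index(level=1, drop=True).T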
Out[326]: 
issueid          water  health  poverty
country.france       1       3        3
country.germany      1       2        3
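As a cross-check on the table above, the same counts can also be obtained without any reshaping, since a co-occurrence count is just the sum over cases of country flag times issue flag. This is a sketch of a different technique (a matrix product), not part of the wide_to_long approach, and it assumes df1 is the wide table from the question with strictly 0/1 values.

```python
import pandas as pd

# Wide one-hot table copied from the question.
df1 = pd.DataFrame({
    "case_id": [0, 1, 2, 3],
    "country.france": [0, 1, 1, 1],
    "country.germany": [1, 1, 1, 0],
    "issue.water": [1, 0, 0, 1],
    "issue.health": [0, 1, 1, 1],
    "issue.poverty": [1, 1, 1, 1],
})

wide = df1.set_index("case_id")
countries = wide.filter(like="country.")  # cases x countries
issues = wide.filter(like="issue.")       # cases x issues

# (countries x cases) @ (cases x issues) -> countries x issues co-occurrence counts
cooc = countries.T @ issues
print(cooc)
```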

Tags: python, categories, pandas, categorical-data