Binary vectorization encoding for a categorical variable, grouped by date


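I'm having trouble vectorizing a categorical variable into some kind of binary encoding with Python and pandas. The categories are not mutually exclusive, so when a group has more than one row the rows need to be aggregated into a single row, but without merging them with rows from other dates.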

Suppose this is the data:

id1  id2  type        month.measure
105  50   growing     04-2020
105  50   advancing   04-2020
44   29   advancing   04-2020
105  50   retreating  05-2020
105  50   shrinking   05-2020

and it should end up like this:

id1  id2  growing  shrinking  advancing  retreating  month.measure
105  50   1        0          1          0           04-2020
44   29   0        0          1          0           04-2020
105  50   0        1          0          1           05-2020

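I have been trying various transformations, lambda functions, and pandas get_dummies, and grouping the result by the two IDs and the date, but I couldn't find a way.

Hope we can sort it out! Thanks in advance! :)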

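This solution uses pandas get_dummies to one-hot encode the "TYPE" column, concatenates the one-hot encoded dataframe with the original dataframe, and then applies a groupby on the ID columns and "MONTH.MEASURE":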

import pandas as pd

# Set up the dataframe
ID1 = [105,105,44,105,105]
ID2 = [50,50,29,50,50]
TYPE = ['growing','advancing','advancing','retreating','shrinking']
MONTH = ['04-2020','04-2020','04-2020','05-2020','05-2020']

df = pd.DataFrame({'ID1':ID1,'ID2':ID2, 'TYPE':TYPE, 'MONTH.MEASURE':MONTH})

# Apply get_dummies and groupby operations
df = pd.concat([df.drop('TYPE',axis=1),pd.get_dummies(df['TYPE'])],axis=1)\
       .groupby(['ID1','ID2','MONTH.MEASURE']).sum().reset_index()

# These bits are just cosmetic to get the output to look more like your required output
df.columns = [c.upper() for c in df.columns]

col_order = ['GROWING','SHRINKING','ADVANCING','RETREATING','MONTH.MEASURE']

df[['ID1','ID2']+col_order]

#    ID1  ID2  GROWING  SHRINKING  ADVANCING  RETREATING MONTH.MEASURE
# 0   44   29        0          0          1           0       04-2020
# 1  105   50        1          0          1           0       04-2020
# 2  105   50        0          1          0           1       05-2020
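Note that groupby sorts the group keys by default, which is why the row for 44/29 comes first above. If the row order from the question matters, a minimal sketch along the same lines, assuming the question's lowercase column names and passing sort=False to groupby, could look like this (the helper name encode_types and the variable names are just for illustration):

```python
import pandas as pd

# Sample data from the question, with its original lowercase column names.
df = pd.DataFrame({
    "id1": [105, 105, 44, 105, 105],
    "id2": [50, 50, 29, 50, 50],
    "type": ["growing", "advancing", "advancing", "retreating", "shrinking"],
    "month.measure": ["04-2020", "04-2020", "04-2020", "05-2020", "05-2020"],
})


def encode_types(frame, keys=("id1", "id2", "month.measure"), column="type"):
    """One-hot encode `column` and collapse rows that share the same keys.

    Hypothetical helper, not part of pandas.
    """
    keys = list(keys)
    # dtype=int gives 0/1 columns instead of booleans.
    dummies = pd.get_dummies(frame[column], dtype=int)
    return (
        pd.concat([frame[keys], dummies], axis=1)
        # sort=False keeps groups in order of first appearance.
        .groupby(keys, sort=False)
        .sum()
        .reset_index()
    )


encoded = encode_types(df)

# Reorder the columns to match the layout requested in the question.
wanted = ["id1", "id2", "growing", "shrinking", "advancing", "retreating", "month.measure"]
encoded = encoded[wanted]
print(encoded)
```

With sort=False the groups come out in order of first appearance, so the 105/50 row for 04-2020 stays ahead of the 44/29 row, as in the requested output.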

You can do this with crosstab:

pd.crosstab([df['id1'],df['id2'],df['month.measure']], df['type']).reset_index()

Output:

type  id1  id2 month.measure  advancing  growing  retreating  shrinking
0      44   29       04-2020          1        0           0          0
1     105   50       04-2020          1        1           0          0
2     105   50       05-2020          0        0           1          1
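One caveat: crosstab counts rows, so the result is only 0/1 as long as no (id1, id2, month.measure, type) combination repeats. The sample data has no repeats, but if real data might, a hedged sketch is to cap the counts at 1 with clip. The sixth row below is a made-up duplicate added purely to illustrate the point:

```python
import pandas as pd

# Question's sample data plus one hypothetical duplicate row (the last one).
df = pd.DataFrame({
    "id1": [105, 105, 44, 105, 105, 105],
    "id2": [50, 50, 29, 50, 50, 50],
    "type": ["growing", "advancing", "advancing", "retreating", "shrinking", "growing"],
    "month.measure": ["04-2020", "04-2020", "04-2020", "05-2020", "05-2020", "04-2020"],
})

# Raw counts: the duplicated "growing" row shows up as 2 here.
counts = pd.crosstab([df["id1"], df["id2"], df["month.measure"]], df["type"])

# Cap every count at 1 to get a strictly binary encoding.
binary = counts.clip(upper=1).reset_index()

# Drop the leftover "type" label on the column axis (cosmetic only).
binary.columns.name = None
print(binary)
```

The same idea carries over to the get_dummies approach by aggregating with max() instead of sum().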