Pivot a DataFrame by a duplicated column (ID)

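I have a DataFrame with a column named 'ID' that contains duplicated values. Each 'ID' row has one or more 'Article' value columns. I want to transpose the whole DataFrame grouped by 'ID', adding new columns on the same row for each unique 'ID'. Order matters.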

My DataFrame:

ID  Article     Article_2
1   Banana      NaN
2   Apple       NaN
1   Apple       Coconut
3   Tomatoe     Coconut
1   Pineapple   Tropical
2   Banana      Coconut
4   Apple       Coconut
5   Apple       Coconut
3   Apple       Pineapple
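
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is a minimal sketch: it assumes the IDs are plain integers and that the empty cells are real NaN values rather than the string 'NaN'; the name `df1` is taken from the code in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; NaN marks a missing second article
df1 = pd.DataFrame({
    "ID": [1, 2, 1, 3, 1, 2, 4, 5, 3],
    "Article": ["Banana", "Apple", "Apple", "Tomatoe", "Pineapple",
                "Banana", "Apple", "Apple", "Apple"],
    "Article_2": [np.nan, np.nan, "Coconut", "Coconut", "Tropical",
                  "Coconut", "Coconut", "Coconut", "Pineapple"],
})

print(df1)
```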

My code (from @Erfan):

import pandas as pd

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

Output:

        Article_1   Article_2   Article_3   Article_4   Article_5   Article_6
0001    Banana      Apple       Pineapple   NaN         Coconut     Tropical
0002    Apple       Banana      NaN         Coconut     NaN         NaN
0003    Tomatoe     Apple       Coconut     Pineapple   NaN         NaN
0004    Apple       Coconut     NaN         NaN         NaN         NaN
0005    Apple       Coconut     NaN         NaN         NaN         NaN

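In the first row, Article_4 is NaN while Article_5 and Article_6 hold values. It should be Article_4 Coconut, Article_5 Tropical and Article_6 NaN. Likewise, in row 0002, Article_3 is NaN while Article_4 holds a valid value. It should be Article_3 holding the valid value and the rest (4, 5, 6) NaN.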

Desired output:

        Article_1   Article_2   Article_3   Article_4   Article_5   Article_6
0001    Banana      Apple       Pineapple   Coconut     Tropical    NaN     
0002    Apple       Banana      Coconut     NaN         NaN         NaN
0003    Tomatoe     Apple       Coconut     Pineapple   NaN         NaN
0004    Apple       Coconut     NaN         NaN         NaN         NaN
0005    Apple       Coconut     NaN         NaN         NaN         NaN

Add DataFrame.dropna after melt to remove the rows with a missing value in the value column:

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2']).dropna(subset=['value'])

dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

print (dfn)
  Article_1 Article_2  Article_3  Article_4 Article_5
1    Banana     Apple  Pineapple    Coconut  Tropical
2     Apple    Banana    Coconut        NaN       NaN
3   Tomatoe     Apple    Coconut  Pineapple       NaN
4     Apple   Coconut        NaN        NaN       NaN
5     Apple   Coconut        NaN        NaN       NaN
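
To see why the dropna step matters, it can help to look at the numbering that cumcount produces for a single ID before and after the NaN rows are removed. The sketch below rebuilds the sample data so that it runs on its own; the helper names (`melted`, `cleaned`, `id1_before`, `id1_after`) are only illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 1, 3, 1, 2, 4, 5, 3],
    "Article": ["Banana", "Apple", "Apple", "Tomatoe", "Pineapple",
                "Banana", "Apple", "Apple", "Apple"],
    "Article_2": [np.nan, np.nan, "Coconut", "Coconut", "Tropical",
                  "Coconut", "Coconut", "Coconut", "Pineapple"],
})

melted = pd.melt(df1, id_vars="ID", value_vars=["Article", "Article_2"])

# cumcount numbers every row of a group, including rows whose value is NaN,
# so a NaN row still takes up a position (and later a column slot)
pos_with_nan = melted.groupby("ID")["value"].cumcount().add(1)

# After dropping the NaN rows, the numbering of the remaining values is contiguous
cleaned = melted.dropna(subset=["value"])
pos_without_nan = cleaned.groupby("ID")["value"].cumcount().add(1)

id1_before = melted.assign(pos=pos_with_nan).query("ID == 1")[["value", "pos"]]
id1_after = cleaned.assign(pos=pos_without_nan).query("ID == 1")[["value", "pos"]]

print(id1_before)
print(id1_after)
```

For ID 1 the NaN from Article_2 takes position 4 in the first numbering, which is what pushes Coconut and Tropical into Article_5 and Article_6 in the question's output.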

If all columns are needed, use a slightly modified justify function:

import numpy as np
import pandas as pd

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])

dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

#
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing and pushed to the opposite side
        (pass np.nan to treat NaN/None as missing)
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

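    # Boolean mask of the valid (non-missing) entries
    if invalid_val is np.nan:
        mask = pd.notna(a)
    else:
        mask = a != invalid_val
    # Sorting booleans moves the True slots to the end of each row/column
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        # Flip so that the True slots come first
        justified_mask = np.flip(justified_mask, axis=axis)
    # Start from an array full of invalid_val, then place the valid values in order
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out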

dfn = pd.DataFrame(justify(dfn.values, invalid_val=np.nan, axis=1, side='left'),
                   index=dfn.index, columns=dfn.columns)
print (dfn)
  Article_1 Article_2  Article_3  Article_4 Article_5 Article_6
1    Banana     Apple  Pineapple    Coconut  Tropical       NaN
2     Apple    Banana    Coconut        NaN       NaN       NaN
3   Tomatoe     Apple    Coconut  Pineapple       NaN       NaN
4     Apple   Coconut        NaN        NaN       NaN       NaN
5     Apple   Coconut        NaN        NaN       NaN       NaN
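
As a possible alternative to justify, the dropna variant can be padded back out to the full set of columns with DataFrame.reindex. This is only a sketch under one assumption: the widest possible row is taken to be the largest number of rows per ID times the number of melted value columns. The names `value_cols`, `n_cols` and `full_cols` are made up for the example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 1, 3, 1, 2, 4, 5, 3],
    "Article": ["Banana", "Apple", "Apple", "Tomatoe", "Pineapple",
                "Banana", "Apple", "Apple", "Apple"],
    "Article_2": [np.nan, np.nan, "Coconut", "Coconut", "Tropical",
                  "Coconut", "Coconut", "Coconut", "Pineapple"],
})
value_cols = ["Article", "Article_2"]

# Same steps as the dropna solution: melt, drop missing values, then pivot
dfn = pd.melt(df1, id_vars="ID", value_vars=value_cols).dropna(subset=["value"])
dfn = dfn.pivot_table(index="ID",
                      columns=dfn.groupby("ID")["value"].cumcount().add(1),
                      values="value",
                      aggfunc="first").add_prefix("Article_").rename_axis(None, axis="index")

# Assumed upper bound on the row width: most rows per ID times the number of value columns
n_cols = int(df1.groupby("ID").size().max()) * len(value_cols)
full_cols = [f"Article_{i}" for i in range(1, n_cols + 1)]

# reindex adds any missing columns, filled with NaN
dfn = dfn.reindex(columns=full_cols)
print(dfn)
```

Because the values are already left-aligned after dropna, reindex only has to append the empty trailing columns. justify is still the more general tool when the NaNs sit in the middle of an existing wide frame.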