按列(ID)复制数据框
Pivot dataframe by column (ID) duplicated
我有一个 DataFrame,其中包含一个名为 'ID' 的列,该列具有重复的观察值。每个 'ID' 行都有一个或多个 'Article' 值列。我想通过 'ID' 在唯一 'ID' 的同一行添加新列来转置整个数据帧分组。顺序很重要
我的数据框:
ID Article Article_2
1 Banana NaN
2 Apple NaN
1 Apple Coconut
3 Tomatoe Coconut
1 Pineapple Tropical
2 Banana Coconut
4 Apple Coconut
5 Apple Coconut
3 Apple Pineapple
我的代码(来自@Erfan):
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
输出:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple NaN Coconut Tropical
0002 Apple Banana NaN Coconut NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
第一行,Article_4 是 NaN,Article_5 和 6 有值。它应该是 Article_4 椰子,Article_5 热带和 Article_6 NaN。
同样,Article_3 是 NaN,Article_4 是有效值。它应该是 Article_3 有效并且其余 (4,5,6) NaNs
需要输出:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple Coconut Tropical NaN
0002 Apple Banana Coconut NaN NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
在 melt
之后添加 DataFrame.dropna
以删除 value
列缺少的行:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2']).dropna(subset=['value'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5
1 Banana Apple Pineapple Coconut Tropical
2 Apple Banana Coconut NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN
4 Apple Coconut NaN NaN NaN
5 Apple Coconut NaN NaN NaN
如果需要所有列使用稍微修改的justify
函数:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
#
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = pd.notna(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val, dtype=object)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
dfn = pd.DataFrame(justify(dfn.values, invalid_val=np.nan, axis=1, side='left'),
index=dfn.index, columns=dfn.columns)
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
1 Banana Apple Pineapple Coconut Tropical NaN
2 Apple Banana Coconut NaN NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN NaN
4 Apple Coconut NaN NaN NaN NaN
5 Apple Coconut NaN NaN NaN NaN
我有一个 DataFrame,其中包含一个名为 'ID' 的列,该列具有重复的观察值。每个 'ID' 行都有一个或多个 'Article' 值列。我想通过 'ID' 在唯一 'ID' 的同一行添加新列来转置整个数据帧分组。顺序很重要
我的数据框:
ID Article Article_2
1 Banana NaN
2 Apple NaN
1 Apple Coconut
3 Tomatoe Coconut
1 Pineapple Tropical
2 Banana Coconut
4 Apple Coconut
5 Apple Coconut
3 Apple Pineapple
我的代码(来自@Erfan):
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
输出:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple NaN Coconut Tropical
0002 Apple Banana NaN Coconut NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
第一行,Article_4 是 NaN,Article_5 和 6 有值。它应该是 Article_4 椰子,Article_5 热带和 Article_6 NaN。 同样,Article_3 是 NaN,Article_4 是有效值。它应该是 Article_3 有效并且其余 (4,5,6) NaNs
需要输出:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple Coconut Tropical NaN
0002 Apple Banana Coconut NaN NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
在 melt
之后添加 DataFrame.dropna
以删除 value
列缺少的行:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2']).dropna(subset=['value'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5
1 Banana Apple Pineapple Coconut Tropical
2 Apple Banana Coconut NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN
4 Apple Coconut NaN NaN NaN
5 Apple Coconut NaN NaN NaN
如果需要所有列使用稍微修改的justify
函数:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
#
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = pd.notna(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val, dtype=object)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
dfn = pd.DataFrame(justify(dfn.values, invalid_val=np.nan, axis=1, side='left'),
index=dfn.index, columns=dfn.columns)
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
1 Banana Apple Pineapple Coconut Tropical NaN
2 Apple Banana Coconut NaN NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN NaN
4 Apple Coconut NaN NaN NaN NaN
5 Apple Coconut NaN NaN NaN NaN