Converting the rows of a pandas DataFrame into columns
I have a pandas DataFrame like this:
Key Value
A 2
A 6
B 7
A 1
B 3
B 4
A 2
How can I reshape it into this:
A B
2 7
6 3
1 4
2 NaN
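To try the answers, the sample frame from the question can be rebuilt like this. This is a minimal setup sketch; the column names Key and Value are taken from the question.

```python
import pandas as pd

# Sample data from the question: one key column and one value column
df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'A', 'B', 'B', 'A'],
    'Value': [2, 6, 7, 1, 3, 4, 2],
})
print(df)
```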
You can use groupby and apply to create the new index values:
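import pandas as pd
# assign to a new name so that the original df stays available for the next solution
df1 = df.groupby('Key').Value.apply(lambda x: pd.Series(x.values)).unstack(0)
print(df1)
Key A B
0 2.0 7.0
1 6.0 3.0
2 1.0 4.0
3 2.0 NaN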
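Why this works: the apply step re-wraps each group's values in a new Series with a fresh 0..n-1 index, so the intermediate result is a Series indexed by (Key, position within group), and unstack(0) moves the Key level into the columns. A small sketch that prints the intermediate step (the names stacked and wide are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'A', 'B', 'B', 'A'],
    'Value': [2, 6, 7, 1, 3, 4, 2],
})

# Each group becomes a Series with a new 0..n-1 index, so the combined
# result has a two-level index: (Key, position within the group)
stacked = df.groupby('Key').Value.apply(lambda x: pd.Series(x.values))
print(stacked)

# unstack(0) pivots the first index level (Key) into the columns;
# B has only three values, so its fourth row is filled with NaN
wide = stacked.unstack(0)
print(wide)
```

The NaN is also why the result is float rather than integer: pandas has no integer NaN in the default int64 dtype.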
Another solution uses pivot, creating the new index values with cumcount:
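# helper column numbers the rows within each Key group; pd.pivot needs a DataFrame first
df2 = (df.assign(idx=df.groupby('Key').cumcount())
       .pivot(index='idx', columns='Key', values='Value').rename_axis(None))
print(df2)
Key A B
0 2.0 7.0
1 6.0 3.0
2 1.0 4.0
3 2.0 NaN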
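The key ingredient here is cumcount, which numbers the rows inside each group in their original order; using that counter as the row label lines the groups up side by side. A minimal sketch (the helper column name pos is an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'A', 'B', 'B', 'A'],
    'Value': [2, 6, 7, 1, 3, 4, 2],
})

# Position of each row within its Key group, in original row order
pos = df.groupby('Key').cumcount()
print(pos.tolist())  # [0, 1, 0, 2, 1, 2, 3]

# With the counter as the index, pivot turns each key into its own column
wide = df.assign(pos=pos).pivot(index='pos', columns='Key', values='Value')
print(wide)
```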
pandas
Using pd.concat with a list comprehension and np.unique
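import numpy as np
import pandas as pd
s = pd.Series(df.Value.values, df.Key.values)  # values indexed by (non-unique) key
u = np.unique(s.index.values).tolist()  # sorted distinct keys
# renumber each key's values 0..n-1 and place them side by side
pd.concat([s.loc[k].reset_index(drop=True) for k in u], axis=1, keys=u)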
A B
0 2 7.0
1 6 3.0
2 1 4.0
3 2 NaN
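One caveat with s.loc[k]: on a non-unique index, a label that occurs only once returns a scalar instead of a Series, and the scalar has no reset_index. Selecting with a one-element list, s.loc[[k]], always returns a Series. A sketch on hypothetical data where key C appears once:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'C' occurs only once
df = pd.DataFrame({'Key': ['A', 'A', 'B', 'C', 'B'], 'Value': [2, 6, 7, 5, 3]})

s = pd.Series(df['Value'].to_numpy(), df['Key'].to_numpy())
u = np.unique(s.index.to_numpy()).tolist()

# s.loc[['C']] is a one-row Series, whereas s.loc['C'] would be the scalar 5
out = pd.concat([s.loc[[k]].reset_index(drop=True) for k in u], axis=1, keys=u)
print(out)
```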
numpy
import numpy as np
import pandas as pd
# np.unique can return the value counts and an inverse array;
# the inverse array gives each row's column in the final array
u, inv, c = np.unique(df.Key.values, return_inverse=True, return_counts=True)
# array to fill: one row per position up to the largest group size,
# one column per unique key, NaN wherever a group runs out of values
new = np.full((c.max(), len(u)), np.nan)
# cumulative count per unique value; rows are stably sorted by key first so
# the counter is correct even when equal keys are not adjacent in df
order = np.argsort(inv, kind='stable')
starts = np.append(0, c.cumsum()[:-1])  # first sorted position of each key
rows = np.empty(len(inv), dtype=np.intp)
rows[order] = np.arange(len(inv)) - starts.repeat(c)
# use the (row, column) index arrays to fill the empty array
new[rows, inv] = df.Value.values
pd.DataFrame(new, np.arange(c.max()), u)
A B
0 2.0 7.0
1 6.0 3.0
2 1.0 4.0
3 2.0 NaN
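The per-key counter is the crux of the NumPy approach. Subtracting each key's start offset from a plain arange only yields positions within groups when equal keys are adjacent; on the question's unsorted data it does not. A sketch that checks both variants against pandas' groupby().cumcount() (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'A', 'B', 'B', 'A'],
    'Value': [2, 6, 7, 1, 3, 4, 2],
})
expected = df.groupby('Key').cumcount().to_numpy()

u, inv, c = np.unique(df['Key'].to_numpy(), return_inverse=True, return_counts=True)
starts = np.append(0, c.cumsum()[:-1])  # first sorted position of each key

# Naive counter: assumes rows are already grouped by key
naive = np.arange(len(inv)) - starts.repeat(c)

# Stable-sort counter: compute positions in key order, then scatter them back
order = np.argsort(inv, kind='stable')
rows = np.empty(len(inv), dtype=np.intp)
rows[order] = np.arange(len(inv)) - starts.repeat(c)

print(naive.tolist(), rows.tolist(), expected.tolist())
```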
Timing test
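The timing results themselves are not reproduced here. A comparison can be rerun with timeit; this is a sketch under arbitrary assumptions (10,000 rows, five random keys, 10 runs per approach), wrapping each answer in a hypothetical function. It uses the pivot-with-helper-column form, the s.loc[[k]] selection and the stably sorted NumPy counter, so all four work on unsorted keys. Absolute numbers will depend on the machine and on the pandas and NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: random keys 'A'..'E' and integer values
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Key': rng.choice(list('ABCDE'), size=10_000),
    'Value': rng.integers(0, 100, size=10_000),
})

def via_apply(df):
    return df.groupby('Key').Value.apply(lambda x: pd.Series(x.values)).unstack(0)

def via_pivot(df):
    pos = df.groupby('Key').cumcount()
    return df.assign(pos=pos).pivot(index='pos', columns='Key', values='Value')

def via_concat(df):
    s = pd.Series(df['Value'].to_numpy(), df['Key'].to_numpy())
    u = np.unique(s.index.to_numpy()).tolist()
    return pd.concat([s.loc[[k]].reset_index(drop=True) for k in u], axis=1, keys=u)

def via_numpy(df):
    u, inv, c = np.unique(df['Key'].to_numpy(), return_inverse=True, return_counts=True)
    order = np.argsort(inv, kind='stable')
    starts = np.append(0, c.cumsum()[:-1])
    rows = np.empty(len(inv), dtype=np.intp)
    rows[order] = np.arange(len(inv)) - starts.repeat(c)
    new = np.full((c.max(), len(u)), np.nan)
    new[rows, inv] = df['Value'].to_numpy()
    return pd.DataFrame(new, np.arange(c.max()), u)

for f in (via_apply, via_pivot, via_concat, via_numpy):
    t = timeit.timeit(lambda: f(df), number=10)
    print(f'{f.__name__}: {t / 10 * 1000:.2f} ms per call')
```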