Python Pandas Pivot Of Two columns (ColumnName and Value)
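I have a pandas DataFrame with two columns and a default index. The first column holds the intended column name, and the second column holds the value for that column.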
            name      returnattribute
0  Customer Name    Customer One Name
1  Customer Code              CGLOSPA
2  Customer Name    Customer Two Name
3  Customer Code              COTHABA
4  Customer Name  Customer Three Name
5  Customer Code              CGLOADS
6  Customer Name   Customer Four Name
7  Customer Code           CAPRCANBRA
8  Customer Name   Customer Five Name
9  Customer Code              COTHAMO
I want to pivot this so that instead of 10 rows I get 5 rows and two columns ('Customer Name' and 'Customer Code'). The desired result is:
  Customer Code        Customer Name
0       CGLOSPA    Customer One Name
1       COTHABA    Customer Two Name
2       CGLOADS  Customer Three Name
3    CAPRCANBRA   Customer Four Name
4       COTHAMO   Customer Five Name
I tried using the pandas pivot function:
df.pivot(columns='name', values='returnattribute')
But this still gives ten rows, with alternating blanks:
  Customer Code        Customer Name
0           NaN    Customer One Name
1       CGLOSPA                  NaN
2           NaN    Customer Two Name
3       COTHABA                  NaN
4           NaN  Customer Three Name
5       CGLOADS                  NaN
6           NaN   Customer Four Name
7    CAPRCANBRA                  NaN
8           NaN   Customer Five Name
9       COTHAMO                  NaN
How can I pivot the DataFrame to get just 5 rows with two columns?
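In df.pivot, when the index parameter is not passed, df.index is used by default. Each of the ten original rows therefore keeps its own row label, which explains the output you got. From the documentation:
index : str or object or a list of str, optional
    Column to use to make new frame's index. If None, uses existing index.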
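To make that default behaviour concrete, here is a minimal sketch that rebuilds the sample data from the question and calls pivot without index. The variable names wide, n_rows and nan_per_column are only illustrative.

```python
import pandas as pd

# Sample data from the question: names and values alternate row by row.
df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

# No index= is given, so pivot keeps the existing 0..9 index as row labels.
wide = df.pivot(columns="name", values="returnattribute")

n_rows = wide.shape[0]               # still one row per original row
nan_per_column = wide.isna().sum()   # each row fills only one of the two columns
print(n_rows)
print(nan_per_column)
```

Because every original row has a distinct label, no two values ever land on the same output row, so half of each column stays empty.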
To get the desired output, you have to create a new index column, as shown below.
df.assign(idx=df.index // 2).pivot(
index="idx", columns="name", values="returnattribute"
)
# name Customer Code Customer Name
# idx
# 0 CGLOSPA Customer One Name
# 1 COTHABA Customer Two Name
# 2 CGLOADS Customer Three Name
# 3 CAPRCANBRA Customer Four Name
# 4 COTHAMO Customer Five Name
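Note that df.index // 2 relies on a default 0..n-1 index and on exactly two rows per record. As a hedged alternative (not part of the original answer), one could number the occurrences of each name with groupby(...).cumcount() and pivot on that instead. The sketch below assumes every record contributes one row per name, in a consistent order; record_id is an illustrative name, and the index is deliberately shifted to show that its values do not matter.

```python
import pandas as pd

# Same sample data, but with a non-default index (10..19) on purpose.
df = pd.DataFrame(
    {
        "name": ["Customer Name", "Customer Code"] * 5,
        "returnattribute": [
            "Customer One Name", "CGLOSPA",
            "Customer Two Name", "COTHABA",
            "Customer Three Name", "CGLOADS",
            "Customer Four Name", "CAPRCANBRA",
            "Customer Five Name", "COTHAMO",
        ],
    },
    index=range(10, 20),
)

# 0 for the first occurrence of each name, 1 for the second, and so on.
record_id = df.groupby("name").cumcount()

out = df.assign(idx=record_id).pivot(
    index="idx", columns="name", values="returnattribute"
)
print(out)
```

If a record can be missing one of its rows, this numbering drifts out of step, so it is only a sketch under the stated assumption.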
Since every two rows represent one data point, you can reshape the data and build the desired DataFrame.
reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
# ['Customer Two Name', 'COTHABA'],
# ['Customer Three Name', 'CGLOADS'],
# ['Customer Four Name', 'CAPRCANBRA'],
# ['Customer Five Name', 'COTHAMO']], dtype=object)
col_names = pd.unique(df.name)
# array(['Customer Name', 'Customer Code'], dtype=object)
out = pd.DataFrame(reshaped, columns=col_names)
# Customer Name Customer Code
# 0 Customer One Name CGLOSPA
# 1 Customer Two Name COTHABA
# 2 Customer Three Name CGLOADS
# 3 Customer Four Name CAPRCANBRA
# 4 Customer Five Name COTHAMO
# we can reorder the columns using reindex.
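The reordering mentioned in the last comment could look like the following sketch. It rebuilds a small frame with the same column order as out above (only two rows, for brevity) and uses reindex(columns=...) to put 'Customer Code' first; reordered is an illustrative name.

```python
import pandas as pd

# A frame shaped like `out`: 'Customer Name' first, 'Customer Code' second.
out = pd.DataFrame(
    {
        "Customer Name": ["Customer One Name", "Customer Two Name"],
        "Customer Code": ["CGLOSPA", "COTHABA"],
    }
)

# reindex(columns=...) returns a new frame with the columns in the given order.
reordered = out.reindex(columns=["Customer Code", "Customer Name"])
print(reordered)
```

Plain column selection, out[["Customer Code", "Customer Name"]], gives the same result here; reindex differs only in that it would create an all-NaN column for a name that does not exist instead of raising.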
You can also pass the new index directly to pivot_table, using aggfunc='first' because your data is non-numeric:
df.pivot_table(index=df.index//2, columns='name',
values='returnattribute', aggfunc='first')
Output:
name Customer Code        Customer Name
0          CGLOSPA    Customer One Name
1          COTHABA    Customer Two Name
2          CGLOADS  Customer Three Name
3       CAPRCANBRA   Customer Four Name
4          COTHAMO   Customer Five Name
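The pivoted result still carries the label name on its column axis, which the desired output in the question does not show. A minimal sketch of one way to drop it, assuming the default-index sample data from the question (tidy is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

tidy = (
    df.assign(idx=df.index // 2)        # one group id per pair of rows
      .pivot(index="idx", columns="name", values="returnattribute")
      .rename_axis(columns=None)        # drop the 'name' label on the column axis
      .reset_index(drop=True)           # replace the 'idx' index with a plain 0..4 index
)
print(tidy)
```

This is purely cosmetic; the data is the same as in the outputs above.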