Python Pandas Pivot Of Two columns (ColumnName and Value)

Question

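I have a pandas DataFrame with two columns and a default index. The first column holds the intended column name, and the second column holds the value for that column.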

    name            returnattribute
0   Customer Name   Customer One Name
1   Customer Code   CGLOSPA
2   Customer Name   Customer Two Name
3   Customer Code   COTHABA
4   Customer Name   Customer Three Name
5   Customer Code   CGLOADS
6   Customer Name   Customer Four Name
7   Customer Code   CAPRCANBRA
8   Customer Name   Customer Five Name
9   Customer Code   COTHAMO

I want to pivot this so that, instead of 10 rows, I get 5 rows and two columns ('Customer Name' and 'Customer Code'). The desired result is:

    Customer Code   Customer Name
0   CGLOSPA         Customer One Name
1   COTHABA         Customer Two Name
2   CGLOADS         Customer Three Name
3   CAPRCANBRA      Customer Four Name
4   COTHAMO         Customer Five Name

I tried using the pandas pivot function:

df.pivot(columns='name', values='returnattribute')

But this still produces ten rows, with alternating NaN values:

    Customer Code   Customer Name
0   NaN             Customer One Name
1   CGLOSPA         NaN
2   NaN             Customer Two Name
3   COTHABA         NaN
4   NaN             Customer Three Name
5   CGLOADS         NaN
6   NaN             Customer Four Name
7   CAPRCANBRA      NaN
8   NaN             Customer Five Name
9   COTHAMO         NaN

How can I pivot the DataFrame to get just 5 rows with the two columns?

Answer 1

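When the index parameter is not passed to df.pivot, df.index is used by default. Each of the ten existing index labels therefore gets its own row, which is why the output still has ten rows with NaN filling the gaps.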

From the DataFrame.pivot docs:

index: str or object or a list of str, optional

Column to use to make new frame’s index. If None, uses existing index.
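A minimal sketch of that default behaviour, using only the first two customers from the question (the variable name wide is just for illustration). With no index argument, every original row label becomes its own output row:

```python
import pandas as pd

# Rebuild a shortened version of the question's data (two customers only).
df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 2,
    "returnattribute": ["Customer One Name", "CGLOSPA",
                        "Customer Two Name", "COTHABA"],
})

# No index argument: pivot keeps the existing index 0..3, one row per label.
wide = df.pivot(columns="name", values="returnattribute")

print(wide.shape)               # (4, 2): still one row per input row
print(wide.isna().sum().sum())  # 4: each row fills only one of the two columns
```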

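To get the desired output, you have to create a new index column that gives both rows of a record the same label, as shown below.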

df.assign(idx=df.index // 2).pivot(
    index="idx", columns="name", values="returnattribute"
)

# name Customer Code        Customer Name
# idx                                    
# 0          CGLOSPA    Customer One Name
# 1          COTHABA    Customer Two Name
# 2          CGLOADS  Customer Three Name
# 3       CAPRCANBRA   Customer Four Name
# 4          COTHAMO   Customer Five Name
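The df.index // 2 trick relies on the default 0..n-1 index and on the rows strictly alternating in pairs. As a possible variation (not part of the original answer), the record number can instead be derived by counting occurrences of each name with groupby(...).cumcount(). This sketch assumes every record supplies each name exactly once and that records appear in order; record_id is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 3,
    "returnattribute": ["Customer One Name", "CGLOSPA",
                        "Customer Two Name", "COTHABA",
                        "Customer Three Name", "CGLOADS"],
})

# Number each occurrence of a given name: 0 for its first row, 1 for its second, ...
# Rows that belong to the same record end up with the same number.
record_id = df.groupby("name").cumcount()

out = (
    df.assign(idx=record_id)
      .pivot(index="idx", columns="name", values="returnattribute")
)
print(out)
```

For the data in the question this yields the same labels as df.index // 2, so the result should match the output above.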

Alternatively, because every two rows represent one record, you can reshape the values and build the desired DataFrame directly.

reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
#        ['Customer Two Name', 'COTHABA'],
#        ['Customer Three Name', 'CGLOADS'],
#        ['Customer Four Name', 'CAPRCANBRA'],
#        ['Customer Five Name', 'COTHAMO']], dtype=object)

col_names = pd.unique(df['name'])
# array(['Customer Name', 'Customer Code'], dtype=object)

out = pd.DataFrame(reshaped, columns=col_names)

#          Customer Name Customer Code
# 0    Customer One Name       CGLOSPA
# 1    Customer Two Name       COTHABA
# 2  Customer Three Name       CGLOADS
# 3   Customer Four Name    CAPRCANBRA
# 4   Customer Five Name       COTHAMO

# we can reorder the columns using reindex.
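A short sketch of that reordering step, assuming a frame shaped like out above (rebuilt here with two rows so the snippet runs on its own; reordered is an illustrative name):

```python
import pandas as pd

# Stand-in for the `out` frame built by the reshape approach.
out = pd.DataFrame({
    "Customer Name": ["Customer One Name", "Customer Two Name"],
    "Customer Code": ["CGLOSPA", "COTHABA"],
})

# reindex(columns=...) returns a new frame with the columns in the given order.
reordered = out.reindex(columns=["Customer Code", "Customer Name"])
print(list(reordered.columns))  # ['Customer Code', 'Customer Name']
```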

Answer 2

You can also pass the new index directly to pivot_table, using aggfunc='first' because the data is non-numeric:

df.pivot_table(index=df.index//2, columns='name',
               values='returnattribute', aggfunc='first')

Output:

name Customer Code        Customer Name
0          CGLOSPA    Customer One Name
1          COTHABA    Customer Two Name
2          CGLOADS  Customer Three Name
3       CAPRCANBRA   Customer Four Name
4          COTHAMO   Customer Five Name
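For reference, a self-contained sketch of the same call on a shortened copy of the data. The final rename_axis step is an optional addition, not part of the answer: it clears the "name" label on the column axis so the header looks like the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 2,
    "returnattribute": ["Customer One Name", "CGLOSPA",
                        "Customer Two Name", "COTHABA"],
})

# Group rows 0-1, 2-3, ... together; 'first' simply picks the single
# string present in each (group, column) cell.
out = df.pivot_table(index=df.index // 2, columns="name",
                     values="returnattribute", aggfunc="first")

# Optional cleanup: remove the "name" label from the column axis.
out = out.rename_axis(columns=None)
print(out)
```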
