Pandas Pivot is not producing desired output
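My data looks like the following. I am trying to pivot the DataFrame so that the schema name and table name are in the columns, and the row count, table type, created date and altered date are in the first column (as row labels). I am referring to this to get my desired output, but unfortunately I could not find the solution I need. Below are my code and the desired output. The code puts TABLE_SCHEMA and TABLE_NAME in the correct columns; only the index and values parts of the call do not produce the desired output.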
Thanks in advance for your time and effort!
DataFrame:
   TABLE_SCHEMA    TABLE_TYPE  TABLE_NAME             ROW_COUNT  TABLE_CREATED_DATE  LAST_ALTERED_DATE
0  TPCDS_SF100TCL  BASE TABLE  CALL_CENTER                   60  2022-03-02          2022-05-06
1  TPCDS_SF100TCL  BASE TABLE  CATALOG_PAGE               50000  2022-03-02          2022-03-02
2  TPCDS_SF100TCL  BASE TABLE  CUSTOMER               100000000  2022-03-02          2022-03-02
3  TPCDS_SF100TCL  BASE TABLE  CUSTOMER_ADDRESS        50000000  2022-03-02          2022-03-02
4  TPCDS_SF100TCL  BASE TABLE  CUSTOMER_DEMOGRAPHICS    1920800  2022-03-02          2022-03-02
5  TPCDS_SF10TCL   BASE TABLE  CALL_CENTER                   54  2022-03-02          2022-03-02
6  TPCDS_SF10TCL   BASE TABLE  CATALOG_PAGE               40000  2022-03-02          2022-03-02
7  TPCDS_SF10TCL   BASE TABLE  CUSTOMER                65000000  2022-03-02          2022-03-02
8  TPCDS_SF10TCL   BASE TABLE  CUSTOMER_DEMOGRAPHICS    1920800  2022-03-02          2022-03-02
9  TPCDS_SF10TCL   BASE TABLE  CUSTOMER_ADDRESS        32500000  2022-03-02          2022-03-02
All six columns have dtype: object.
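To reproduce the problem, the data above can be rebuilt as a DataFrame. This is a sketch: the name `df` matches the question's code, `dtype=object` mirrors the `dtype: object` reported for every column, and storing ROW_COUNT as plain integers is an assumption, since the post does not show whether those values were numbers or strings.

```python
import pandas as pd

# Ten rows from the question; dtype=object keeps every column as object,
# matching the "dtype: object" shown for each Series.
# Assumption: ROW_COUNT values are ints (the post does not say).
df = pd.DataFrame(
    {
        "TABLE_SCHEMA": ["TPCDS_SF100TCL"] * 5 + ["TPCDS_SF10TCL"] * 5,
        "TABLE_TYPE": ["BASE TABLE"] * 10,
        "TABLE_NAME": [
            "CALL_CENTER", "CATALOG_PAGE", "CUSTOMER",
            "CUSTOMER_ADDRESS", "CUSTOMER_DEMOGRAPHICS",
            "CALL_CENTER", "CATALOG_PAGE", "CUSTOMER",
            "CUSTOMER_DEMOGRAPHICS", "CUSTOMER_ADDRESS",
        ],
        "ROW_COUNT": [
            60, 50000, 100000000, 50000000, 1920800,
            54, 40000, 65000000, 1920800, 32500000,
        ],
        "TABLE_CREATED_DATE": ["2022-03-02"] * 10,
        # Only the first row has a different altered date.
        "LAST_ALTERED_DATE": ["2022-05-06"] + ["2022-03-02"] * 9,
    },
    dtype=object,
)

print(df)
```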
Python code:
pd.pivot(df, columns = ["TABLE_SCHEMA","TABLE_NAME"],index=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'],
values=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'])
Desired output (the output below is for one schema; I need both schemas in one table):
TABLE_SCHEMA TPCDS_SF100TCL
TABLE_NAME CALL_CENTER CATALOG_PAGE CUSTOMER CUSTOMER_ADDRESS CUSTOMER_DEMOGRAPHICS
ROW_COUNT
TABLE_TYPE
TABLE_CREATED_DATE
LAST_ALTERED_DATE
Try this:
data_df.set_index(['TABLE_SCHEMA', 'TABLE_NAME'], drop=True).T
where data_df is the original DataFrame you provided.
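A minimal sketch of the answer's approach, run on four of the rows from the question (two tables per schema; the full ten rows reshape the same way). `set_index` moves TABLE_SCHEMA and TABLE_NAME into a row MultiIndex, and `.T` turns that into a two-level column header with the remaining fields as rows. For comparison, the block also shows how `pivot` itself can reach the same layout: it expects long data with one value per (index, columns) pair, which is why using the same four fields as both `index` and `values` does not give the intended shape; here the fields are first stacked into a single column with `melt`. The `row_order` list, the `reindex` calls and the `FIELD`/`VALUE` column names are illustrative additions, not part of the answer, and passing a list to `columns` in `pivot` assumes pandas 1.1 or later.

```python
import pandas as pd

# Four rows from the question, enough to show both schemas side by side.
data_df = pd.DataFrame({
    "TABLE_SCHEMA": ["TPCDS_SF100TCL", "TPCDS_SF100TCL", "TPCDS_SF10TCL", "TPCDS_SF10TCL"],
    "TABLE_TYPE": ["BASE TABLE"] * 4,
    "TABLE_NAME": ["CALL_CENTER", "CATALOG_PAGE", "CALL_CENTER", "CATALOG_PAGE"],
    "ROW_COUNT": [60, 50000, 54, 40000],
    "TABLE_CREATED_DATE": ["2022-03-02"] * 4,
    "LAST_ALTERED_DATE": ["2022-05-06", "2022-03-02", "2022-03-02", "2022-03-02"],
})

# Row order of the desired output (hypothetical helper list).
row_order = ["ROW_COUNT", "TABLE_TYPE", "TABLE_CREATED_DATE", "LAST_ALTERED_DATE"]

# Answer's approach: (schema, table) becomes the row MultiIndex, then the
# transpose turns it into a two-level column header; reindex only reorders rows.
wide = data_df.set_index(["TABLE_SCHEMA", "TABLE_NAME"]).T.reindex(row_order)

# Same layout with pivot: melt the four fields into one VALUE column so each
# (FIELD, TABLE_SCHEMA, TABLE_NAME) combination maps to exactly one value.
long_df = data_df.melt(
    id_vars=["TABLE_SCHEMA", "TABLE_NAME"],
    var_name="FIELD",
    value_name="VALUE",
)
wide_pivot = long_df.pivot(
    index="FIELD",
    columns=["TABLE_SCHEMA", "TABLE_NAME"],
    values="VALUE",
).reindex(row_order)  # pivot sorts FIELD alphabetically, so reorder here too

print(wide)
```

Both `wide` and `wide_pivot` have ROW_COUNT, TABLE_TYPE, TABLE_CREATED_DATE and LAST_ALTERED_DATE as rows and one column per (TABLE_SCHEMA, TABLE_NAME) pair, so both schemas sit in a single table. Because each resulting column mixes numbers and strings, its dtype is object.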