How to read YAML into multiple rows using Python

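We are reading a YAML file in Python with the code below, but it gives me [1 rows x 30 columns]. I want it in 2 rows: one row for my_table_01 and another for my_table_02 (sample data is provided below the code).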

import pandas as pd
from yaml import safe_load
def read_yaml(path):
    #fs = gcsfs.GCSFileSystem()
    with open(path, 'r') as f:
        df = pd.json_normalize(safe_load(f))
    return df
df_master = read_yaml('new 7.yml')
print(df_master)

My "new 7.yml" file contains the following data:

cat "new 7.yml"
config_queries: 
            my_table_01 :
                           PIPELINE_NAME: "table_01"
                           RUN_FLAG: "True"  
                           STAGE: "edw_to_eim"
                           SUBJECT_AREA: "account"
                           SOURCE_DATABASE: "dev_db"
                           SOURCE_TABLE_NAME: "table"
                           TARGET_DATABASE: "dev_db"
                           TARGET_TABLE_NAME: "table"
                           TARGET_TABLE_TYPE: "ed"
                           DELTA_COLUMN: "N"
                           DELTA_COLUMN_NAME: "N"
                           DOP_VALUE: "N"
                           SOURCE_QUERY_KPI: {'tab_kpi_01':True,'tab_kpi_02':True}
                           TARGET_QUERY_KPI: {'tab_kpi_01':True,'tab_kpi_02':True}
                                                                                                                         
            my_table_02  : 
                           PIPELINE_NAME: "table_02"
                           RUN_FLAG: "True"  
                           STAGE: "edw_to_eim"
                           SUBJECT_AREA: "account"
                           SOURCE_DATABASE: "dev_db"
                           SOURCE_TABLE_NAME: "table"
                           TARGET_DATABASE: "dev_db"
                           TARGET_TABLE_NAME: "table"
                           TARGET_TABLE_TYPE: "ed"
                           DELTA_COLUMN: "N"
                           DELTA_COLUMN_NAME: "N"
                           DOP_VALUE: "N"
                           SOURCE_QUERY_KPI: {'tab_kpi_01':True}
                           TARGET_QUERY_KPI: {'tab_kpi_01':True}
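To see where the single row comes from without needing the file, here is a minimal sketch that inlines a trimmed copy of the YAML above (only three keys per table; the name YAML_TEXT is illustrative, not from the question) and normalizes the whole parsed dict the same way read_yaml does. Every leaf of both tables ends up as its own dotted column in one record.

```python
import pandas as pd
from yaml import safe_load

# Trimmed, hypothetical copy of the question's YAML, inlined so the sketch
# runs without "new 7.yml".
YAML_TEXT = """
config_queries:
  my_table_01:
    PIPELINE_NAME: "table_01"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True, 'tab_kpi_02': True}
  my_table_02:
    PIPELINE_NAME: "table_02"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True}
"""

data = safe_load(YAML_TEXT)

# A single (nested) dict is treated as ONE record: each leaf becomes a column
# named by its dotted path, e.g. "config_queries.my_table_01.PIPELINE_NAME".
df_one_row = pd.json_normalize(data)
print(df_one_row.shape)
print(list(df_one_row.columns))
```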
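
To get multiple rows: json_normalize expects a list of dicts, not a single nested dict. A single dict is treated as one record, which is why you get 1 row with 30 dotted columns. So you need to "unpack" the nested dict into a list of dicts, for example by taking the values of config_queries (one dict per table):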

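import pandas as pd
from yaml import safe_load

def read_yaml(path):
    #fs = gcsfs.GCSFileSystem()
    with open(path, 'r') as f:
        # config_queries maps table name -> settings dict; its values are the
        # per-table records. json_normalize needs a list of dicts, so turn the
        # dict_values view into a real list with list().
        df = pd.json_normalize(list(safe_load(f)['config_queries'].values()))
    return df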
df_master = read_yaml('new 7.yml')
print(df_master)

  PIPELINE_NAME RUN_FLAG  ... TARGET_QUERY_KPI.tab_kpi_01 TARGET_QUERY_KPI.tab_kpi_02
0      table_01     True  ...                        True                        True
1      table_02     True  ...                        True                         NaN

[2 rows x 16 columns]
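Taking only the values drops the outer keys, so the result no longer says which row came from my_table_01 and which from my_table_02. If you need that, one option is to attach the keys as the row index afterwards. This is a sketch on a trimmed inline copy of the YAML; YAML_TEXT and the index name table_key are illustrative choices, not from the question.

```python
import pandas as pd
from yaml import safe_load

# Trimmed, hypothetical copy of the question's YAML.
YAML_TEXT = """
config_queries:
  my_table_01:
    PIPELINE_NAME: "table_01"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True, 'tab_kpi_02': True}
  my_table_02:
    PIPELINE_NAME: "table_02"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True}
"""

queries = safe_load(YAML_TEXT)["config_queries"]

# One record per table: a list of dicts gives one row per dict.
df = pd.json_normalize(list(queries.values()))

# keys() and values() of the same dict iterate in matching order,
# so the index labels line up with the rows.
df.index = pd.Index(list(queries.keys()), name="table_key")
print(df)
```

Keys missing from a table (here SOURCE_QUERY_KPI.tab_kpi_02 for my_table_02) come out as NaN, just like in the output above.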