How to read YAML into multiple rows using Python
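We are reading a YAML file in Python with the code below, but it gives me [1 rows x 30 columns].
I want it in 2 rows instead: one row for my_table_01 and another for my_table_02 (sample data is provided below the code).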
import pandas as pd
from yaml import safe_load

def read_yaml(path):
    #fs = gcsfs.GCSFileSystem()
    with open(path, 'r') as f:
        df = pd.json_normalize(safe_load(f))
    return df

df_master = read_yaml('new 7.yml')
print(df_master)
My new 7.yml contains the following data:
cat new 7.yml-->
config_queries:
  my_table_01 :
    PIPELINE_NAME: "table_01"
    RUN_FLAG: "True"
    STAGE: "edw_to_eim"
    SUBJECT_AREA: "account"
    SOURCE_DATABASE: "dev_db"
    SOURCE_TABLE_NAME: "table"
    TARGET_DATABASE: "dev_db"
    TARGET_TABLE_NAME: "table"
    TARGET_TABLE_TYPE: "ed"
    DELTA_COLUMN: "N"
    DELTA_COLUMN_NAME: "N"
    DOP_VALUE: "N"
    SOURCE_QUERY_KPI: {'tab_kpi_01':True,'tab_kpi_02':True}
    TARGET_QUERY_KPI: {'tab_kpi_01':True,'tab_kpi_02':True}
  my_table_02 :
    PIPELINE_NAME: "table_02"
    RUN_FLAG: "True"
    STAGE: "edw_to_eim"
    SUBJECT_AREA: "account"
    SOURCE_DATABASE: "dev_db"
    SOURCE_TABLE_NAME: "table"
    TARGET_DATABASE: "dev_db"
    TARGET_TABLE_NAME: "table"
    TARGET_TABLE_TYPE: "ed"
    DELTA_COLUMN: "N"
    DELTA_COLUMN_NAME: "N"
    DOP_VALUE: "N"
    SOURCE_QUERY_KPI: {'tab_kpi_01':True}
    TARGET_QUERY_KPI: {'tab_kpi_01':True}
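If you want to create multiple rows, json_normalize needs a list of dicts rather than a single nested dict. A single nested dict is flattened into one row, with each nested key becoming part of a dotted column name. So you need to 'unpack' the nested dict into a list of dicts, for example by taking the values() of config_queries: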
import pandas as pd
from yaml import safe_load

def read_yaml(path):
    #fs = gcsfs.GCSFileSystem()
    with open(path, 'r') as f:
        # json_normalize needs a list of dicts: one dict per table entry
        df = pd.json_normalize(list(safe_load(f)['config_queries'].values()))
    return df

df_master = read_yaml('new 7.yml')
print(df_master)
#   PIPELINE_NAME RUN_FLAG  ... TARGET_QUERY_KPI.tab_kpi_01 TARGET_QUERY_KPI.tab_kpi_02
# 0      table_01     True  ...                        True                        True
# 1      table_02     True  ...                        True                         NaN
#
# [2 rows x 16 columns]
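
To make the difference concrete, here is a minimal sketch that contrasts the two calls on an inline YAML string instead of a file. The YAML text is a trimmed-down stand-in for new 7.yml (only three keys per table are kept), and the names YAML_TEXT, df_one_row, df_rows and the index name table_key are assumptions made for illustration; they do not come from the question. It also shows one possible way to keep the table keys (my_table_01, my_table_02), which values() otherwise discards.

```python
import pandas as pd
from yaml import safe_load

# Trimmed-down stand-in for 'new 7.yml' (assumption: only a few keys kept).
YAML_TEXT = """
config_queries:
  my_table_01:
    PIPELINE_NAME: "table_01"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True, 'tab_kpi_02': True}
  my_table_02:
    PIPELINE_NAME: "table_02"
    RUN_FLAG: "True"
    SOURCE_QUERY_KPI: {'tab_kpi_01': True}
"""

config = safe_load(YAML_TEXT)["config_queries"]

# A single nested dict is flattened into ONE row; every nested key ends up
# in a dotted column name such as 'config_queries.my_table_01.RUN_FLAG'.
df_one_row = pd.json_normalize({"config_queries": config})

# A list of dicts yields one row per dict, with shared column names.
df_rows = pd.json_normalize(list(config.values()))

# Optional: keep the table keys as the index (dicts preserve insertion
# order, so keys() and values() line up). 'table_key' is a hypothetical name.
df_rows.index = list(config.keys())
df_rows.index.name = "table_key"

print(df_one_row.shape)
print(df_rows.shape)
print(df_rows)
```

In this sketch the first frame has a single row holding every key of both tables, while the second has one row per table. A key that is missing from one table, such as tab_kpi_02 under my_table_02, shows up as NaN in that row.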