Loading config.json from an egg file on Airflow

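How do I load a configuration file from an egg file? I am trying to package Python code that runs on Airflow as an egg file. The code loads a config.json file, but this fails when it runs on Airflow. I suspect the problem is that it tries to read the file from inside the egg and cannot find it because the egg is a zipped archive. I updated setup.py as follows to make sure the config file is included in the package: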

from setuptools import find_packages, setup

setup(
    name='tv_quality_assurance',
    packages=find_packages(),
    version='0.1.0',
    description='Quality checks on IPTV linear viewing data',
    author='Sarah Berenji',
    data_files=[('src/codes', ['src/codes/config.json'])],
    include_package_data=True,
    license='',
)

Now it complains that config_file_path is not a directory:

NotADirectoryError: [Errno 20] Not a directory: '/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json'

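I checked the path and the JSON file is there. Here is my code, with some print statements added for debugging; they show that config_file_path is treated as neither a file nor a directory: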

import json
import os

dir_path = os.path.dirname(__file__)
config_file_path = dir_path + '/config.json'

print(f"config_file_path = {config_file_path}")
print(f"relpath(config_file_path) = {os.path.relpath(config_file_path)}")

if not os.path.isfile(config_file_path):
    print(f"{config_file_path} is not a file")
if not os.path.isdir(config_file_path):
    print(f"{config_file_path} is not a dir")

with open(config_file_path) as json_file:
    config = json.load(json_file)

It returns the following output:

config_file_path = /opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json
relpath(config_file_path) = ../../artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json
/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json is not a file
/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json is not a dir

Traceback (most recent call last):
  File "/opt/test_AF1.10.2_py2/dags/py_spark_entry_point.py", line 8, in <module>
    execute(spark)
  File "/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/entry_point.py", line 26, in execute
  File "/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/data_methods.py", line 32, in load_config_file
NotADirectoryError: [Errno 20] Not a directory: '/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json'
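
This output is consistent with the egg being a single zip archive: to the operating system the .egg is a regular file, so a path that continues "inside" it is neither a file nor a directory, and open() fails. The following is a minimal sketch that reproduces the behaviour with a throwaway archive. The archive name, the `threshold` key and the helper names `build_demo_egg` and `probe` are made up for illustration and are not part of the original project.

```python
import json
import os
import tempfile
import zipfile


def build_demo_egg(directory):
    """Create a small zip archive standing in for an egg (hypothetical names)."""
    egg_path = os.path.join(directory, "demo-0.1.0.egg")
    with zipfile.ZipFile(egg_path, "w") as archive:
        archive.writestr("src/codes/config.json", json.dumps({"threshold": 5}))
    return egg_path


def probe(egg_path):
    """Check how the filesystem and zipfile each see a member of the archive."""
    inner_path = egg_path + "/src/codes/config.json"
    result = {
        "isfile": os.path.isfile(inner_path),
        "isdir": os.path.isdir(inner_path),
    }
    try:
        with open(inner_path) as handle:
            handle.read()
        result["open_error"] = None
    except OSError as error:
        # The exact exception subclass depends on the platform.
        result["open_error"] = type(error).__name__
    # Reading through the archive itself succeeds.
    with zipfile.ZipFile(egg_path) as archive:
        result["config"] = json.loads(archive.read("src/codes/config.json"))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        print(probe(build_demo_egg(directory)))
```

The filesystem checks both come back False while the read through zipfile succeeds, which suggests the file is packaged correctly and only the access method needs to change.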

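As my next attempt, I tried importlib_resources, but ended up with a strange error saying the module is not installed, even though the logs show it was installed successfully via pip: ModuleNotFoundError: No module named 'importlib_resources'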

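import json

import importlib_resources

# files() returns a Traversable; opening it through its own open() also works inside a zipped egg
config_file = importlib_resources.files("src.codes") / "config.json"
with config_file.open() as json_file:
    config = json.load(json_file)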
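
One possible explanation for a module that pip reports as installed but that cannot be imported is that pip installed it into a different interpreter or environment than the one running the task. That is only a guess here, not something the logs confirm. A small diagnostic sketch, with the hypothetical helper name `describe_environment`, could be called from inside the failing task to check:

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether it can see a top-level module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": sys.version_info[:3],
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(describe_environment("importlib_resources"))
```

Comparing the reported executable with the interpreter that pip used would show whether the two environments match.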

I finally got it working using pkg_resources:

import json
import pkg_resources
config_file = pkg_resources.resource_stream('src.codes', 'config.json')
config = json.load(config_file)
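
The pkg_resources call works because it reads the resource through the package's loader rather than through a filesystem path. For environments where pkg_resources is not available, the standard library's pkgutil.get_data takes the same route. The sketch below builds a throwaway zipped package to show this. The package name `demo_pkg`, the `threshold` key and the helper names are assumptions for illustration only; in the original project the package would be 'src.codes'.

```python
import json
import os
import pkgutil
import sys
import tempfile
import zipfile


def load_packaged_config(package, resource="config.json"):
    """Read a JSON resource via the package loader instead of a filesystem path."""
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        raise FileNotFoundError(f"{resource} not found in package {package}")
    return json.loads(raw.decode("utf-8"))


def demo():
    """Build a zipped stand-in for an egg, import from it, and load its config."""
    with tempfile.TemporaryDirectory() as directory:
        egg_path = os.path.join(directory, "demo.egg")
        with zipfile.ZipFile(egg_path, "w") as archive:
            archive.writestr("demo_pkg/__init__.py", "")
            archive.writestr("demo_pkg/config.json", json.dumps({"threshold": 5}))
        sys.path.insert(0, egg_path)
        try:
            return load_packaged_config("demo_pkg")
        finally:
            # Undo the temporary import-path changes.
            sys.path.remove(egg_path)
            sys.modules.pop("demo_pkg", None)


if __name__ == "__main__":
    print(demo())
```

With the real egg on the import path, the equivalent call would be load_packaged_config("src.codes"), assuming config.json is shipped inside that package.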