Run .py file from Google Cloud Dataproc Python notebook

Question

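I have my files on Dataproc storage:

inotebook.ipynb
dependencies
- test1.py
- test2.py
- __init__.py

I am currently working in inotebook.ipynb and need to use functions from test1.py and test2.py. Locally, I can use !python ....py and use the available functions (or create a package and install it). Are these options available in a Google Cloud Dataproc notebook?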

I tried the suggestions from the following link, and none of them worked:

Dataproc import python module stored in google cloud storage (gcs) bucket

Is it possible to install a custom package, or otherwise run .py files that live in the same subdirectory as my notebook file on Dataproc?

Answer 1

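Unfortunately, Dataproc still has limited support for using custom packages stored in GCS. I was able to make the mentioned workaround work with a few changes. I added a prefix definition so that the listing points at the right files under the directory, and I loop over the returned objects to download the files to the local Dataproc cluster before executing the subsequent lines of code. See the code below:

GCS bucket structure: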

my-bucket
  └───notebooks
        └───jupyter
            |   gcs_test.ipynb
            └───dependencies
                  └─── hi_gcs.py
                  └─── hello_gcs.py

hi_gcs.py:

def say_hi(name):
    return "Hi {}!".format(name)

hello_gcs.py:

def say_hello(name):
    return "Hello {}!".format(name)

gcs_test.ipynb:

from google.cloud import storage

def get_module():
    """Download the dependency .py files from GCS to the local Dataproc cluster."""
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    # prefix is the path to your Python files inside the bucket
    blobs = client.list_blobs(bucket, prefix='notebooks/jupyter/dependencies/')
    for blob in blobs:
        if blob.name.endswith('/'):  # skip the placeholder object for the directory itself
            continue
        name = blob.name.split('/')[-1]  # get the filename only
        blob.download_to_filename(name)  # download the file to the local Dataproc cluster

def use_my_module(val):
    get_module()
    # import only after the files have been downloaded to the working directory
    import hi_gcs
    import hello_gcs

    print(hello_gcs.say_hello(val))
    print(hi_gcs.say_hi(val))

use_my_module('User 1')

Output:

Hello User 1!
Hi User 1!
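
If you would rather not drop the downloaded files into the notebook's working directory, a variation on the same idea is to download them into a separate directory and put that directory on sys.path. This is only a sketch under assumptions: download_modules and import_from_dir are hypothetical helper names, not part of any library, and the bucket and prefix are the ones from the example above. The GCS calls used are the same Client.list_blobs and Blob.download_to_filename as in the code above.

```python
import importlib
import sys
from pathlib import Path


def download_modules(bucket_name, prefix, dest_dir):
    """Download every .py object under `prefix` into `dest_dir`; return the local paths."""
    # Imported lazily so the rest of this cell works without google-cloud-storage.
    from google.cloud import storage

    client = storage.Client()
    paths = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if not blob.name.endswith(".py"):
            continue  # skips directory placeholders and non-Python objects
        local_path = Path(dest_dir) / Path(blob.name).name
        blob.download_to_filename(str(local_path))
        paths.append(local_path)
    return paths


def import_from_dir(module_name, directory):
    """Make `directory` importable and load (or reload) `module_name` from it."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # pick up files written after the interpreter started
    if module_name in sys.modules:
        # Reload so edits re-downloaded from GCS take effect in a running kernel.
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


# Example usage in a notebook cell (assumes the bucket layout shown above):
# import tempfile
# dest = tempfile.mkdtemp()
# download_modules("my-bucket", "notebooks/jupyter/dependencies/", dest)
# hi_gcs = import_from_dir("hi_gcs", dest)
# print(hi_gcs.say_hi("User 1"))
```

The reload step matters mostly in long-running notebook kernels, where a plain import would keep returning the previously loaded version of the module.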

python

google-cloud-storage

google-cloud-dataproc
