Vertex AI scheduled notebooks don't recognize the existence of folders

Question

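I have a managed Jupyter notebook in Vertex AI that I want to schedule. The notebook works fine as long as I start it manually, but as soon as it is scheduled, it fails. In fact, several things go wrong when it is scheduled, and some of them are fixable. Before explaining my problem, let me give some context.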

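The notebook gathers information from an API for several stores and saves the data in different folders before processing it, then saves CSV files to store-specific folders and to BigQuery. So, in the notebook's location, I have: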

The notebook
The functions needed to process the data (as *.py files)
A series of folders, some of which have subfolders that also have subfolders

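When I execute it manually, there is no problem. Everything works, and every file ends up exactly where it should be, as does the data in the different BigQuery tables.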

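However, when the notebook's execution is scheduled, everything goes wrong. First, the *.py files cannot be read (as imports). No problem, I added the functions to the notebook itself.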

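Now, the following error is where I am at a loss, because I don't know why it happens or how to fix it. The code that causes the error is the following: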

internal = "https://api.************************"

df_descriptions = [] 

storess = internal
response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)

filepath = "stores"

files = os.listdir(filepath)

for file in files:
    with open(filepath + "/"+file) as json_string:
        jsonstr = json.load(json_string)
        information = pd.json_normalize(jsonstr)
    df_descriptions.append(information)

StoreINFO = pd.concat(df_descriptions)
StoreINFO = StoreINFO.dropna()
StoreINFO = StoreINFO[StoreINFO['storeIdMappings'].map(lambda d: len(d)) > 0]

cloud_store_ids = list(set(StoreINFO.cloudStoreId))

LastWeek = datetime.date.today()- timedelta(days=2)
LastWeek =np.datetime64(LastWeek)

The error is:

FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_165/2970402631.py in <module>
      5 storess = internal
      6 response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
----> 7 pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)
      8 
      9 filepath = "stores"

/opt/conda/lib/python3.7/pathlib.py in write_bytes(self, data)
   1228         # type-check for the buffer interface before truncating the file
   1229         view = memoryview(data)
-> 1230         with self.open(mode='wb') as f:
   1231             return f.write(view)
   1232 

/opt/conda/lib/python3.7/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
   1206             self._raise_closed()
   1207         return io.open(self, mode, buffering, encoding, errors, newline,
-> 1208                        opener=self._opener)
   1209 
   1210     def read_bytes(self):

/opt/conda/lib/python3.7/pathlib.py in _opener(self, name, flags, mode)
   1061     def _opener(self, name, flags, mode=0o666):
   1062         # A stub for the opener argument to built-in open()
-> 1063         return self._accessor.open(self, flags, mode)
   1064 
   1065     def _raw_open(self, flags, mode=0o777):

FileNotFoundError: [Errno 2] No such file or directory: 'stores/request_1.json'

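I would gladly find another way to do this, such as using GCS buckets, but my problem is the existence of subfolders. There are many stores and I do not want to do this by hand, because some of the retailers I am doing this for have more than 1,000 stores. My Python code generates all of these folders and, as far as I know, this is not feasible in GCS.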

How can I solve this issue?

Answer 1

GCS uses a flat namespace, so folders do not actually exist, but they can be simulated as described in the documentation. For your requirement, you can either use an absolute path (starting with "/", not a relative one) or create the "stores" directory (with "mkdir"). For more information, you can check this blog.
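
As a minimal sketch of the "mkdir" suggestion: the traceback shows write_bytes failing because the "stores" folder does not exist in the working directory of the scheduled run, so one option is to create the missing parent folders right before each write. The helper name write_bytes_creating_parents and the temporary base_dir are hypothetical and only used for illustration; in the notebook, base_dir would be whatever absolute location the scheduled execution can actually write to, which is an assumption to verify in your environment.

```python
import os
import pathlib
import tempfile


def write_bytes_creating_parents(path, data):
    """Write data to path, creating any missing parent folders first."""
    target = pathlib.Path(path)
    # parents=True builds the whole chain of subfolders;
    # exist_ok=True makes the call safe when the folder is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


# Hypothetical absolute base directory; a temp folder stands in for it here.
base_dir = pathlib.Path(tempfile.mkdtemp())

# Same layout as in the question: stores/request_1.json
saved = write_bytes_creating_parents(
    base_dir / "stores" / "request_1.json", b'{"cloudStoreId": 1}'
)

# Nested store-specific subfolders are created the same way, with no manual step.
nested = write_bytes_creating_parents(
    base_dir / "store_42" / "week_1" / "data.csv", b"a,b\n"
)

print(saved)
print(os.listdir(base_dir / "stores"))
```

Because the folders are created on demand, the same code can run for any number of stores without preparing the directory tree by hand. If the files should move to GCS instead, the same slash-separated relative paths can serve as object names, since a "/" in an object name is how folders are simulated there.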

python-3.x

google-cloud-platform

google-cloud-vertex-ai