Is there a way to automate this Python script in GCP?

Question

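I'm a beginner with GCP functions/products. I wrote the code below, which reads a list of cities from a local folder, fetches the weather data for each city in that list, and finally uploads those weather values to a table in BigQuery. I don't need to change the code anymore, because it creates a new table when a new week starts. Now I want to "deploy" it (I'm not even sure that's what it's called) to the cloud so that it runs there automatically. I tried App Engine and Cloud Functions, but ran into problems with both.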

import requests, json, sqlite3, os, csv, datetime, re
from google.cloud import bigquery
#from google.cloud import storage

list_city = []
with open("list_of_cities.txt", "r") as pointer:
    for line in pointer:
        list_city.append(line.strip())

API_key = "PLACEHOLDER"
Base_URL = "http://api.weatherapi.com/v1/history.json?key="

yday = datetime.date.today() - datetime.timedelta(days = 1)
Date = yday.strftime("%Y-%m-%d")

table_id = f"sonic-cat-315013.weather_data.Historical_Weather_{yday.isocalendar()[0]}_{yday.isocalendar()[1]}"

credentials_path = r"PATH_TO_JSON_FILE"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path

client = bigquery.Client()

try:
    schema = [
        bigquery.SchemaField("city", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Date", "DATE", mode="REQUIRED"),
        bigquery.SchemaField("Hour", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Temperature", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Humidity", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Condition", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Chance_of_rain", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Precipitation_mm", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Cloud_coverage", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Visibility_km", "FLOAT", mode="REQUIRED")
    ]


    table = bigquery.Table(table_id, schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="Date",  # name of column to use for partitioning
    )
    table = client.create_table(table)  # Make an API request.
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
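except Exception as exc:  # usually a Conflict error because the table already exists
    # Print the error too, so credential or permission problems are not hidden
    print("Table {}_{} not created (it probably already exists): {}".format(yday.isocalendar()[0], yday.isocalendar()[1], exc))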

    
def get_weather():
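    # Fail loudly if the API returned an error payload instead of weather data;
    # main() catches the exception and moves on to the next city.
    if "location" not in x:
        raise ValueError(f"API could not call city {city_name}")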
        
    global day, time, dailytemp, dailyhum, dailycond, chance_rain, Precipitation, Cloud_coverage, Visibility_km    
    
    day = []
    time = []
    dailytemp = []
    dailyhum = []
    dailycond = []
    chance_rain = []
    Precipitation = []
    Cloud_coverage = []
    Visibility_km = []
    
    for i in range(24):
        # Raw strings keep \S and \s from being parsed as string escapes
        dayval = re.search(r"^\S*\s", x["forecast"]["forecastday"][0]["hour"][i]["time"])
        timeval = re.search(r"\s(.*)", x["forecast"]["forecastday"][0]["hour"][i]["time"])
       
        day.append(dayval.group()[:-1])
        time.append(timeval.group()[1:])
        dailytemp.append(x["forecast"]["forecastday"][0]["hour"][i]["temp_c"])
        dailyhum.append(x["forecast"]["forecastday"][0]["hour"][i]["humidity"])
        dailycond.append(x["forecast"]["forecastday"][0]["hour"][i]["condition"]["text"])
        chance_rain.append(x["forecast"]["forecastday"][0]["hour"][i]["chance_of_rain"])
        Precipitation.append(x["forecast"]["forecastday"][0]["hour"][i]["precip_mm"])
        Cloud_coverage.append(x["forecast"]["forecastday"][0]["hour"][i]["cloud"])
        Visibility_km.append(x["forecast"]["forecastday"][0]["hour"][i]["vis_km"])
    for i in range(len(time)):
        time[i] = int(time[i][:2])

def main():
    i = 0
    while i < len(list_city):
        try:
            global city_name
            city_name = list_city[i]
            complete_URL = Base_URL + API_key + "&q=" + city_name + "&dt=" + Date
            response = requests.get(complete_URL, timeout = 10)
            global x
            x = response.json()

            get_weather()
            table = client.get_table(table_id)
            varlist = []
            for j in range(24):
                variables = city_name, day[j], time[j], dailytemp[j], dailyhum[j], dailycond[j], chance_rain[j], Precipitation[j], Cloud_coverage[j], Visibility_km[j]
                varlist.append(variables)
                
            client.insert_rows(table, varlist)
            print(f"City {city_name}, ({i+1} out of {len(list_city)}) successfully inserted")
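        except Exception as e:
            print(e)
        # Advance even when a city fails; `continue` without incrementing i
        # would retry the same failing city forever.
        i += 1


if __name__ == "__main__":
    main()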

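The code directly references two local files: one is the list of cities, the other is a JSON file containing the credentials for accessing my project in GCP. I thought uploading these files to Cloud Storage and referencing them there would be no problem, but then I realized that I can't actually access the bucket in Cloud Storage without using the credentials file.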

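This makes me unsure whether the whole process is even possible. If I need to reference the credentials locally first, how do I authenticate from the cloud in the first place? It seems like an infinite loop: I would authenticate using a file in Cloud Storage, but I need to authenticate before I can access that file.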

I'd really appreciate some help here. I don't know where to start, and I don't know much about SE/CS; I only know Python, R and SQL.

Answer 1

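There are different styles and options for deploying your application, and which one fits depends on your application's semantics and execution constraints.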

It's hard to cover all of them here; the official Google Cloud Platform documentation describes each in great detail:

Google Compute Engine
Google Kubernetes Engine
Google App Engine
Google Cloud Functions
Google Cloud Run

Based on my understanding of your application's design, the most suitable options are:

Google App Engine
Google Cloud Functions
Google Cloud Run: check these criteria to see whether your application is a good fit for this style of deployment

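I would suggest Cloud Functions as the deployment option. In that case your application will, by default, authenticate itself with the project's App Engine service account and can perform whatever actions that account is allowed to. So you only need to check, in the IAM configuration section, that the default account PROJECT_ID@appspot.gserviceaccount.com has the right access to the APIs you need (BigQuery in your case).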

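With such a setup you won't need to push your service account key to Cloud Storage (which I'd recommend avoiding in any case), nor pull it back out, because the runtime handles authentication for the function.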
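To make the Cloud Functions route concrete, here is a minimal sketch of a main.py entry point. It assumes the function is triggered by a Pub/Sub topic that a Cloud Scheduler job publishes to on a schedule; the answer doesn't name a trigger, this is just one common way to run a function periodically. The names run_weather_job and the stub main() are hypothetical: in the real file, main() would be the question's function, with the module-level setup (reading list_of_cities.txt, building table_id, creating the client and table) moved inside it so it runs on every invocation.

```python
import base64


def main() -> None:
    """Hypothetical stand-in for the question's main().

    In the deployed main.py this would read the city list, create the weekly
    table if needed, and insert the rows into BigQuery.
    """
    print("weather job ran")


def run_weather_job(event, context):
    """Entry point for a Pub/Sub-triggered (1st gen) background Cloud Function.

    `event` is a dict; the Pub/Sub message body, if any, arrives
    base64-encoded under event["data"]. `context` holds event metadata
    and is not used here.
    """
    payload = ""
    if "data" in event:
        payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Triggered with payload: {payload!r}")
    main()
    # Background functions ignore the return value; returning the payload
    # only makes local testing easier.
    return payload


if __name__ == "__main__":
    # Local smoke test: simulate a message a Cloud Scheduler job might publish.
    fake_event = {"data": base64.b64encode(b"daily-run").decode("ascii")}
    run_weather_job(fake_event, context=None)
```

Deployment would then look roughly like gcloud functions deploy run_weather_job --runtime python39 --trigger-topic weather-job, plus a Pub/Sub scheduler job such as gcloud scheduler jobs create pubsub weather-daily --schedule "0 6 * * *" --topic weather-job --message-body "daily-run". The topic and job names are placeholders, and your gcloud version may also require a region/location flag. list_of_cities.txt and a requirements.txt listing requests and google-cloud-bigquery would be deployed in the same folder as main.py.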

Answer 2

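With Cloud Functions, the deployed function runs with the project's default service account credentials and doesn't need a separate credentials file. Just make sure this service account has access to every resource the function will try to use.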

You can read more about this approach (and the option of using a different service account if needed) here: https://cloud.google.com/functions/docs/securing/function-identity

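This approach is very simple and spares you from handling credentials files on the server at all. Note that you should remove the os.environ line, since it isn't needed: the BigQuery client will use the default credentials described above.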
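This is also what resolves the "infinite loop" from the question: the client libraries look for credentials in a fixed order, and on GCP the last step needs no file at all. Below is a deliberately simplified model of that Application Default Credentials lookup, not google-auth's actual code (the real library checks a few more sources). describe_credential_source and its parameters are hypothetical, and the gcloud path shown is the Linux/macOS location.

```python
from pathlib import Path
from typing import Mapping


def describe_credential_source(env: Mapping[str, str], home: Path, on_gcp: bool) -> str:
    """Simplified model of the Application Default Credentials lookup order."""
    # 1. An explicit key file named by the environment variable wins.
    key = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key:
        return f"key file from GOOGLE_APPLICATION_CREDENTIALS: {key}"
    # 2. Credentials saved by `gcloud auth application-default login` (local dev).
    gcloud_adc = home / ".config" / "gcloud" / "application_default_credentials.json"
    if gcloud_adc.is_file():
        return f"gcloud user credentials: {gcloud_adc}"
    # 3. On GCP (Cloud Functions, App Engine, ...) the runtime's service account
    #    is fetched from the metadata server; no file is involved.
    if on_gcp:
        return "runtime service account via the metadata server"
    return "no credentials found"
```

Inside a deployed function, neither of the first two sources exists, so bigquery.Client() ends up in step 3 with no key file to fetch first.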

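If you want your code to run the same way whether it's on your local machine or deployed to the cloud, just set a GOOGLE_APPLICATION_CREDENTIALS environment variable permanently in your machine's OS. This is similar to what you do in the posted code; however, with os.environ you set it temporarily on every run instead of permanently on your machine. The os.environ call only sets the environment variable for the duration of that process.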
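A quick way to see that os.environ only affects the current process: the sketch below starts a child Python process that sets the variable the same way the question's script does, then checks that the parent's environment is unchanged. FAKE_KEY is an illustrative path; nothing is read from it.

```python
import os
import subprocess
import sys

VAR = "GOOGLE_APPLICATION_CREDENTIALS"
FAKE_KEY = "/tmp/hypothetical-key.json"  # illustrative path only; never opened


def value_seen_by_child_that_sets_it() -> str:
    """Run a separate Python process that sets VAR via os.environ
    (as the question's script does) and return the value it sees."""
    code = f"import os; os.environ[{VAR!r}] = {FAKE_KEY!r}; print(os.environ[{VAR!r}])"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


before = os.environ.get(VAR)
child_value = value_seen_by_child_that_sets_it()
after = os.environ.get(VAR)

print("child saw:", child_value)             # the value the child set for itself
print("parent unchanged:", before == after)  # True: the setting ended with the child
```

A variable set permanently in the OS, by contrast, is present in every new process's environment before any Python code runs.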

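If for some reason you don't want to use the default service account approach outlined above, you can reference the credentials file directly when instantiating bigquery.Client():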

https://cloud.google.com/bigquery/docs/authentication/service-account-file

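You would simply package the credentials file with your code (i.e. in the same folder as your main.py file) and deploy it to the execution environment. In that case the script can reference and load it without any special permissions or credentials. Just give the relative path to the file (i.e. only the file name, assuming you put it in the same directory as the Python script).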
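A small sketch of that packaging approach, with a fallback to the default service account when no key file was deployed. KEY_FILENAME and find_bundled_key are hypothetical names. Resolving the path against the directory that contains main.py, rather than the current working directory, keeps the lookup working regardless of where the process was started. The commented lines show how the result could be passed to bigquery.Client.from_service_account_json, which needs google-cloud-bigquery installed.

```python
from pathlib import Path
from typing import Optional

# Hypothetical file name; use whatever your key file is called.
KEY_FILENAME = "service_account.json"


def find_bundled_key(base_dir: Path, filename: str = KEY_FILENAME) -> Optional[Path]:
    """Return the key file deployed next to main.py, or None if it is absent.

    base_dir should be the directory containing main.py, for example
    Path(__file__).resolve().parent inside the deployed module.
    """
    candidate = base_dir / filename
    return candidate if candidate.is_file() else None


# In main.py this could be used roughly like this:
#   key = find_bundled_key(Path(__file__).resolve().parent)
#   client = (bigquery.Client.from_service_account_json(str(key))
#             if key else bigquery.Client())  # None -> default service account
```

The fallback means the same main.py works whether or not the key file is shipped, so you can drop the file later without touching the code.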

Tags: authentication, google-app-engine, automation, google-cloud-platform, google-cloud-functions