Cloud function working locally but crashing in GCP (Function execution took 12681 ms, finished with status: 'crash')

Here is a heavily abbreviated version of my code (I'll add a link to the full thing in the comments):

import ...

...

def main(data, context):
    log_client = logging.Client()

    log_name = 'cloudfunctions.googleapis.com%2Fcloud-functions'

    res = Resource(type="cloud_function",
                   labels={
                       "function_name": "refresh_classes",
                       "region": os.environ.get("FUNC_REGION")
                   })
    logger = log_client.logger(log_name.format(os.environ.get("PROJECT_ID")))

    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            host=os.environ.get("DB_HOST"),
            port=3306,
            database=PRIMARY_TABLE_NAME
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
    start_time = perf_counter()

    check_if_table_exists(db)

    for i in range(START_IDX, END_IDX):
        print(i)
        logger.log_text(f"Checking class with id {i}", resource=res, severity="INFO")

        ...

    logger.log_text(f"Total seconds elapsed: {perf_counter() - start_time}", resource=res, severity="INFO")


if __name__ == '__main__':
    main('data', 'context')

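When I run the cloud function above locally, with GOOGLE_APPLICATION_CREDENTIALS configured and my local Cloud SQL proxy for MySQL set up, the detailed Cloud Logging entries come through and the function completes without a hitch, exactly as I expect.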

However, when I deploy the whole thing to GCP and trigger it through the console (cloud messaging trigger), this is all I get in the logs:

Actual text:

{
 insertId: "******"  
 labels: {
  execution_id: "******"   
 }
 logName: "projects/******/logs/cloudfunctions.googleapis.com%2Fcloud-functions"  
 receiveTimestamp: "2020-05-29T22:11:13.435688367Z"  
 resource: {
  labels: {
   function_name: "******"    
   project_id: "******"    
   region: "us-central1"    
  }
  type: "cloud_function"   
 }
 severity: "DEBUG"  
 textPayload: "Function execution started"  
 timestamp: "2020-05-29T22:11:03.069889708Z"  
 trace: "projects/******/traces/******"  
}

{
 insertId: "******"  
 labels: {
  execution_id: "******"   
 }
 logName: "projects/******/logs/cloudfunctions.googleapis.com%2Fcloud-functions"  
 receiveTimestamp: "2020-05-29T22:11:16.331311285Z"  
 resource: {
  labels: {
   function_name: "******"    
   project_id: "******"    
   region: "us-central1"    
  }
  type: "cloud_function"   
 }
 severity: "DEBUG"  
 textPayload: "Function execution took 12362 ms, finished with status: 'crash'"  
 timestamp: "2020-05-29T22:11:15.430033249Z"  
 trace: "projects/******/traces/******"  
}

*I don't really know what counts as sensitive information and what doesn't, so I just starred out a few arbitrary things.

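While writing this I realized that more logging would help, so I inserted a Google logger call between the logger setup, the database setup, and the table check, and ran it again.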

The function crashes before it even reaches the Google logger setup.

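So at this point I'm not sure what is breaking my function, and I don't know how to find out, because Google Cloud Logging isn't helping. The error JSON has a trace attribute that looks promising, since all I need right now is a Python stack trace, but I don't know whether there is a way to view it.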

I should note that I configured the environment variables through GCP's Cloud Functions console.

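In principle, two things would help:

  1. How to view the stack trace of a Python cloud function that crashes
  2. What, specific to my application, could be causing it to crash like this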

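I finally figured it out: if you run the function manually via Test Function instead of triggering it through Cloud Scheduler, GCP shows you the exception that was thrown. In my case, my Cloud MySQL connection was failing: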

Error: function terminated. Recommended action: inspect logs for termination reason. Details:
(pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '**.***.**.***' (timed out)")
(Background on this error at: http://sqlalche.me/e/e3q8)
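
If Test Function is not convenient, another way to get at the stack trace is to catch the exception yourself and print it before the function dies. Below is a minimal sketch of that idea, not something from my actual code: the decorator name `log_crashes` is made up, the body of `main` is a stand-in that simply raises, and it assumes the platform captures stderr into the function's logs.

```python
import functools
import sys
import traceback


def log_crashes(func):
    """Print the traceback of any uncaught exception to stderr, then re-raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() returns the same text Python prints for an
            # unhandled exception in a local run.
            print(traceback.format_exc(), file=sys.stderr, flush=True)
            raise  # keep the original failure behavior
    return wrapper


@log_crashes
def main(data, context):
    # Stand-in for the real function body (hypothetical failure).
    raise ConnectionError("Can't connect to MySQL server (timed out)")
```

The exception is re-raised on purpose, so the execution still finishes as failed; the only change is that the traceback is written out first.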

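So I just had to introduce an environment variable to switch between connecting through the proxy (locally) and connecting through Unix sockets (in GCP), like so: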

if os.environ.get("ENV") == "local":
    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            host=os.environ.get("DB_HOST"),
            port=3306,
            database=PRIMARY_TABLE_NAME
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
else:
    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            database=PRIMARY_TABLE_NAME,
            query={"unix_socket": "/cloudsql/{}".format(os.environ.get("CLOUD_SQL_CONNECTION_NAME"))}
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
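
The two branches differ only in how the host is addressed, so the duplication can be trimmed. This is a rough sketch under a few assumptions: the helper name `build_db_url_kwargs` and the `POOL_SETTINGS` constant are hypothetical, the default database name is a placeholder, and the environment variable names are the ones used above.

```python
import os


def build_db_url_kwargs(environ=None, database="my_database"):
    """Return keyword arguments for the database URL, chosen by ENV."""
    environ = os.environ if environ is None else environ
    kwargs = {
        "drivername": "mysql+pymysql",
        "username": environ.get("DB_USER"),
        "password": environ.get("DB_PASS"),
        "database": database,
    }
    if environ.get("ENV") == "local":
        # Local run: TCP connection through the Cloud SQL proxy.
        kwargs["host"] = environ.get("DB_HOST")
        kwargs["port"] = 3306
    else:
        # Deployed run: Unix socket under /cloudsql.
        socket_path = "/cloudsql/{}".format(
            environ.get("CLOUD_SQL_CONNECTION_NAME"))
        kwargs["query"] = {"unix_socket": socket_path}
    return kwargs


# Pool settings are identical in both branches, so they are defined once.
POOL_SETTINGS = {
    "pool_size": 5,
    "max_overflow": 2,
    "pool_timeout": 30,
    "pool_recycle": 1800,
}

# Usage, mirroring the calls above (needs sqlalchemy and pymysql installed):
# db = sqlalchemy.create_engine(
#     sqlalchemy.engine.url.URL(
#         **build_db_url_kwargs(database=PRIMARY_TABLE_NAME)),
#     **POOL_SETTINGS,
# )
```

On SQLAlchemy 1.4 or later, `sqlalchemy.engine.url.URL.create(...)` accepts the same keyword arguments and is the documented way to build the URL object.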