Cloud function working locally but crashing in GCP (Function execution took 12681 ms, finished with status: 'crash')
Here is a heavily abbreviated version of my code (I'll add links to everything in the comments):
import ...
...
def main(data, context):
    log_client = logging.Client()
    log_name = 'cloudfunctions.googleapis.com%2Fcloud-functions'
    res = Resource(type="cloud_function",
                   labels={
                       "function_name": "refresh_classes",
                       "region": os.environ.get("FUNC_REGION")
                   })
    logger = log_client.logger(log_name.format(os.environ.get("PROJECT_ID")))
    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            host=os.environ.get("DB_HOST"),
            port=3306,
            database=PRIMARY_TABLE_NAME
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
    start_time = perf_counter()
    check_if_table_exists(db)
    for i in range(START_IDX, END_IDX):
        print(i)
        logger.log_text(f"Checking class with id {i}", resource=res, severity="INFO")
        ...
    logger.log_text(f"Total seconds elapsed: {perf_counter() - start_time}", resource=res, severity="INFO")

if __name__ == '__main__':
    main('data', 'context')
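When I run the cloud function above locally, with my GOOGLE_APPLICATION_CREDENTIALS configured and my local Cloud SQL (MySQL) proxy set up, the detailed Cloud Logging entries come through and the function completes without a hitch, exactly as I expect.
However, when I deploy the whole thing to GCP and try to trigger it through the console (Cloud Messaging trigger), this is all I get in terms of logging.
Actual text: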
{
  insertId: "******"
  labels: {
    execution_id: "******"
  }
  logName: "projects/******/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
  receiveTimestamp: "2020-05-29T22:11:13.435688367Z"
  resource: {
    labels: {
      function_name: "******"
      project_id: "******"
      region: "us-central1"
    }
    type: "cloud_function"
  }
  severity: "DEBUG"
  textPayload: "Function execution started"
  timestamp: "2020-05-29T22:11:03.069889708Z"
  trace: "projects/******/traces/******"
}
{
  insertId: "******"
  labels: {
    execution_id: "******"
  }
  logName: "projects/******/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
  receiveTimestamp: "2020-05-29T22:11:16.331311285Z"
  resource: {
    labels: {
      function_name: "******"
      project_id: "******"
      region: "us-central1"
    }
    type: "cloud_function"
  }
  severity: "DEBUG"
  textPayload: "Function execution took 12362 ms, finished with status: 'crash'"
  timestamp: "2020-05-29T22:11:15.430033249Z"
  trace: "projects/******/traces/******"
}
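*I really don't know what counts as sensitive information and what doesn't, so I just starred out some random things.
While writing this post, I realized that more logging would help, so I inserted a Google logger call between the logger setup, the database setup, and the table check, and ran the function again.
The function crashes before it even reaches the Google logger setup.
So at this point I'm not sure what is breaking my function, and I don't know how to find out, because Google Cloud Logging isn't helping. The error JSON has a trace attribute that looks promising, since all I need at this point is a Python stack trace, but I don't know whether there is a way to view it.
I should note that I configured the environment variables through GCP's Cloud Functions console.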
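Since the environment variables are set in a different place for the deployed function than for the local run, a startup check along these lines might rule out a missing or empty variable. This is only a sketch: the variable names are the ones read in the snippet above, while `missing_env_vars` and `require_env_vars` are hypothetical helpers, not part of any Google library.

```python
import os

# Names taken from the snippet above; adjust to whatever the function actually reads.
REQUIRED_VARS = ("PROJECT_ID", "FUNC_REGION", "DB_USER", "DB_PASS", "DB_HOST")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env_vars(required=REQUIRED_VARS, environ=None):
    """Raise a descriptive error instead of failing later on a None value."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env_vars()` as the first statement of `main` would turn a silently missing setting into an explicit error message.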
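In principle, two things would help:
- How to view the stack trace of a crashing Python cloud function
- What, specific to my application, could cause it to crash like this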
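For the first point, one generic approach is to catch the exception inside the function, print the full traceback, and re-raise. The sketch below assumes the runtime forwards the function's stderr to its logs; `log_exceptions` is a hypothetical decorator, and the body of `main` here is a placeholder that simulates a failure rather than the real code.

```python
import sys
import traceback
from functools import wraps


def log_exceptions(func):
    """Print the full traceback to stderr, then re-raise the exception."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() returns the traceback of the exception being handled.
            print(traceback.format_exc(), file=sys.stderr, flush=True)
            raise
    return wrapper


@log_exceptions
def main(data, context):
    # Placeholder body standing in for the real function.
    raise ConnectionError("simulated database failure")
```

Re-raising keeps the execution marked as failed, so the traceback is added to the logs without hiding the crash.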
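So I finally figured it out: if you run the function manually through Test Function instead of triggering it via Cloud Scheduler, GCP shows you the exception that was thrown. In my case, my Cloud SQL (MySQL) connection was failing: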
Error: function terminated. Recommended action: inspect logs for termination reason. Details:
(pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '**.***.**.***' (timed out)")
(Background on this error at: http://sqlalche.me/e/e3q8)
So I just had to introduce an environment variable to switch between connecting through the proxy and connecting through unix sockets, like so:
if os.environ.get("ENV") == "local":
    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            host=os.environ.get("DB_HOST"),
            port=3306,
            database=PRIMARY_TABLE_NAME
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
else:
    db = sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL(
            drivername="mysql+pymysql",
            username=os.environ.get("DB_USER"),
            password=os.environ.get("DB_PASS"),
            database=PRIMARY_TABLE_NAME,
            query={"unix_socket": "/cloudsql/{}".format(os.environ.get("CLOUD_SQL_CONNECTION_NAME"))}
        ),
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800
    )
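The two branches above differ only in the URL arguments, so the switch could be factored into a small helper. This is a sketch, not the code I actually deployed: `db_url_kwargs` is a hypothetical name, and it only builds the keyword arguments, so it runs without SQLAlchemy installed.

```python
import os


def db_url_kwargs(database, environ=None):
    """Build the URL keyword arguments for a local (TCP) or deployed (unix socket) run."""
    env = os.environ if environ is None else environ
    kwargs = {
        "drivername": "mysql+pymysql",
        "username": env.get("DB_USER"),
        "password": env.get("DB_PASS"),
        "database": database,
    }
    if env.get("ENV") == "local":
        # Local run: connect over TCP through the proxy.
        kwargs["host"] = env.get("DB_HOST")
        kwargs["port"] = 3306
    else:
        # Deployed run: connect through the unix socket under /cloudsql.
        socket_path = "/cloudsql/{}".format(env.get("CLOUD_SQL_CONNECTION_NAME"))
        kwargs["query"] = {"unix_socket": socket_path}
    return kwargs
```

The result would be passed on as `sqlalchemy.engine.url.URL(**db_url_kwargs(PRIMARY_TABLE_NAME))`, with the pool settings given to `create_engine` once. On SQLAlchemy 1.4 and later, `URL.create(**kwargs)` is the constructor to use instead.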