How to import RateLimiter in AWS Glue Python
I want to add a rate limiter to the calls my Python-script Glue job makes to DynamoDB, to smooth out spikes in call volume. I implemented something like the following, as suggested at https://pypi.org/project/ratelimiter/:
from ratelimiter import RateLimiter

rate_limiter = RateLimiter(max_calls=10, period=1)

for i in range(100):
    with rate_limiter:
        do_something()
But I get the following exception:
rmation.doAs(UserGroupInformation.java:1844)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:684)
21/02/10 03:05:08 INFO ApplicationMaster: Deleting staging directory hdfs://20.0.18.119:8020/user/root/.sparkStaging/application_1612925905975_0002
21/02/10 03:05:08 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr

LogType:stdout
Log Upload Time:Wed Feb 10 03:05:10 +0000 2021
LogLength:253
Log Contents:
Parse yarn logs get error message:
ModuleNotFoundError: No module named 'ratelimiter'
Traceback (most recent call last):
  File "script_2021-02-10-03-04-33.py", line 10, in
    from ratelimiter import RateLimiter
ModuleNotFoundError: No module named 'ratelimiter'
End of LogType:stdout
How can I import the rate limiter?
Thanks!
As per the comments: ratelimiter is not a standard Python library, so by default it is not available in Glue jobs. However, we can add external libraries to a job, as described below.
Adding an external library involves three steps:

1. Create a .zip file containing the library (unless the library is contained in a single .py file).
2. Upload the zip to S3.
3. Use the library in the job or job run.
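The steps above can be sketched with pip and the AWS CLI. This is a minimal sketch, not a definitive recipe: the bucket name my-glue-assets and job name my-ddb-job are placeholders, and since ratelimiter ships as a single ratelimiter.py module, the zip step can also be skipped by uploading the .py file directly.

```shell
# Step 1: install the library into a local folder and zip it up.
# (ratelimiter is a single-file module, so uploading ratelimiter.py
# on its own would also work, with no zip needed.)
pip install ratelimiter -t ./deps
(cd deps && zip -r ../ratelimiter.zip .)

# Step 2: upload the archive to S3 (bucket/path are placeholders).
aws s3 cp ratelimiter.zip s3://my-glue-assets/libs/ratelimiter.zip

# Step 3: make the job use it, either by setting the job's
# "Python library path" in the Glue console, or by passing the
# special --extra-py-files argument for a single run:
aws glue start-job-run \
    --job-name my-ddb-job \
    --arguments '{"--extra-py-files":"s3://my-glue-assets/libs/ratelimiter.zip"}'
```

With the library on the job's Python path, the `from ratelimiter import RateLimiter` line in the question should then resolve normally.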