How to import RateLimiter in AWS Glue Python

I want to add a rate limiter to the calls my Python Glue job script makes to DDB, to smooth out spikes in its call volume. I implemented something like the following, as suggested at https://pypi.org/project/ratelimiter/:

from ratelimiter import RateLimiter

# Allow at most 10 calls per 1-second window
rate_limiter = RateLimiter(max_calls=10, period=1)

for i in range(100):
    with rate_limiter:
        do_something()  # placeholder for the throttled DDB call

But I get the following exception:

rmation.doAs(UserGroupInformation.java:1844)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:684)
21/02/10 03:05:08 INFO ApplicationMaster: Deleting staging directory hdfs://20.0.18.119:8020/user/root/.sparkStaging/application_1612925905975_0002
21/02/10 03:05:08 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr

LogType:stdout
Log Upload Time:Wed Feb 10 03:05:10 +0000 2021
LogLength:253
Log Contents:
Parse yarn logs get error message:
ModuleNotFoundError: No module named 'ratelimiter'
Traceback (most recent call last):
  File "script_2021-02-10-03-04-33.py", line 10, in <module>
    from ratelimiter import RateLimiter
ModuleNotFoundError: No module named 'ratelimiter'
End of LogType:stdout

How can I import the rate limiter?

Thanks!

As per the comments.

ratelimiter is not part of the Python standard library, so it is not available in a Glue job by default. However, we can add external libraries to the job, as described below:

Adding an external library involves three steps (a sketch of the whole flow follows the list):

  1. Create a .zip file containing the library (unless the library is contained in a single .py file).

  2. Upload the zip to S3.

  3. Use the library in the job or job run.
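
Here is a minimal sketch of those three steps using boto3, assuming the package was first installed locally into a deps/ folder (e.g. with pip install ratelimiter -t deps); the bucket name, key, and job name are placeholders for your own values:

import shutil

import boto3

# Step 1: zip up the locally installed library
# (assumes `pip install ratelimiter -t deps` was run beforehand).
shutil.make_archive("ratelimiter", "zip", root_dir="deps")

# Step 2: upload the zip to S3 (placeholder bucket and key).
s3 = boto3.client("s3")
s3.upload_file("ratelimiter.zip", "my-glue-assets-bucket", "libs/ratelimiter.zip")

# Step 3: attach the zip to the Glue job via the --extra-py-files special parameter.
# NOTE: update_job replaces the job definition, so in practice carry over any
# other fields you need to keep (worker settings, Glue version, etc.).
glue = boto3.client("glue")
job = glue.get_job(JobName="my-ddb-glue-job")["Job"]
default_args = job.get("DefaultArguments", {})
default_args["--extra-py-files"] = "s3://my-glue-assets-bucket/libs/ratelimiter.zip"
glue.update_job(
    JobName="my-ddb-glue-job",
    JobUpdate={
        "Role": job["Role"],
        "Command": job["Command"],
        "DefaultArguments": default_args,
    },
)

The same --extra-py-files value can instead be set once in the job's configuration in the console, or passed as a job parameter for a single run; after that, `from ratelimiter import RateLimiter` in the job script should resolve.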