What is the recommended architecture for using Amazon Neptune in a scalable way?

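I'm building an application backed by a Neptune database. Because I want the application to be scalable, I'm using AWS Lambda + API Gateway to build a REST API that interacts with the database. This seemed like a reasonable idea, given that this use case is documented in the Neptune docs.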

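The Neptune docs recommend reusing the websocket connection to the database for the whole lifetime of the function's execution context, which is what I'm currently doing. The docs also recommend resetting the connection and retrying on errors (see here), which I do as well. However, I see exceptions every now and then (perhaps once every 20 requests on average). One of the exceptions I get is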

ConnectionResetError: Cannot write to closing transport

This seems to be the same as this issue.

The other one is:

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1685, in _get_view_function_response
    response = view_function(**function_args)
  File "/var/task/app.py", line 57, in resource
    return Resource(app.current_request, g).process()
  File "/var/task/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/var/task/chalicelib/handlers/resource.py", line 106, in get
    values = resources.valueMap().with_(WithOptions.tokens).toList()
  File "/var/task/gremlin_python/process/traversal.py", line 57, in toList
    return list(iter(self))
  File "/var/task/gremlin_python/process/traversal.py", line 47, in __next__
    self.traversal_strategies.apply_strategies(self)
  File "/var/task/gremlin_python/process/traversal.py", line 548, in apply_strategies
    traversal_strategy.apply(traversal)
  File "/var/task/gremlin_python/driver/remote_connection.py", line 63, in apply
    remote_traversal = self.remote_connection.submit(traversal.bytecode)
  File "/var/task/gremlin_python/driver/driver_remote_connection.py", line 60, in submit
    results = result_set.all().result()
  File "/var/lang/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/var/lang/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/var/task/gremlin_python/driver/resultset.py", line 90, in cb
    f.result()
  File "/var/lang/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/var/lang/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/var/lang/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/task/gremlin_python/driver/connection.py", line 82, in _receive
    data = self._transport.read()
  File "/var/task/gremlin_python/driver/aiohttp/transport.py", line 104, in read
    raise RuntimeError("Connection was already closed.")
RuntimeError: Connection was already closed.

In case it's relevant, I'm using gremlinpython==3.5.1

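It seems to me that these issues are ultimately a consequence of using AWS Lambda, that is, of a mismatch between the lifetime of the websocket connection and the ephemeral nature of the Lambda execution context. So the question is: am I doing something wrong by trying to use AWS Lambda for my API? Would it be more appropriate to set up EC2 instances and handle scalability some other way?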

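P.S. Previously I did create and close a connection on every function execution (as the Neptune docs used to recommend), and it worked fine but was naturally slow.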

The latest version of Neptune only supports Gremlin 3.4.11 (https://docs.aws.amazon.com/neptune/latest/userguide/engine-releases-1.0.5.1.html). I would start by using gremlin-python 3.4.11 and see if that resolves your issue. Gremlin-python 3.5 replaced Tornado with AIOHTTP (ref) for websocket connections, and I suspect this change may have caused slight differences in behavior that a future release supporting Gremlin 3.5 will address.

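I also wonder whether the 'Connection was already closed' error message is simply not being treated as a retriable error by the retry logic.

What happens if you add this error message to the retriable_error_msgs list in the docs' Python example?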
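
To make that suggestion concrete, here is a minimal, standard-library-only sketch of what it could look like. It is not the code from the Neptune docs: `is_retriable_error`, `run_with_retry`, and the `reset_connection` callback are hypothetical names, and the contents of `retriable_error_msgs` apart from the two messages quoted in the question are assumptions for illustration.

```python
import time

# Assumed list of message fragments that should trigger a reconnect and retry.
# The last two entries are the messages reported in the question.
retriable_error_msgs = [
    "ConcurrentModificationException",
    "Connection was already closed",
    "Cannot write to closing transport",
]


def is_retriable_error(exc):
    """Return True if the exception looks like a transient connection problem."""
    # ConnectionResetError is a subclass of OSError, so it is caught by type here.
    if isinstance(exc, OSError):
        return True
    # RuntimeError("Connection was already closed.") is only matched by its message.
    return any(msg in str(exc) for msg in retriable_error_msgs)


def run_with_retry(query, reset_connection, max_tries=3, delay=0.0):
    """Call query(); on a retriable error, reset the connection and try again."""
    for attempt in range(1, max_tries + 1):
        try:
            return query()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == max_tries or not is_retriable_error(exc):
                raise
            reset_connection()  # hypothetical hook: close and reopen the websocket
            time.sleep(delay)
```

In a Lambda handler, `query` would wrap the traversal (for example the `toList()` call from the traceback) and `reset_connection` would rebuild the remote connection held in the execution context. If the retry is done with a decorator that filters by exception type, that filter may also need to include `RuntimeError`, since that is the type raised in the traceback above.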