Is there any way to specify a max number of retries when using s3cmd?

I've looked through the usage guide as well as the config docs, but I don't see it there. This is the output from my bash script, which uses s3cmd sync, while S3 was having problems:

WARNING: Retrying failed request: /some/bucket/path/
WARNING: 503 (Service Unavailable): 
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: /some/bucket/path/
WARNING: 503 (Service Unavailable): 
WARNING: Waiting 6 sec...
ERROR: The read operation timed out

It looks like it retries twice with increasing backoff and then fails. Surely there must be some way to state explicitly how many times s3cmd should retry a failed network call?

A 503 is unlikely to mean S3 is down; it's almost never 'down'. More likely, your account has been throttled because you are making too many requests in too short a time.

If you control the rate, you should slow down your requests. Alternatively, I would suggest choosing better keys, i.e. keys that don't all start with the same prefix; a wide spread of keys lets S3 distribute the workload better.

From Jeff Barr's blog post:

Further, keys in S3 are partitioned by prefix.

As we said, S3 has automation that continually looks for areas of the keyspace that need splitting. Partitions are split either due to sustained high request rates, or because they contain a large number of keys (which would slow down lookups within the partition). There is overhead in moving keys into newly created partitions, but with request rates low and no special tricks, we can keep performance reasonably high even during partition split operations. This split operation happens dozens of times a day all over S3 and simply goes unnoticed from a user performance perspective. However, when request rates significantly increase on a single partition, partition splits become detrimental to request performance. How, then, do these heavier workloads work over time? Smart naming of the keys themselves!

We frequently see new workloads introduced to S3 where content is organized by user ID, or game ID, or other similar semi-meaningless identifier. Often these identifiers are incrementally increasing numbers, or date-time constructs of various types. The unfortunate part of this naming choice where S3 scaling is concerned is two-fold: First, all new content will necessarily end up being owned by a single partition (remember the request rates from above…). Second, all the partitions holding slightly older (and generally less ‘hot’) content get cold much faster than other naming conventions, effectively wasting the available operations per second that each partition can support by making all the old ones cold over time.

The simplest trick that makes these schemes work well in S3 at nearly any request rate is to simply reverse the order of the digits in this identifier (use seconds of precision for date or time-based identifiers). These identifiers then effectively start with a random number – and a few of them at that – which then fans out the transactions across many potential child partitions. Each of those child partitions scales close enough to linearly (even with some content being hotter or colder) that no meaningful operations per second budget is wasted either. In fact, S3 even has an algorithm to detect this parallel type of write pattern and will automatically create multiple child partitions from the same parent simultaneously – increasing the system’s operations per second budget as request heat is detected.

https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/
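The digit-reversal trick described in the quoted post can be sketched like this (the `spread_key` helper and the IDs below are made up for illustration; they are not part of any S3 API):

```python
def spread_key(identifier, suffix):
    """Reverse the digits of a sequential ID so keys fan out
    across S3 partitions instead of piling onto one hot prefix."""
    return "%s/%s" % (str(identifier)[::-1], suffix)

# Sequential IDs would all share the same leading digits; after
# reversal the leading characters vary, so S3 can split the
# keyspace across many child partitions.
for game_id in (2437521, 2437522, 2437523):
    print(spread_key(game_id, "save.dat"))
# 1257342/save.dat
# 2257342/save.dat
# 3257342/save.dat
```

Note that AWS announced increased per-prefix request rates in 2018, so randomized prefixes matter less on current S3, but the partitioning model above still explains why a burst of requests to one prefix can trigger 503 throttling.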

I don't think you can set a max number of retries. I had a look at its source code on GitHub (https://github.com/s3tools/s3cmd/blob/master/S3/S3.py).

It looks like the value is 5 and is hard-coded:

Line 240:

## Maximum attempts of re-issuing failed requests
_max_retries = 5

The retry wait interval is computed as:

Line 1004:

def _fail_wait(self, retries):
    # Wait a few seconds. The more it fails the more we wait.
    return (self._max_retries - retries + 1) * 3    

And the actual code that performs the retry:

if response["status"] >= 500:
    e = S3Error(response)

    if response["status"] == 501:
        ## NotImplemented server error - no need to retry
        retries = 0

    if retries:
        warning(u"Retrying failed request: %s" % resource['uri'])
        warning(unicode(e))
        warning("Waiting %d sec..." % self._fail_wait(retries))
        time.sleep(self._fail_wait(retries))
        return self.send_request(request, retries - 1)
    else:
        raise e

So I think some other error occurred after the second attempt and caused it to bail out of the retry loop.
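Since the retry count is hard-coded inside s3cmd, one workaround is to wrap the whole s3cmd invocation in your own retry loop. A minimal sketch (the command line, retry count, and wait are placeholders to adapt to your script):

```python
import subprocess
import time

def run_with_retries(cmd, max_retries=10, base_wait=3):
    """Re-run a command until it exits 0 or the retry budget is spent.

    Returns the final exit code, so the caller can still fail loudly.
    """
    for attempt in range(1, max_retries + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return result.returncode
        wait = attempt * base_wait  # linear backoff, like s3cmd's own
        print("Attempt %d failed; waiting %d sec..." % (attempt, wait))
        time.sleep(wait)
    return result.returncode

# e.g. run_with_retries(["s3cmd", "sync", "local/", "s3://some/bucket/path/"])
```

This way each s3cmd run still gets its internal 5 retries, and the wrapper adds as many outer attempts as you want on top.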