Boto error in Scrapy: "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256."
I'm trying to run the following spider:
import scrapy
from tutorial.items import QuoteItem


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'FEED_URI': 's3://apkmirror/quotes.json',
        'AWS_ACCESS_KEY_ID': 'foo',
        'AWS_SECRET_ACCESS_KEY': 'bar',
    }

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            item = QuoteItem()
            item['text'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item
where 'foo' and 'bar' are the AWS access key ID and secret access key for an Amazon S3 bucket located in Frankfurt, and items.py is simply
import scrapy


class QuoteItem(scrapy.Item):
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
However, when I run scrapy crawl quotes, the log contains the following error:
2017-05-15 18:33:56 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7fd56fd3b488>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7fd56fd38b18>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7fd56fd3ba28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7fd56fd38aa0>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7fd56fd38a28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7fd56fd38ed8>
2017-05-15 18:33:56 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'body': <open file '<fdopen>', mode 'w+b' at 0x7fd56ef29810>, 'url': u'https://s3.amazonaws.com/apkmirror/quotes.json', 'headers': {'Content-MD5': u'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue', 'User-Agent': 'Botocore/1.4.67 Python/2.7.12 Linux/4.4.0-75-generic'}, 'context': {'client_region': u'us-east-1', 'signing': {'bucket': 'apkmirror'}, 'has_streaming_input': True, 'client_config': <botocore.config.Config object at 0x7fd56ec7b610>}, 'query_string': {}, 'url_path': u'/apkmirror/quotes.json', 'method': u'PUT'}
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fd56ec7b510>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7fd56fe285f0>
2017-05-15 18:33:56 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.utils] DEBUG: URI updated to: https://apkmirror.s3.amazonaws.com/quotes.json
2017-05-15 18:33:56 [botocore.auth] DEBUG: Calculating signature using hmacv1 auth.
2017-05-15 18:33:56 [botocore.auth] DEBUG: HTTP request method: PUT
2017-05-15 18:33:56 [botocore.auth] DEBUG: StringToSign:
PUT
U+PeT0soEYWoCF4DMQXEzA==
Mon, 15 May 2017 16:33:56 GMT
/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): apkmirror.s3.amazonaws.com
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Received a non 100 Continue response from the server, NOT sending request body.
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /quotes.json HTTP/1.1" 400 None
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response headers: {'x-amz-region': 'eu-central-1', 'x-amz-id-2': 'ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'close', 'x-amz-request-id': '276FC0F60406C7C5', 'date': 'Mon, 15 May 2017 16:33:55 GMT', 'content-type': 'application/xml'}
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>276FC0F60406C7C5</RequestId><HostId>ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=</HostId></Error>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7fd56ece8290>
2017-05-15 18:33:56 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [scrapy.extensions.feedexport] ERROR: Error storing jsonlines feed (20 items) in: s3://apkmirror/quotes.json
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/extensions/feedexport.py", line 118, in _store_in_thread
    Bucket=self.bucketname, Key=self.keyname, Body=file)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
ClientError: An error occurred (InvalidRequest) when calling the PutObject operation: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
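Judging from "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256" and "Using boto for AWS S3 Buckets for Signature V4", the problem seems to be closely tied (no pun intended) to the S3 bucket being located in Frankfurt. One solution involves changing the host argument in boto's connect_to_region.
In my case, however, boto is called from Scrapy's source code, which I'd rather not touch. How can I fix this?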
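For background on the error itself: with Signature Version 4 (AWS4-HMAC-SHA256), the region is built into both the credential scope and the derived signing key. The failing log shows botocore signing with the older "hmacv1" scheme and a us-east-1 client region, which the Frankfurt bucket rejects. Below is a minimal standard-library sketch of the SigV4 key derivation and the four-line string to sign, matching the layout visible in the successful log (AWS4-HMAC-SHA256, timestamp, date/region/s3/aws4_request, hashed canonical request). The function names are illustrative, and the sketch deliberately skips building the canonical request itself.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; date is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)  # the region is folded into the key here
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def string_to_sign(amz_date: str, region: str, canonical_request: str,
                   service: str = "s3") -> str:
    """Build the four-line string to sign; amz_date is YYYYMMDDTHHMMSSZ."""
    scope = f"{amz_date[:8]}/{region}/{service}/aws4_request"
    hashed = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, hashed])


def sigv4_signature(secret_key: str, amz_date: str, region: str,
                    canonical_request: str, service: str = "s3") -> str:
    """Hex-encoded HMAC-SHA256 of the string to sign, keyed with the derived key."""
    key = signing_key(secret_key, amz_date[:8], region, service)
    to_sign = string_to_sign(amz_date, region, canonical_request, service)
    return hmac.new(key, to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Signing the same canonical request for us-east-1 and for eu-central-1 produces different signatures. That is why the client has to be configured with, or discover, the bucket's region as well as the v4 signature version.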
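The storage backend for exporting to S3 is scrapy.extensions.feedexport.S3FeedStorage.
You can subclass S3FeedStorage and implement your own version that resolves the mismatched S3 bucket authentication mechanism.
You also need to add
{
    "s3": "myproject.extensions.MyS3FeedStorage",
}
to the FEED_STORAGES setting so that Scrapy uses it.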
See also the documentation.
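Here is a minimal sketch of what such a subclass would need to do, under stated assumptions. The traceback in the question shows the stock storage uploading in _store_in_thread via put_object(Bucket=self.bucketname, Key=self.keyname, Body=file). A subclass could build its own botocore client with Config(signature_version="s3v4") and the bucket's region, then make the same call. To keep the sketch runnable without Scrapy, SigV4FeedUpload (a hypothetical name) is a plain class holding that logic rather than a real S3FeedStorage subclass. Two further caveats: _store_in_thread is a private method whose details may differ between Scrapy versions, and the region eu-central-1 (Frankfurt) is an assumption.

```python
from urllib.parse import urlparse


def make_sigv4_s3_client(region_name, access_key=None, secret_key=None):
    # Imported lazily so the class below can be exercised without botocore.
    import botocore.session
    from botocore.config import Config

    session = botocore.session.get_session()
    # With no explicit keys, botocore falls back to its usual credential chain.
    return session.create_client(
        "s3",
        region_name=region_name,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        config=Config(signature_version="s3v4"),
    )


class SigV4FeedUpload:
    """Hypothetical stand-in for the logic of a custom S3FeedStorage subclass."""

    def __init__(self, uri, region_name="eu-central-1", access_key=None,
                 secret_key=None, client_factory=make_sigv4_s3_client):
        parsed = urlparse(uri)
        # s3://<bucket>/<key>: bucket from the host part, key from the path.
        self.bucketname = parsed.hostname
        self.keyname = parsed.path[1:]  # drop the leading "/"
        self.s3_client = client_factory(region_name, access_key, secret_key)

    def store(self, file):
        # Same upload call as in the traceback, but on a SigV4, region-aware client.
        file.seek(0)
        self.s3_client.put_object(Bucket=self.bucketname, Key=self.keyname, Body=file)
```

In an actual MyS3FeedStorage(S3FeedStorage), the __init__ body could replace the client the parent created, and store would become the override of _store_in_thread. Because botocore is only imported when the default factory runs, a fake factory can stand in for it in tests.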
So this is an open issue in Scrapy (here). You can work around it by using the AWS shared configuration file to set the signature version to s3v4. You can see all the S3 config docs here.
To set only SigV4, you can create a file ~/.aws/config with the following contents:
[default]
s3 =
    signature_version = s3v4
Or, if you already have the AWS CLI installed, you can run:
aws configure set default.s3.signature_version s3v4
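If you'd rather not install the AWS CLI just for this step, you can produce the same ~/.aws/config contents with Python's standard-library configparser. This is a minimal sketch: render_aws_config is a hypothetical helper. It also writes the default region (eu-central-1, as the Dockerfile in a later answer sets), and it relies on configparser writing a value that starts with a newline as indented continuation lines, which is how the nested "s3 =" block is expressed.

```python
import configparser
import io


def render_aws_config(region="eu-central-1", signature_version="s3v4"):
    """Return ~/.aws/config text for the [default] profile (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg["default"] = {
        "region": region,
        # A leading newline makes configparser emit the rest as an indented
        # continuation line, producing the nested "s3 =" block.
        "s3": "\nsignature_version = " + signature_version,
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()
```

Write the returned text to ~/.aws/config, creating ~/.aws first if needed. The access keys still belong in ~/.aws/credentials, which is where the successful log finds them.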
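For completeness, here is my implementation of the answer. In the end I found it easier to modify my AWS configuration (as suggested by Jordan Phillips) rather than to subclass S3FeedStorage (as recommended by starrify). I used the following Dockerfile to run the crawler: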
# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev
RUN pip install scrapy botocore awscli
RUN aws configure set aws_access_key_id foo
RUN aws configure set aws_secret_access_key bar
RUN aws configure set default.region eu-central-1
RUN aws configure set default.s3.signature_version s3v4
COPY . /scraper
WORKDIR /scraper
CMD ["scrapy", "crawl", "quotes"]
where foo and bar are the actual AWS access key ID and AWS secret access key, respectively. If I run docker build --tag quotes . followed by docker run quotes, the crawler runs without errors:
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: tutorial)
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-05-16 13:03:37 [scrapy.core.engine] INFO: Spider opened
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-16 13:03:37 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
'cannot be changed without changing our thinking.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
'than our abilities.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
'is a miracle. The other is as though everything is a miracle.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
'novel, must be intolerably stupid.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
'absolutely ridiculous than absolutely boring.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
'you are not.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
"it's in hot water.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Marilyn Monroe',
'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters'],
'text': "“This life is what you make it. No matter what, you're going to mess "
"up sometimes, it's a universal truth. But the good part is you get "
"to decide how you're going to mess it up. Girls will be your friends "
"- they'll act like it anyway. But just remember, some come, some go. "
"The ones that stay with you through everything - they're your true "
"best friends. Don't let go of them. Also remember, sisters make the "
"best friends in the world. As for lovers, well, they'll come and go "
'too. And baby, I hate to say it, most of them - actually pretty much '
"all of them are going to break your heart, but you can't give up "
"because if you give up, you'll never find your soulmate. You'll "
'never find that half who makes you whole and that goes for '
"everything. Just because you fail once, doesn't mean you're gonna "
'fail at everything. Keep trying, hold on, and always, always, always '
"believe in yourself, because if you don't, then who will, sweetie? "
'So keep your head high, keep your chin up, and most importantly, '
"keep smiling, because life's a beautiful thing and there's so much "
'to smile about.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'J.K. Rowling',
'tags': ['courage', 'friends'],
'text': '“It takes a great deal of bravery to stand up to our enemies, but '
'just as much to stand up to our friends.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Albert Einstein',
'tags': ['simplicity', 'understand'],
'text': "“If you can't explain it to a six year old, you don't understand it "
'yourself.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Bob Marley',
'tags': ['love'],
'text': '“You may not be her first, her last, or her only. She loved before '
'she may love again. But if she loves you now, what else matters? '
"She's not perfect—you aren't either, and the two of you may never be "
'perfect together but if she can make you laugh, cause you to think '
'twice, and admit to being human and making mistakes, hold onto her '
'and give her the most you can. She may not be thinking about you '
'every second of the day, but she will give you a part of her that '
"she knows you can break—her heart. So don't hurt her, don't change "
"her, don't analyze and don't expect more than she can give. Smile "
'when she makes you happy, let her know when she makes you mad, and '
"miss her when she's not there.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Dr. Seuss',
'tags': ['fantasy'],
'text': '“I like nonsense, it wakes up the brain cells. Fantasy is a '
'necessary ingredient in living.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Douglas Adams',
'tags': ['life', 'navigation'],
'text': '“I may not have gone where I intended to go, but I think I have '
'ended up where I needed to be.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Elie Wiesel',
'tags': ['activism',
'apathy',
'hate',
'indifference',
'inspirational',
'love',
'opposite',
'philosophy'],
'text': "“The opposite of love is not hate, it's indifference. The opposite "
"of art is not ugliness, it's indifference. The opposite of faith is "
"not heresy, it's indifference. And the opposite of life is not "
"death, it's indifference.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Friedrich Nietzsche',
'tags': ['friendship',
'lack-of-friendship',
'lack-of-love',
'love',
'marriage',
'unhappy-marriage'],
'text': '“It is not a lack of love, but a lack of friendship that makes '
'unhappy marriages.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Mark Twain',
'tags': ['books', 'contentment', 'friends', 'friendship', 'life'],
'text': '“Good friends, good books, and a sleepy conscience: this is the '
'ideal life.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Allen Saunders',
'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans'],
'text': '“Life is what happens to us while we are making other plans.”'}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7f8c2f2b0ae8>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7f8c2f2acea0>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7f8c2f2b1268>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7f8c2f2ace18>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7f8c2f2aca60>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7f8c2f2acd90>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7f8c2f2b0378>
2017-05-16 13:03:38 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'url_path': '/apkmirror/quotes3.json', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.5.49 Python/3.6.1 Linux/4.4.0-75-generic', 'Content-MD5': 'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7f8c2f22e2b0>, 'url': 'https://s3.eu-central-1.amazonaws.com/apkmirror/quotes3.json', 'context': {'client_region': 'eu-central-1', 'client_config': <botocore.config.Config object at 0x7f8c2e7a7438>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'apkmirror'}}}
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f8c2e7a73c8>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7f8c2f2ac950>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7f8c2f42dd08>
2017-05-16 13:03:38 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2017-05-16 13:03:38 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/apkmirror/quotes3.json
content-md5:U+PeT0soEYWoCF4DMQXEzA==
host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20170516T130338Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2017-05-16 13:03:38 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20170516T130338Z
20170516/eu-central-1/s3/aws4_request
929e3a39776d42c15c4c7c197c718f67b6105341ed4a269365c6e6ed88378a69
2017-05-16 13:03:38 [botocore.auth] DEBUG: Signature:
81a1c8014fa22d52d371a8aea10d47e0f32e8913dcc18b2f1210c7ce458311e4
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): s3.eu-central-1.amazonaws.com
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /apkmirror/quotes3.json HTTP/1.1" 200 0
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'WB/HgvEGKd7ysqcRa1vodr2znuevKA+fTTX/2elIAcID05t7Ex2G7UTM+rl/AhvIPeB+0gL4YaY=', 'x-amz-request-id': '9C449953B48DA63F', 'date': 'Tue, 16 May 2017 13:03:39 GMT', 'etag': '"53e3de4f4b281185a8085e033105c4cc"', 'content-length': '0', 'server': 'AmazonS3'}
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response body:
b''
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f8c2e774e10>
2017-05-16 13:03:38 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (20 items) in: s3://apkmirror/quotes3.json
2017-05-16 13:03:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 675,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 5976,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 5, 16, 13, 3, 38, 317079),
'item_scraped_count': 20,
'log_count/DEBUG': 75,
'log_count/INFO': 11,
'response_received_count': 3,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2017, 5, 16, 13, 3, 37, 897491)}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Spider closed (finished)
Also, in my spider I no longer need to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, since these are 'picked up' from the configuration file.
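The log above shows botocore looking for credentials in this order: env, then assume-role, then the shared credentials file (~/.aws/credentials). This explains why the spider settings no longer need the keys. Below is a simplified standard-library sketch of only the first and last of those steps, which can help check which keys a container will pick up. find_static_credentials is a hypothetical helper. botocore's real provider chain includes more sources (assume-role among them) and handles more details than this.

```python
import configparser
import os


def find_static_credentials(environ=None, credentials_path="~/.aws/credentials",
                            profile="default"):
    """Simplified lookup: environment variables first, then the shared credentials file.

    Returns (source, access_key, secret_key), or None if neither step finds keys.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return ("env", key, secret)

    parser = configparser.ConfigParser()
    # read() skips missing files and returns the list of files it actually parsed.
    if parser.read(os.path.expanduser(credentials_path)) and parser.has_section(profile):
        section = parser[profile]
        if "aws_access_key_id" in section and "aws_secret_access_key" in section:
            return ("shared-credentials-file",
                    section["aws_access_key_id"],
                    section["aws_secret_access_key"])
    return None
```

When environment variables are set, they take precedence over the file, matching the order in the log.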
我正在尝试抓取以下蜘蛛:
import scrapy
from tutorial.items import QuoteItem
class QuotesSpider(scrapy.Spider):
name = "quotes"
custom_settings = {
'FEED_URI': 's3://apkmirror/quotes.json',
'AWS_ACCESS_KEY_ID': 'foo',
'AWS_SECRET_ACCESS_KEY': 'bar',
}
def start_requests(self):
urls = [
'http://quotes.toscrape.com/page/1/',
'http://quotes.toscrape.com/page/2/',
]
for url in urls:
yield scrapy.Request(url=url, callback=self.parse)
def parse(self, response):
for quote in response.css('div.quote'):
item = QuoteItem()
item['text'] = quote.css('span.text::text').extract_first()
item['author'] = quote.css('small.author::text').extract_first()
item['tags'] = quote.css('div.tags a.tag::text').extract()
yield item
其中 'foo'
和 'bar'
分别是位于法兰克福的 Amazon S3 存储桶的 AWS 访问密钥 ID 和密钥,items.py
只是
import scrapy
class QuoteItem(scrapy.Item):
text = scrapy.Field()
author = scrapy.Field()
tags = scrapy.Field()
但是,当我尝试 scrapy crawl quotes
时,日志包含以下错误消息:
2017-05-15 18:33:56 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7fd56fd3b488>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7fd56fd38b18>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7fd56fd3ba28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7fd56fd38aa0>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7fd56fd38a28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7fd56fd38ed8>
2017-05-15 18:33:56 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'body': <open file '<fdopen>', mode 'w+b' at 0x7fd56ef29810>, 'url': u'https://s3.amazonaws.com/apkmirror/quotes.json', 'headers': {'Content-MD5': u'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue', 'User-Agent': 'Botocore/1.4.67 Python/2.7.12 Linux/4.4.0-75-generic'}, 'context': {'client_region': u'us-east-1', 'signing': {'bucket': 'apkmirror'}, 'has_streaming_input': True, 'client_config': <botocore.config.Config object at 0x7fd56ec7b610>}, 'query_string': {}, 'url_path': u'/apkmirror/quotes.json', 'method': u'PUT'}
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fd56ec7b510>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7fd56fe285f0>
2017-05-15 18:33:56 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.utils] DEBUG: URI updated to: https://apkmirror.s3.amazonaws.com/quotes.json
2017-05-15 18:33:56 [botocore.auth] DEBUG: Calculating signature using hmacv1 auth.
2017-05-15 18:33:56 [botocore.auth] DEBUG: HTTP request method: PUT
2017-05-15 18:33:56 [botocore.auth] DEBUG: StringToSign:
PUT
U+PeT0soEYWoCF4DMQXEzA==
Mon, 15 May 2017 16:33:56 GMT
/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): apkmirror.s3.amazonaws.com
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Received a non 100 Continue response from the server, NOT sending request body.
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /quotes.json HTTP/1.1" 400 None
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response headers: {'x-amz-region': 'eu-central-1', 'x-amz-id-2': 'ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'close', 'x-amz-request-id': '276FC0F60406C7C5', 'date': 'Mon, 15 May 2017 16:33:55 GMT', 'content-type': 'application/xml'}
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>276FC0F60406C7C5</RequestId><HostId>ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=</HostId></Error>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7fd56ece8290>
2017-05-15 18:33:56 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [scrapy.extensions.feedexport] ERROR: Error storing jsonlines feed (20 items) in: s3://apkmirror/quotes.json
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/extensions/feedexport.py", line 118, in _store_in_thread
    Bucket=self.bucketname, Key=self.keyname, Body=file)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
ClientError: An error occurred (InvalidRequest) when calling the PutObject operation: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
From "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256" and "Using boto for AWS S3 Buckets for Signature V4", it seems that the problem is intimately related to the S3 bucket being located in Frankfurt (no pun intended). One solution involves changing the host argument in boto's connect_to_region.
However, in my case the use of boto is handled by the Scrapy source code, which I would rather not touch. How can I fix this?
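The storage backend for exporting to S3 is handled by scrapy.extensions.feedexport.S3FeedStorage. You can subclass the S3FeedStorage class and implement your own version that uses the authentication mechanism the bucket expects (Signature Version 4), which fixes the mismatch.
You will also need to add
{
    "s3": "myproject.extentions.MyS3FeedStorage",
}
to the FEED_STORAGES setting so that Scrapy uses it.
See also the documentation.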
So this is an open issue in Scrapy (here). You can work around it by using the AWS shared configuration file to set the signature version to s3v4. You can see all the S3 config docs here.
To set only SigV4, you can create the file ~/.aws/config with the following contents:
[default]
s3 =
    signature_version = s3v4
Or, if you already have the AWS CLI installed, you can run:
aws configure set default.s3.signature_version s3v4
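If the AWS CLI is not available (for example in a slim container), the same nested block can be written from Python. A minimal standard-library sketch; write_s3v4_config is a hypothetical helper, it overwrites any existing s3 entry in the [default] profile, and it keeps the file's other settings but not its comments.

```python
import configparser
import os


def write_s3v4_config(path="~/.aws/config"):
    """Set s3.signature_version = s3v4 in the [default] profile of an AWS config file.

    Hypothetical helper meant to give botocore the same setting as
    `aws configure set default.s3.signature_version s3v4`.
    """
    path = os.path.expanduser(path)
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # keeps existing settings; a missing file is ignored
    if not config.has_section("default"):
        config.add_section("default")
    # A value starting with a newline is written as an indented
    # continuation line, which is how the AWS config format nests keys.
    config.set("default", "s3", "\nsignature_version = s3v4")
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
    return path
```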
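For completeness, here is my implementation of the answer. In the end I found it easier to modify my AWS configuration (as suggested by Jordan Phillips) rather than to subclass S3FeedStorage (as recommended by starrify). I used the following Dockerfile to run the crawler: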
# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev
RUN pip install scrapy botocore awscli
RUN aws configure set aws_access_key_id foo
RUN aws configure set aws_secret_access_key bar
RUN aws configure set default.region eu-central-1
RUN aws configure set default.s3.signature_version s3v4
COPY . /scraper
WORKDIR /scraper
CMD ["scrapy", "crawl", "quotes"]
where foo and bar are the actual AWS access key ID and AWS secret access key, respectively. If I run docker build --tag quotes . followed by docker run quotes, the crawler runs without errors:
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: tutorial)
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-05-16 13:03:37 [scrapy.core.engine] INFO: Spider opened
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-16 13:03:37 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
'cannot be changed without changing our thinking.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
'than our abilities.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
'is a miracle. The other is as though everything is a miracle.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
'novel, must be intolerably stupid.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
'absolutely ridiculous than absolutely boring.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
'you are not.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
"it's in hot water.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Marilyn Monroe',
'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters'],
'text': "“This life is what you make it. No matter what, you're going to mess "
"up sometimes, it's a universal truth. But the good part is you get "
"to decide how you're going to mess it up. Girls will be your friends "
"- they'll act like it anyway. But just remember, some come, some go. "
"The ones that stay with you through everything - they're your true "
"best friends. Don't let go of them. Also remember, sisters make the "
"best friends in the world. As for lovers, well, they'll come and go "
'too. And baby, I hate to say it, most of them - actually pretty much '
"all of them are going to break your heart, but you can't give up "
"because if you give up, you'll never find your soulmate. You'll "
'never find that half who makes you whole and that goes for '
"everything. Just because you fail once, doesn't mean you're gonna "
'fail at everything. Keep trying, hold on, and always, always, always '
"believe in yourself, because if you don't, then who will, sweetie? "
'So keep your head high, keep your chin up, and most importantly, '
"keep smiling, because life's a beautiful thing and there's so much "
'to smile about.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'J.K. Rowling',
'tags': ['courage', 'friends'],
'text': '“It takes a great deal of bravery to stand up to our enemies, but '
'just as much to stand up to our friends.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Albert Einstein',
'tags': ['simplicity', 'understand'],
'text': "“If you can't explain it to a six year old, you don't understand it "
'yourself.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Bob Marley',
'tags': ['love'],
'text': '“You may not be her first, her last, or her only. She loved before '
'she may love again. But if she loves you now, what else matters? '
"She's not perfect—you aren't either, and the two of you may never be "
'perfect together but if she can make you laugh, cause you to think '
'twice, and admit to being human and making mistakes, hold onto her '
'and give her the most you can. She may not be thinking about you '
'every second of the day, but she will give you a part of her that '
"she knows you can break—her heart. So don't hurt her, don't change "
"her, don't analyze and don't expect more than she can give. Smile "
'when she makes you happy, let her know when she makes you mad, and '
"miss her when she's not there.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Dr. Seuss',
'tags': ['fantasy'],
'text': '“I like nonsense, it wakes up the brain cells. Fantasy is a '
'necessary ingredient in living.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Douglas Adams',
'tags': ['life', 'navigation'],
'text': '“I may not have gone where I intended to go, but I think I have '
'ended up where I needed to be.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Elie Wiesel',
'tags': ['activism',
'apathy',
'hate',
'indifference',
'inspirational',
'love',
'opposite',
'philosophy'],
'text': "“The opposite of love is not hate, it's indifference. The opposite "
"of art is not ugliness, it's indifference. The opposite of faith is "
"not heresy, it's indifference. And the opposite of life is not "
"death, it's indifference.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Friedrich Nietzsche',
'tags': ['friendship',
'lack-of-friendship',
'lack-of-love',
'love',
'marriage',
'unhappy-marriage'],
'text': '“It is not a lack of love, but a lack of friendship that makes '
'unhappy marriages.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Mark Twain',
'tags': ['books', 'contentment', 'friends', 'friendship', 'life'],
'text': '“Good friends, good books, and a sleepy conscience: this is the '
'ideal life.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Allen Saunders',
'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans'],
'text': '“Life is what happens to us while we are making other plans.”'}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7f8c2f2b0ae8>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7f8c2f2acea0>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7f8c2f2b1268>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7f8c2f2ace18>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7f8c2f2aca60>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7f8c2f2acd90>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7f8c2f2b0378>
2017-05-16 13:03:38 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'url_path': '/apkmirror/quotes3.json', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.5.49 Python/3.6.1 Linux/4.4.0-75-generic', 'Content-MD5': 'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7f8c2f22e2b0>, 'url': 'https://s3.eu-central-1.amazonaws.com/apkmirror/quotes3.json', 'context': {'client_region': 'eu-central-1', 'client_config': <botocore.config.Config object at 0x7f8c2e7a7438>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'apkmirror'}}}
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f8c2e7a73c8>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7f8c2f2ac950>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7f8c2f42dd08>
2017-05-16 13:03:38 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2017-05-16 13:03:38 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/apkmirror/quotes3.json
content-md5:U+PeT0soEYWoCF4DMQXEzA==
host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20170516T130338Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2017-05-16 13:03:38 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20170516T130338Z
20170516/eu-central-1/s3/aws4_request
929e3a39776d42c15c4c7c197c718f67b6105341ed4a269365c6e6ed88378a69
2017-05-16 13:03:38 [botocore.auth] DEBUG: Signature:
81a1c8014fa22d52d371a8aea10d47e0f32e8913dcc18b2f1210c7ce458311e4
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): s3.eu-central-1.amazonaws.com
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /apkmirror/quotes3.json HTTP/1.1" 200 0
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'WB/HgvEGKd7ysqcRa1vodr2znuevKA+fTTX/2elIAcID05t7Ex2G7UTM+rl/AhvIPeB+0gL4YaY=', 'x-amz-request-id': '9C449953B48DA63F', 'date': 'Tue, 16 May 2017 13:03:39 GMT', 'etag': '"53e3de4f4b281185a8085e033105c4cc"', 'content-length': '0', 'server': 'AmazonS3'}
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response body:
b''
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f8c2e774e10>
2017-05-16 13:03:38 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (20 items) in: s3://apkmirror/quotes3.json
2017-05-16 13:03:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 675,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 5976,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 5, 16, 13, 3, 38, 317079),
'item_scraped_count': 20,
'log_count/DEBUG': 75,
'log_count/INFO': 11,
'response_received_count': 3,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2017, 5, 16, 13, 3, 37, 897491)}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Spider closed (finished)
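The failing run logged "Calculating signature using hmacv1 auth", whereas this run logs a CanonicalRequest, a StringToSign starting with AWS4-HMAC-SHA256, and a Signature. Below is a minimal standard-library sketch of how those last two values are derived under AWS Signature Version 4, for illustration only: the function names are hypothetical, building the canonical request itself is left out, and the logged signature cannot be reproduced here because the secret key is not shown. The credential scope (20170516/eu-central-1/s3/aws4_request in the log) folds the region into the signing key, which is one reason the client has to know the bucket's region.

```python
import hashlib
import hmac


def _hmac_sha256(key, msg):
    # One HMAC-SHA256 step: key is raw bytes, msg is text.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_string_to_sign(amz_date, date_stamp, region, service, canonical_request):
    """Assemble the four-line StringToSign that botocore logs."""
    scope = "/".join([date_stamp, region, service, "aws4_request"])
    request_hash = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, request_hash])


def derive_signing_key(secret_key, date_stamp, region, service):
    """Chain HMACs over date, region and service; the key is region-scoped."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(secret_key, date_stamp, region, service, string_to_sign):
    """Return the hex signature, the kind of value logged after 'Signature:'."""
    key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

With the values from the log, build_string_to_sign("20170516T130338Z", "20170516", "eu-central-1", "s3", canonical_request) produces a string whose first three lines match the logged StringToSign.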
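Furthermore, in my spider I no longer need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY settings, since these are 'picked up' from the AWS configuration (the log shows botocore finding them in ~/.aws/credentials).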
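The log shows botocore trying credential providers in order (env, assume-role, shared-credentials-file) and finding the keys in ~/.aws/credentials, which is where `aws configure set aws_access_key_id ...` writes them. A simplified standard-library illustration of just the first and last of those steps; it is not botocore's code, and find_static_credentials is a hypothetical name.

```python
import configparser
import os


def find_static_credentials(environ=None, credentials_path="~/.aws/credentials",
                            profile="default"):
    """Return (access_key_id, secret_access_key, source) or None.

    Simplified illustration only: it mirrors two of the providers named in
    the log ("env" and "shared-credentials-file"); botocore checks more.
    """
    if environ is None:
        environ = os.environ
    # 1. Environment variables are consulted first.
    key_id = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return key_id, secret, "env"

    # 2. Then the shared credentials file that `aws configure set` writes.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(os.path.expanduser(credentials_path))  # missing file is ignored
    if (parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key")):
        return (parser.get(profile, "aws_access_key_id"),
                parser.get(profile, "aws_secret_access_key"),
                "shared-credentials-file")
    return None
```

This is why the spider's custom_settings can shrink to just FEED_URI once the keys live in the credentials file.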