Scrapy: saving an HTML element to an HTML file

Question

I want to save the div element with class="col-md-12 blog-data" (it contains images) to an HTML file. Where should I put response.css? I am new to Python and Scrapy.

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://mysite.com/articles/1',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)  # I've used it here and it gave me blank html
        self.log('Saved file %s' % filename)

Is it possible to concatenate a custom string like this and save it to an HTML file? Please give me some examples. Thank you.

mytext="<html><head></head><body>
<div id='mycustomelement'>
{  ('.blog-data')response //how to get this  }
</div></body></html>"

Answer 1

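You cannot use response.css to apply styles. In Scrapy, response.css selects elements with a CSS selector; it does not attach a stylesheet. To attach CSS to the div you would have to use regular expressions and string concatenation. A cleaner approach is to add a link to a mystyle.css file in the head and put the rules in mystyle.css.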

<link rel="stylesheet" type="text/css" href="mystyle.css">

You can do this with BeautifulSoup.

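from bs4 import BeautifulSoup

# Parse the downloaded page; naming the parser avoids bs4's "no parser" warning.
soup = BeautifulSoup(response.body, 'html.parser')

# Build <link rel="stylesheet" type="text/css" href="mystyle.css">
link_tag = soup.new_tag('link')
link_tag.attrs['rel'] = 'stylesheet'
link_tag.attrs['type'] = 'text/css'
link_tag.attrs['href'] = 'mystyle.css'  # <link> uses href, not src

soup.head.append(link_tag)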
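
As for the original question of where response.css goes: it belongs inside the parse callback, where it can pull out just the div before anything is written to disk. The following is a minimal sketch, not a definitive implementation. The helper names render_blog_data and save_blog_data and the PAGE_TEMPLATE constant are made up for illustration, the selector div.blog-data is assumed from the class given in the question, and .get() assumes a reasonably recent Scrapy (older releases spell it .extract_first()).

```python
# Hypothetical template, modelled on the custom string in the question.
PAGE_TEMPLATE = """<html><head></head><body>
<div id='mycustomelement'>
{fragment}
</div></body></html>"""


def render_blog_data(response, selector="div.blog-data"):
    """Return a custom HTML page wrapping the first element matching selector."""
    # response.css() returns a list of selectors; .get() yields the outer HTML
    # of the first match as a str, or None when nothing matched.
    fragment = response.css(selector).get()
    if fragment is None:
        fragment = ""
    return PAGE_TEMPLATE.format(fragment=fragment)


def save_blog_data(response, filename):
    """Write the wrapped fragment to filename as UTF-8 text and return it."""
    html = render_blog_data(response)
    # Text mode: the extracted fragment is a str, not bytes like response.body.
    with open(filename, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

In the spider, parse would then call save_blog_data(response, filename) instead of writing response.body. Note that this only saves the img tags inside the div; the image files themselves are not downloaded, and relative src paths will still point at the original site.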
