scrapy-splash: set input value?
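I have successfully loaded JavaScript-generated HTML with scrapy-splash. Now I want to set a couple of input values that are not part of a form. As soon as I enter a value, the content on the site changes. I haven't found a way to set the input values and then retrieve the updated HTML. Is this possible?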
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'https://example.com',
    )

    def start_requests(self):
        # Render each start URL through Splash and wait 3 s for the JavaScript to run
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 3}
                }
            })

    def parse(self, response):
        # Save the rendered HTML to disk
        page = response.url.split("/")[-2]
        filename = 'screener-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
As someone suggested in the comments, you need to perform the input inside a Lua script. Here is an example that clicks a button:
script = """
function main(splash)
    local url = splash.args.url
    assert(splash:go(url))
    -- getElementsByClassName returns a collection, so click its first element
    assert(splash:runjs('document.getElementsByClassName("nameofbutton")[0].click()'))
    assert(splash:wait(0.75))
    -- return result as a JSON object
    return {
        html = splash:html()
    }
end
"""
Then execute the script like this:
    def start_requests(self):
        # `script` is assumed to be defined as a class attribute of the spider
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse_item, meta={
                'splash': {
                    'args': {'lua_source': self.script},
                    'endpoint': 'execute',
                }
            })
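
The same approach can be adapted to the original question of setting input values instead of clicking a button. The following is only a rough sketch under some assumptions: the CSS selector `#ticker`, the value `ABC`, the `inputs` and `wait` argument names, and the helper `splash_meta` are all hypothetical, and it assumes the page reacts to `input`/`change` events on the field. It uses `splash:jsfunc` to wrap a JavaScript function so it can be called from Lua, and passes the values in through the Splash `args`.

```python
# Lua script for the Splash `execute` endpoint. It sets each input's value,
# fires input/change events so page scripts can react, then returns the HTML.
LUA_SET_INPUTS = """
function main(splash)
    local args = splash.args
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))

    -- Wrap a JavaScript function so it can be called from Lua
    local set_value = splash:jsfunc([[
        function (selector, value) {
            var el = document.querySelector(selector);
            if (!el) { return false; }
            el.value = value;
            var names = ['input', 'change'];
            for (var i = 0; i < names.length; i++) {
                var ev = document.createEvent('HTMLEvents');
                ev.initEvent(names[i], true, true);
                el.dispatchEvent(ev);
            }
            return true;
        }
    ]])

    -- args.inputs is assumed to be a table of selector -> value pairs
    for selector, value in pairs(args.inputs) do
        assert(set_value(selector, value), "input not found: " .. selector)
    end

    -- Give the page time to update after the values change
    assert(splash:wait(args.wait))
    return {html = splash:html()}
end
"""


def splash_meta(inputs, wait=1.0):
    """Build the request meta dict (hypothetical helper, not part of scrapy-splash)."""
    return {
        'splash': {
            'endpoint': 'execute',
            'args': {
                'lua_source': LUA_SET_INPUTS,
                'inputs': inputs,  # e.g. {'#ticker': 'ABC'}, selectors are placeholders
                'wait': wait,
            },
        }
    }
```

Inside the spider this could be used as `yield scrapy.Request(url, self.parse_item, meta=splash_meta({'#ticker': 'ABC'}))`. If the site listens for other events (for example `keyup`), the list of event names in the script would need to be adjusted accordingly.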