Scrapy crawl and extract data into MySQL
I'm trying to scrape prices and save them to a database, but I can't figure out what's wrong with my code. I can extract the data and save it with -o save.xml, but once I wired settings.py up to save the data to a MySQL database, everything broke. When I run with -o save.xml again, no price results show up. I did notice that my database's auto-increment ID keeps advancing, but no data is actually inserted.
Can someone help me? Here is my code.
test.py
------------------------
import scrapy
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import HtmlXPathSelector
from getprice.items import GetPriceItem
from scrapy.log import *
from getprice.settings import *
from getprice.items import *

class MySpider(CrawlSpider):
    name = "getprice"
    allowed_domains = ["www.craigslist.ca"]
    start_urls = ["http://calgary.craigslist.ca/search/sss"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//div[@class='sliderforward arrow']")
        items = []
        for title in titles:
            item = GetPriceItem()
            item["price"] = title.select("text()").extract()[0]
            insert_table(item)
settings.py
---------------------
BOT_NAME = 'getp'
BOT_VERSION = '1.0'

import sys
import MySQLdb

# SCRAPY SETTING
SPIDER_MODULES = ['getprice.spiders']
NEWSPIDER_MODULE = 'getprice.spiders'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)

# SQL DATABASE SETTING
SQL_DB = 'test'
SQL_TABLE = 'testdata'
SQL_HOST = 'localhost'
SQL_USER = 'root'
SQL_PASSWD = 'testing'
SQL_LIST = 'price'

# connect to the MySQL server
try:
    CONN = MySQLdb.connect(host=SQL_HOST,
                           user=SQL_USER,
                           passwd=SQL_PASSWD,
                           db=SQL_DB)
except MySQLdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit(1)

cursor = CONN.cursor()  # important MySQLdb Cursor object

def insert_table(item):
    sql = "INSERT INTO %s (%s) \
        values('%s')" % (SQL_TABLE, SQL_LIST,
                         MySQLdb.escape_string(item['price'].encode('utf-8')),
                         )
    # print sql
    if cursor.execute(sql):
        print "Inserted"
    else:
        print "Something wrong"
You need to do this the right way and follow Scrapy's control flow: parse() should yield items rather than writing to the database itself.
Create a "Pipeline" that is responsible for saving your items to the database.
Example MySQL pipeline:
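A minimal sketch of such a pipeline, reusing the table and settings names from the question's settings.py (the class and module names here are illustrative, not fixed by Scrapy):

```python
class MySQLPipeline(object):
    """Scrapy item pipeline that inserts each scraped price into MySQL.

    Assumes the `testdata` table with a `price` column from the
    question's settings.py, and that the MySQLdb driver is installed
    (pymysql works as a drop-in replacement).
    """

    # Parameterized query: the driver escapes the value for us, unlike
    # the manual string formatting in the question's insert_table().
    insert_sql = "INSERT INTO testdata (price) VALUES (%s)"

    def __init__(self, host, user, passwd, db):
        self.host = host
        self.user = user
        self.passwd = passwd
        self.db = db
        self.conn = None
        self.cursor = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection parameters from settings.py, so the
        # settings module no longer opens a connection at import time.
        s = crawler.settings
        return cls(s.get('SQL_HOST'), s.get('SQL_USER'),
                   s.get('SQL_PASSWD'), s.get('SQL_DB'))

    def open_spider(self, spider):
        # Deferred import so the module loads even without the driver.
        import MySQLdb
        self.conn = MySQLdb.connect(host=self.host, user=self.user,
                                    passwd=self.passwd, db=self.db,
                                    charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        self.cursor.execute(self.insert_sql, (item['price'],))
        self.conn.commit()
        return item  # pass the item on so -o save.xml still works

    def close_spider(self, spider):
        if self.conn:
            self.conn.close()
```

Enable it in settings.py (e.g. `ITEM_PIPELINES = {'getprice.pipelines.MySQLPipeline': 300}`), and change parse() to `yield item` instead of calling insert_table(item). Because process_item() returns the item, feed exports like -o save.xml keep working alongside the database insert.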