sqlalchemy.exc.StatementError: invalid literal for int() with base 10 in scraper

Question

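I wrote a scraper in Python 2.7, but I get an error when I try to save my data. The scraper is written in Scraperwiki, but I think that is largely irrelevant to the error I'm getting: saving in Scraperwiki appears to be handled by SQLAlchemy, and that is what raises the error.

I get this error message: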

Traceback (most recent call last):
  File "./code/scraper", line 192, in <module>
    saving(spreadsheet_pass)
  File "./code/scraper", line 165, in saving
    scraperwiki.sql.save(["URN"], school, "magic")
  File "/usr/local/lib/python2.7/dist-packages/scraperwiki/sql.py", line 195, in save
    connection.execute(insert.values(row))
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 321, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 893, in _execute_context
    None, None)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1160, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 889, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 573, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/processors.py", line 56, in boolean_to_int
    return int(value)
sqlalchemy.exc.StatementError: invalid literal for int() with base 10: 'n/a' (original cause: ValueError: invalid literal for int() with base 10: 'n/a') u'INSERT OR REPLACE INTO magic (published_recent, inspection_rating2, schooltype, "LA", "URL", "URN", schoolname, open_closed, opendate_full, inspection_rating, opendate_short, phase, publication_date, include, notes, inspection_date) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' []

when trying to save this row of data:

{u'published_recent': 'n/a', u'inspection_rating2': 'n/a', u'schooltype': u'Free school', u'LA': u'Tower Hamlets', u'URL': u'http://www.ofsted.gov.uk/inspection-reports/find-inspection-report/provider/ELS/138262', u'URN': u'138262', u'schoolname': u'City Gateway 14-19 Provision', u'open_closed': u'Open', u'opendate_full': u'2012-09-03', u'inspection_rating': 'No section 5 inspection yet', u'opendate_short': u'September 2012', u'phase': u'Alternative provision', u'publication_date': 'n/a', u'include': False, u'notes': 'test message', u'inspection_date': 'n/a'}

using this line of code:

scraperwiki.sql.save(["URN"], school, "magic")

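(In Scraperwiki, this saves the data in the 'school' dictionary to a database table named 'magic', using the key 'URN' as the unique key.)

Strangely, the scraper sometimes works fine and I get no error, but at other times, running the same code, I get this error.

Things I've tried: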

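Clearing the database I'm saving to, or starting a new database with a different name. Neither worked.
Editing the data being saved. The error points to a problem with the 'n/a' value saved against the key 'published_recent'. Earlier rows, which saved without a problem, contained boolean data for that key, so I assume the string is causing difficulty for some reason. Changing the value to an integer means I don't get this error. I can't reproduce it now (saving seems to work when the value is an integer), but I think I got the following error when I tried changing the 'published_recent' value to an integer for the row that seemed to be causing the problem: sqlalchemy.exc.IntegrityError: (IntegrityError) constraint failed

Either way, this isn't a real solution, because I need to be able to save strings.

Reading every Stack Overflow question about these two errors, as well as the SQLAlchemy documentation. I couldn't find anything that seemed to address the problem I'm having.
Using an autoincrement key for the data. I save my data on the key 'URN', which is unique, but I thought that for some reason the scraper might be using the 'published_recent' key as the unique key when saving, so I tried using an autoincrement key, following this answer: ScraperWiki: How to create and add records with autoincrement key. I still get the same error.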

Thanks in advance for any answers. This is driving me a bit crazy.

Answer 1

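The error says that the value it is trying to store as an integer is 'n/a'. When you scrape data, you don't always get what you expect. It looks like 'n/a' is what the site you are scraping puts in that field when there is no number for it. You need to validate your data before saving it.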
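
The traceback ends in SQLAlchemy's boolean_to_int processor calling int(value), which suggests the 'published_recent' column is being treated as a boolean and the string 'n/a' cannot be converted. One way to do the validation might look like the sketch below. The names clean_row, NULL_MARKERS and BOOLEAN_COLUMNS are hypothetical, and the list of boolean columns is an assumption based on the row in the question, so check it against your own table.

```python
# Sketch only: NULL_MARKERS and BOOLEAN_COLUMNS are assumptions for illustration.
NULL_MARKERS = frozenset(["n/a", "N/A", ""])

# Columns assumed to have been stored as booleans by earlier rows.
BOOLEAN_COLUMNS = frozenset(["published_recent", "include"])


def clean_row(row, boolean_columns=BOOLEAN_COLUMNS, null_markers=NULL_MARKERS):
    """Return a copy of row with placeholder strings in boolean columns set to None."""
    cleaned = dict(row)
    for key in boolean_columns:
        value = cleaned.get(key)
        # Real booleans and missing values are already safe to save.
        if isinstance(value, bool) or value is None:
            continue
        if value in null_markers:
            # Store "no value" as NULL instead of a string that int() rejects.
            cleaned[key] = None
        else:
            # Fail early with a clear message rather than inside the INSERT.
            raise ValueError(
                "unexpected value %r for boolean column %r" % (value, key)
            )
    return cleaned


# A shortened version of the row from the question.
school = {u"URN": u"138262", u"published_recent": "n/a", u"include": False}
cleaned = clean_row(school)
# scraperwiki.sql.save(["URN"], cleaned, "magic")  # the save call from the question
```

If the field really does need to hold strings such as 'n/a', another option, assuming the column type is decided by the first values saved to it, would be to save that field as a string in every row of a fresh table so the column is never created as a boolean.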
