Error importing the RSS feed tag pubDate into a MySQL database using Python

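I have a question about inserting the pubDate tag into my MySQL table. I am trying to insert three tags (title, link and pubDate), and the last one (pubDate) is the one causing problems.

Here is how the code works: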

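  1. The first step reads an RSS page and writes it to an XML file.

  2. The second step generates a CSV file with only 3 tags (title, link and pubDate). Note: in this code I need to use item.findtext('pubDate'), because item.find('pubDate').text raises an error, even though the file is generated correctly in both cases.

  3. The last step stores the contents of the CSV file in my MySQL table.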
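The note in step 2 comes down to how ElementTree treats a missing child element. A minimal sketch, assuming (hypothetically) that some feed items have no `<pubDate>`: `findtext()` quietly returns its default (`None`), whereas `find()` returns `None` and the following `.text` raises `AttributeError`. The `SAMPLE_RSS` string and the `pub_dates` / `pub_dates_strict` helpers are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the second <item> has no <pubDate> child
SAMPLE_RSS = """<rss><channel>
<item><title>A</title><link>http://a</link><pubDate>Mon, 26 Jun 2017 08:49:00 GMT</pubDate></item>
<item><title>B</title><link>http://b</link></item>
</channel></rss>"""


def pub_dates(xml_text):
    """Return the pubDate text of every item; None where the tag is missing."""
    root = ET.fromstring(xml_text)
    # findtext() returns its default (None) when no <pubDate> child exists
    return [item.findtext('pubDate') for item in root.findall('./channel/item')]


def pub_dates_strict(xml_text):
    """Same lookup with find().text, which fails on a missing tag."""
    root = ET.fromstring(xml_text)
    # find() returns None for a missing tag, so None.text raises AttributeError
    return [item.find('pubDate').text for item in root.findall('./channel/item')]


print(pub_dates(SAMPLE_RSS))
try:
    pub_dates_strict(SAMPLE_RSS)
except AttributeError as exc:
    print('AttributeError:', exc)
```

So `findtext()` does not crash, but it silently produces `None` for such items, and `csv.writer` writes that `None` as an empty field.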

In step 3 I get the following error:

Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 78, in <module>
    main()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 72, in main
    testdb()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 56, in testdb
    (r[1:] for r in csv_data.itertuples()))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
    return self.execute(stmt)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'

Process finished with exit code 1

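I think the problem is pubDate, because I can run the program in two parts:

Part one:

Create the XML and the CSV, but with the pubDate argument changed to item.find('pubDate').text. The XML and CSV files are generated successfully, but the code raises an error about pubDate.

Part two:

Insert into MySQL from the CSV file created in part one. This program runs successfully with no errors; when I check my database, the data is loaded.

But with this approach I cannot run both parts in the same file, because the error stops execution and the database-insertion part never runs.

So the error is actually in this code: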

# Python code that creates an XML file and a CSV file and inserts the data into a MySQL database.
# Import the modules needed to run this script
import csv
import MySQLdb
import requests
import xml.etree.ElementTree as ET
import mysql.connector
import pandas as pd


def loadRSS():
    # Set the URL of the CNN RSS feed
    url = 'http://rss.cnn.com/rss/edition.xml'

    # Request the URL declared above; the response object holds the feed
    resp = requests.get(url)

    # Save the response content to a file called cnn.xml
    with open('cnn.xml', 'wb') as f:
        f.write(resp.content)


def loadcsv():
    tree = ET.parse("cnn.xml")
    root = tree.getroot()

    d = open('cnn.csv', 'w')

    csvwriter = csv.writer(d)

    count = 0

    head = ['title', 'link', 'pubDate']

    csvwriter.writerow(head)

    for item in root.findall('./channel/item'):
        row = []
        title_name = item.find('title').text
        row.append(title_name)
        link_name = item.find('link').text
        row.append(link_name)
        pubDate_name = item.findtext('pubDate')
        row.append(pubDate_name)
        csvwriter.writerow(row)
    d.close()

def testdb():
    cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
    cursor = cnx.cursor()
    csv_data = pd.read_csv('cnn.csv')

    for row in csv_data.iterrows():
        cursor.executemany(
            "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
            (r[1:] for r in csv_data.itertuples()))

    cnx.commit()
    cursor.close()
    cnx.close()

    #connection = MySQLdb.Connect(host='localhost', user='root', passwd='password', db='cnn')
    #cursor = connection.cursor()
    #query = "LOAD DATA INFILE 'cnn.csv' INTO TABLE noticias(title, link, pubdate)"
    #cursor.execute(query)
    #connection.commit()

def main():
    # Run the functions defined in this program, in order
    loadRSS()
    loadcsv()
    testdb()



if __name__ == "__main__":
    # call the main function
    main()

Does anyone have an idea about this error?

Update: I added the line:

print(csv_data.head())

as suggested in your comment. The debugger output is:

Connected to pydev debugger (build 171.4694.38)
                                               title  \
0  Bloodied and broken: The battle against ISIS i...   
1                            The human cost of ISIS    
2                  B deal to prop up UK government   
3               Netanyahu freezes Western Wall plans   
4  Only a 'couple of hundred' ISIS fighters left ...   

                                                link  \
0                              http://cnn.it/2sbE6fp   
1  http://www.cnn.com/videos/world/2017/06/25/phi...   
2  http://www.cnn.com/2017/06/26/europe/theresa-m...   
3  http://www.cnn.com/2017/06/26/middleeast/weste...   
4  http://www.cnn.com/2017/06/26/middleeast/coupl...   

                            date  
0                            NaN  
1  Mon, 26 Jun 2017 08:49:00 GMT  
2  Mon, 26 Jun 2017 11:59:24 GMT  
3  Mon, 26 Jun 2017 13:09:30 GMT  
4  Mon, 26 Jun 2017 13:16:21 GMT  
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 80, in <module>
    main()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 74, in main
    testdb()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 58, in testdb
    (r[1:] for r in csv_data.itertuples()))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
    return self.execute(stmt)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'

Process finished with exit code 1
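The `NaN` in row 0 fits a missing `<pubDate>`: `findtext()` yields `None`, `csv.writer` writes `None` as an empty field, and `pd.read_csv` reads an empty field back as `NaN`. MySQL Connector/Python then appears to render that float as the bare word `nan` in the SQL text, which MySQL parses as a column name; that last step is an inference from the `Unknown column 'nan'` message, not something confirmed here. A minimal sketch, assuming pandas is installed, that reproduces the `NaN` and replaces it with `None` (sent as SQL `NULL`) before building the parameter tuples; `to_params` is a hypothetical helper:

```python
import csv
import io

import pandas as pd

# Write a CSV the way loadcsv() does; None (missing pubDate) becomes an empty field
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['title', 'link', 'pubDate'])
writer.writerow(['A', 'http://a', None])
writer.writerow(['B', 'http://b', 'Mon, 26 Jun 2017 11:59:24 GMT'])
buf.seek(0)

csv_data = pd.read_csv(buf)
print(csv_data['pubDate'].isna().tolist())  # the empty field was read back as NaN


def to_params(df):
    """Return plain tuples (one per row) with every NaN replaced by None."""
    return [tuple(None if pd.isna(value) else value for value in row)
            for row in df.itertuples(index=False, name=None)]


params = to_params(csv_data)
print(params)
# With a live connection (not run here):
# cursor.executemany("INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)", params)
```

Whether `NULL` is acceptable depends on how the `noticias.pubDate` column is defined, which the question does not show.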

Update, June 27, 2017:

I changed the testdb part; it now looks like this:

def testdb():
    cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
    cursor = cnx.cursor()

    with open('cnn.csv') as fh:
        cursor.executemany(
            "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
            [tuple(row) for row in csv.reader(fh)]
        )

    cnx.commit()
    cursor.close()
    cnx.close()

When I debug the program, the error is:

Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 75, in __call__
    return bytes(self.params[index])
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 79, in <module>
    main()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 73, in main
    testdb()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 56, in testdb
    [tuple(row) for row in csv.reader(fh)]
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 652, in executemany
    stmt = self._batch_insert(operation, seq_params)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 594, in _batch_insert
    tmp = RE_PY_PARAM.sub(psub, tmp)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 78, in __call__
    "Not enough parameters for the SQL statement")
mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement

Process finished with exit code 1

I don't know whether I forgot to add something.

Comment: ... but the error now is

Only the first error is relevant: IndexError: tuple index out of range
The CSV data must be wrong; check it before passing it to MySQL:
import csv
records = []
with open('test/cnn.csv') as fh:
    for row in csv.reader(fh):
        _tuple = tuple(row)
        if len(_tuple) == 3:
            records.append(_tuple)
        else:
            print('[FAIL]: Tuple length not 3, found {} in {}'.format(len(_tuple), _tuple))

cursor.executemany("INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)", records)
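One likely source of rows that are not 3 long, assuming the script runs on Windows as the tracebacks suggest: `loadcsv()` opens the file with `open('cnn.csv', 'w')` and no `newline=''`, so the `\r\n` that `csv.writer` emits can end up as `\r\r\n`, which reads back as a blank line after every row. `csv.reader` returns an empty list for a blank line, and `tuple([])` has no values to bind, which is consistent with `IndexError: tuple index out of range`. The Python `csv` documentation recommends opening CSV files with `newline=''`. A sketch with a hypothetical `split_rows` helper that simulates those blank lines in memory and filters them:

```python
import csv
import io


def split_rows(fh, expected_len=3):
    """Split CSV rows into insertable tuples and rejects, skipping blank lines."""
    reader = csv.reader(fh)
    next(reader, None)  # skip the header row
    records, rejected = [], []
    for row in reader:
        if not row:  # a blank line comes back as []
            continue
        if len(row) == expected_len:
            records.append(tuple(row))
        else:
            rejected.append(row)
    return records, rejected


# Hypothetical data: every row followed by an empty line, plus one malformed row
sample = io.StringIO('title,link,pubDate\n\nA,http://a,\n\nB,http://b,x,extra\n\n')
records, rejected = split_rows(sample)
print(records)
print(rejected)

# Assumed fix on the writing side: open('cnn.csv', 'w', newline='') before csv.writer(...)
```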

Comment: Error: Not all parameters were used in the SQL statement
According to MySQL Connector/Python Developer Guide: 10.5.5 - MySQLCursor.executemany() Method:

from datetime import date

data = [
  ('Jane', date(2005, 2, 12)),
  ('Joe', date(2006, 5, 23)),
  ('John', date(2010, 10, 3)),
]
stmt = "INSERT INTO employees (first_name, hire_date) VALUES (%s, %s)"
cursor.executemany(stmt, data)  # signature: executemany(operation, seq_of_params)

seq_of_params has to be a list of tuples.

So you don't need a for loop to iterate over the CSV rows; you must pass the whole CSV data as a list of tuples. Second, use the csv module instead of pandas. Change it to:

import csv
with open('cnn.csv', newline='') as fh:
    reader = csv.reader(fh)
    next(reader)  # skip the header row (title, link, pubDate)
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        [tuple(row) for row in reader]
    )

Tested with Python 3.4.2
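To see the "one call, list of tuples" shape run end to end without a MySQL server, here is a sketch that swaps in Python's built-in `sqlite3` as a stand-in. Note the swap: sqlite3 uses `?` placeholders where MySQL Connector/Python uses `%s`, and the in-memory table and CSV text are hypothetical. The header row is skipped and blank lines are dropped before the single `executemany()` call.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for cnn.csv; csv.writer quotes dates that contain commas
CSV_TEXT = (
    'title,link,pubDate\r\n'
    'A,http://a,"Mon, 26 Jun 2017 08:49:00 GMT"\r\n'
    'B,http://b,"Mon, 26 Jun 2017 11:59:24 GMT"\r\n'
)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE noticias (title TEXT, link TEXT, pubDate TEXT)')

reader = csv.reader(io.StringIO(CSV_TEXT, newline=''))
next(reader)  # skip the header row
records = [tuple(row) for row in reader if row]  # drop blank lines

# One executemany() call with the whole list of tuples; no outer loop
conn.executemany('INSERT INTO noticias (title, link, pubDate) VALUES (?, ?, ?)', records)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM noticias').fetchone()[0]
print(count)
```

With MySQL Connector/Python the same `records` list would go to `cursor.executemany("INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)", records)`.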


Question: Does anybody have an idea about this error?

Unknown column 'nan' in 'field list'

This part of your code is wrong: you are iterating over csv_data twice.

for row in csv_data.iterrows():
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        (r[1:] for r in csv_data.itertuples()))

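I can't tell whether this causes the error above, but you should change it to the following and try again to check whether the error persists: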

for row in csv_data.iterrows():
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        ((value for value in row[1]))