TypeError: Cannot read object of type 'list'

As far as I know, I haven't created a list anywhere, but it gives me a

TypeError: Cannot read object of type 'list'.

Any ideas?

I'm new to Python, so please go easy on me.

Any help is appreciated.

Sample URL:

https://nclbgc.org/search/licenseDetails?licenseNumber=80479

Here is the full traceback:

Traceback (most recent call last):
  File "ncscribble.py", line 26, in <module>
    df = pd.read_html(url)[0].dropna(how='all')
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 987, in read_html
    displayed_only=displayed_only)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 815, in _parse
    raise_with_traceback(retained)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\compat\__init__.py", line 404, in raise_with_traceback
    raise exc.with_traceback(traceback)
TypeError: Cannot read object of type 'list'

Full code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import time
import csv
import pandas as pd
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

def license_exists(soup):
    with open('NC_urls.csv','r') as csvf:
        urls = csv.reader(csvf)
        for url in urls:
            if soup(class_='btn btn-primary"'):
                return False
            else:
                return True


with open('NC_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        df = pd.read_html(url)[0].dropna(how='all')
        df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T
        if not license_exists(soup(page, 'html.parser')):
            # if the license is present we don't want to parse any more urls.

            break


df.to_csv('NC_Licenses_Daily.csv', index=False)

When you hit a type error, it is usually a good idea to print the value involved, like this:

    for url in urls:
        print(repr(url))
        df = pd.read_html(url)[0].dropna(how='all')

That gives you:

['https://nclbgc.org/search/licenseDetails?licenseNumber=80479']

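That is because csv.reader yields each row as a list of strings, even when the row has only one column. You need to take the first element of that list and pass it to pd.read_html: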

    for url in urls:
        df = pd.read_html(url[0])[0].dropna(how='all')
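
To see why indexing with [0] is needed, here is a minimal sketch that runs without any network access. It assumes NC_urls.csv holds one URL per line and uses an in-memory io.StringIO as a stand-in for the real file; the names sample, rows, first_row and first_url are only illustrative.

```python
import csv
import io

# In-memory stand-in for NC_urls.csv (assumption: one URL per line).
sample = io.StringIO(
    "https://nclbgc.org/search/licenseDetails?licenseNumber=80479\n"
)

rows = list(csv.reader(sample))

first_row = rows[0]       # a list holding the columns of the first line
first_url = first_row[0]  # the first column, a plain string

print(type(first_row).__name__)  # list
print(type(first_url).__name__)  # str
```

pd.read_html expects a string (URL or path) or a file-like object, so it is first_url, not first_row, that should be passed to it.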

Your code also passes a page variable to soup that is never defined. To get the page data, you can use requests:

import requests
page = requests.get(url[0]).content
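
Putting the two pieces together, a rough sketch of the loop's building blocks might look like the following. The helper names iter_urls and fetch_first_table are hypothetical (they are not part of pandas or requests), the 30-second timeout is an arbitrary choice, and the sketch assumes requests, pandas and an HTML parser such as lxml are installed.

```python
import csv
import io

import pandas as pd
import requests


def iter_urls(csv_file):
    """Yield the first column of each non-empty CSV row as a plain string."""
    for row in csv.reader(csv_file):
        if row:  # csv.reader yields an empty list for blank lines
            yield row[0]


def fetch_first_table(url):
    """Download one page; return (raw HTML bytes, first table as a DataFrame)."""
    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()               # fail loudly on HTTP errors
    page = response.content
    # Wrapping the text in StringIO hands pandas a file-like object.
    tables = pd.read_html(io.StringIO(response.text))
    return page, tables[0].dropna(how="all")
```

Inside the original loop this could be used as `for url in iter_urls(csvf): page, df = fetch_first_table(url)`, after which page is available for soup(page, 'html.parser'). Fetching the page once and reusing it for both pandas and BeautifulSoup avoids downloading every URL twice.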