ValueError: Cannot convert <.....><....> to Excel

Question

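Hi, I'm new to Python programming. I am trying to scrape a news website with Python. I managed to get the headlines and their links, but when I try to save them to an Excel file, I get a ValueError.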

Here is the source code and the error:

import requests, openpyxl
from bs4 import BeautifulSoup

excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = 'Maalaimalar Links'
sheet.append(['Title','Link'])

req = requests.get("https://www.maalaimalar.com/news/topnews/1")

head_lines = BeautifulSoup(req.text, 'html.parser')

hliness = head_lines.find_all('div', class_ = 'col-md-4 article')

for hlines in hliness:
    h2lines = hlines.find('h3').text
    link = hlines.find('a')
    print(h2lines)
    print(link.get('href'))
    sheet.append([h2lines, link])


excel.save('maalaimalar.xlsx')

This is the error I get when this line executes:

sheet.append([h2lines, link])


ValueError: Cannot convert <a href="https://www.maalaimalar.com/news/topnews/2022/03/06182721/3549285/IPL-2022-Schedule-match-details-for-Chennai-super.vpf"><h3>ஐபிஎல்  2022 அட்டவணை- சென்னை அணி மோதும் ஆட்டங்கள் விவரம்</h3></a> to Excel.

Answer 1

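You are trying to push a BeautifulSoup object (the whole <a> tag) to your Excel sheet instead of extracting its href, as you already do in print(link.get('href')). openpyxl cannot convert that object to a cell value, so store the href string instead: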

link = hlines.find('a').get('href')

or

link = hlines.a.get('href')
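
To see the difference without hitting the network, here is a minimal sketch. The inline HTML, the example.com URL, and the variable names are made up for illustration; only the find, get, and append calls come from the code above.

```python
from bs4 import BeautifulSoup
import openpyxl

# Hypothetical HTML standing in for one article block on the real page.
html = (
    '<div class="col-md-4 article">'
    '<a href="https://example.com/story"><h3>Sample headline</h3></a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
anchor = soup.find("a")       # a bs4 Tag object, not a string
href = anchor.get("href")     # the attribute value, a plain str

# Appending the Tag itself reproduces the error from the question.
scratch_sheet = openpyxl.Workbook().active
try:
    scratch_sheet.append(["Sample headline", anchor])
    tag_error = None
except ValueError as exc:
    tag_error = str(exc)
print(tag_error)

# Appending the extracted string works.
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.append(["Sample headline", href])
print(sheet["B1"].value)
```

The first append fails because the cell value is a Tag; the second succeeds because href is an ordinary string.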

Example

import requests, openpyxl
from bs4 import BeautifulSoup

excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = 'Maalaimalar Links'
sheet.append(['Title','Link'])

req = requests.get("https://www.maalaimalar.com/news/topnews/1")

head_lines = BeautifulSoup(req.text, 'html.parser')

hliness = head_lines.find_all('div', class_ = 'col-md-4 article')

for hlines in hliness:
    h2lines = hlines.find('h3').text
    link = hlines.find('a').get('href')
    sheet.append([h2lines, link])

excel.save('maalaimalar.xlsx')
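
One caveat: find returns None when a block has no <h3> or <a>, and the loop above would then raise an AttributeError. If the page layout is not perfectly uniform, a slightly more defensive variant could look like this. The extract_rows helper and the sample HTML are hypothetical, not part of the original code.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return (title, link) tuples, skipping blocks with missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="col-md-4 article"):
        heading = block.find("h3")
        anchor = block.find("a")
        # Skip blocks that lack a headline or a usable href.
        if heading is None or anchor is None or not anchor.get("href"):
            continue
        rows.append((heading.get_text(strip=True), anchor.get("href")))
    return rows


# Hypothetical sample: one complete block and one without a link.
sample_html = """
<div class="col-md-4 article"><a href="https://example.com/a"><h3> First story </h3></a></div>
<div class="col-md-4 article"><h3>No link here</h3></div>
"""

rows = extract_rows(sample_html)
print(rows)
```

Each tuple holds only strings, so it can be passed straight to sheet.append(list(row)).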

