Python scraping encoding issues

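I am trying to scrape a website using BeautifulSoup. I have mostly succeeded, but there are two problems:

  1. After getting the data from the website, I print it to the screen and write it to a CSV file. The website has a price field that contains the rupee symbol in front of the actual amount (sample structure of the price field: ₹10000). When I print the amount to the console, it prints fine with no problem. When I try to write it to the CSV file for Excel, I get the error "UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 28". I also print the other fields to the console and write them to the file; the problem shows up with only two fields, one containing the currency symbol and another containing a * symbol.

  2. I have a loop that runs through all the result pages of a particular search. The search returns about 344 pages, but the loop stops at around page 43 with only "HTTP Error 500" as the error message.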

    import bs4
    from urllib.request import urlopen as uReq

    from bs4 import BeautifulSoup as Soup
    filename = "data.csv"
    f = open(filename,"w")
    headers = "phone_name, phone_price, phone_rating,number_of_ratings, memory, display, camera, battery, processor, Warrenty, security, OS\n"
    f.write(headers)


    for i in range(2):      # Number of pages minus one
        my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
        print(my_url)

        uClient=uReq(my_url)

        page_html=uClient.read()

        page_soup = Soup(page_html,"html.parser")

        containers=page_soup.findAll("a", {"class":"_1UoZlX"})


    for container in containers:
        phone_name = container.find("div",{"class":"_3wU53n"}).text

        try:
            phone_price = container.find("div",{"class":"_1vC4OE _2rQ-NK"}).text
        except:
            phone_price = 'No Data'
    

Thanks a lot for your help!

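When writing a .CSV file for Excel, use the `utf-8-sig` encoding so that any Unicode character is supported correctly. The error occurs because `open()` without an `encoding` argument uses the locale's default encoding, which on Windows is typically a legacy code page such as cp1252; its 'charmap' codec has no rupee sign (U+20B9). Plain `utf8` avoids the error, but Excel on Windows will then assume the localized ANSI encoding and display the characters incorrectly. `utf-8-sig` writes a byte order mark (BOM) at the start of the file, which Excel uses to recognize the file as UTF-8.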
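A minimal standard-library sketch of both points: it reproduces the 'charmap' failure by encoding a price to cp1252 (assumed here as a representative Windows default), then writes one row copied from the scraped data with `utf-8-sig` and checks the BOM. The temporary file name `demo.csv` is arbitrary.

```python
import csv
import os
import tempfile

price = "\u20b915,940"  # "₹15,940"

# cp1252, a common Windows default for open(), has no rupee sign, so this
# raises the same kind of 'charmap' UnicodeEncodeError as in the question.
encode_error = None
try:
    price.encode("cp1252")
except UnicodeEncodeError as exc:
    encode_error = exc  # keep a reference; `exc` is cleared after the block
    print(exc)

rows = [["phone_name", "phone_price"],
        ["Moto G5s Plus (Lunar Grey, 64 GB)", price]]
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Write with utf-8-sig; newline="" lets the csv module control line endings.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)

# The file starts with the UTF-8 byte order mark that Excel looks for.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'

# Reading back with utf-8-sig strips the BOM again, so the rows round-trip.
with open(path, newline="", encoding="utf-8-sig") as f:
    print(list(csv.reader(f)) == rows)  # True
```

Reading the same file with plain `utf-8` would leave a stray `\ufeff` at the start of the first header cell, which is why `utf-8-sig` is the better choice for both writing and reading these files.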

    #!python3
    import csv
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as Soup

    filename = "data.csv"
    # utf-8-sig writes a BOM so Excel detects UTF-8; newline='' is required by the csv module.
    with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
        w = csv.writer(f)
        headers = 'phone_name phone_price phone_rating number_of_ratings memory display camera battery processor Warrenty security OS'
        w.writerow(headers.split())

        for i in range(2):      # Number of pages to fetch (pages 1 and 2)
            my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
            print(my_url)
            with uReq(my_url) as uClient:   # closes the connection after reading
                page_html = uClient.read()
            page_soup = Soup(page_html, "html.parser")
            containers = page_soup.find_all("a", {"class": "_1UoZlX"})

            # Process each page's results inside the page loop; otherwise
            # only the containers from the last page are written.
            for container in containers:
                phone_name = container.find("div", {"class": "_3wU53n"}).text

                try:
                    phone_price = container.find("div", {"class": "_1vC4OE _2rQ-NK"}).text
                except AttributeError:  # find() returned None: no price element
                    phone_price = 'No Data'

                w.writerow([phone_name, phone_price])

Output:

    phone_name,phone_price,phone_rating,number_of_ratings,memory,display,camera,battery,processor,Warrenty,security,OS
    "Asus Zenfone 3 Laser (Gold, 32 GB)","₹9,999"
    "Intex Aqua Style III (Champagne/Champ, 16 GB)","₹3,999"
    "iVooMi i1s (Platinum Gold, 32 GB)","₹7,499"
    "Xolo ERA 3X (Posh Black, 16 GB)","₹6,999"
    "iVooMi Me1 (Sunshine Gold, 8 GB)","₹3,599"
    "Panasonic Eluga A4 (Mocha Gold, 32 GB)","₹9,790"
    Samsung Metro 313 Dual Sim,"₹2,025"
    "Samsung Galaxy J3 Pro (Gold, 16 GB)","₹6,990"
    Samsung Guru Music 2,"₹1,625"
    "Panasonic Eluga A4 (Marine Blue, 32 GB)","₹9,640"
    "Asus Zenfone 4 Selfie (Black, 32 GB)","₹9,999"
    Swipe Elite 3- 4G with VoLTE,"₹3,999"
    "Asus Zenfone Max (Black, 16 GB)","₹7,486"
    Swipe Elite 3- 4G with VoLTE,"₹3,999"
    "Swipe Elite Power (Space Grey, 16 GB)","₹5,499"
    "Celkon Diamond Mega (Grey, 16 GB)","₹5,499"
    "Asus Zenfone Max (Black, 32 GB)","₹7,999"
    "Swipe Elite Power (Champagne Gold, 16 GB)","₹5,499"
    "Asus Zenfone 4 Selfie (Gold, 32 GB)","₹9,999"
    "Karbonn Aura (Champagne, 8 GB)","₹3,199"
    "Infinix Note 4 (Ice Blue, 32 GB)","₹8,999"
    "Infinix Note 4 (Milan Black, 32 GB)","₹8,999"
    "Moto G5s Plus (Blush Gold, 64 GB)","₹15,990"
    "Moto G5s Plus (Lunar Grey, 64 GB)","₹15,940"
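For the second problem (the loop stopping with HTTP Error 500 around page 43): a 500 is a server-side error, so one common mitigation is to retry the request after a pause. Whether this helps on this particular site is an assumption, not something confirmed here. The helper `fetch_with_retry` below is hypothetical, and its retry count and delays are illustrative guesses; it uses only `urllib.request.urlopen` and `urllib.error.HTTPError` from the standard library.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_with_retry(url, retries=3, delay=2.0, opener=urlopen):
    """Return the body of `url`, retrying on HTTP 5xx errors.

    Hypothetical helper: `retries` and `delay` are illustrative values,
    not settings known to work for this site. `opener` is injectable so
    the retry logic can be exercised without a network connection.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as exc:
            # Only server-side errors (5xx) are worth retrying; client
            # errors such as 404 are re-raised at once, and the last
            # failed attempt is re-raised as well.
            if exc.code < 500 or attempt == retries:
                raise
            # Wait longer after each failure (simple exponential backoff).
            time.sleep(delay * 2 ** attempt)
```

Inside the page loop, `page_html = fetch_with_retry(my_url)` would replace the `uReq(my_url)` / `read()` pair; if a page still fails after all retries, the `HTTPError` propagates just as before. Pausing briefly between pages (for example with `time.sleep`) may also reduce the load that triggers such errors, though that too is an assumption about the server.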
