Append data into data frame

I am getting the error ValueError: All arrays must be of the same length. How can I fix it? I have tried many approaches but cannot resolve it. The arrays are not the same length, so how should I handle this?

import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd 

url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
    if('CCRZ.detailData.jsonProductData = {"' in str(table)):
        x=str(table).split('CCRZ.detailData.jsonProductData = {"')
        raw_json = "{\""+str(x[-1]).split('};')[0]+"}"
        break
           
req_json = json.loads(raw_json)
# with open("text_json.json","w")as file:
#     x=json.dump(req_json,file,indent=4)

temp = req_json

name=[]
specs=[]


title=temp['product']['prodBean']['name']
name.append(title)


item=temp['specifications']['MARKETING']
for i in item:
    try:
        get=i['value']
    except:
        pass

    specs.append(get)


temp={'title':name,'Specification':specs}
df=pd.DataFrame(temp)
print(df)
    
  


    

The error is clear, but the question and the expected result are not.

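The way you are trying to create the DataFrame would have to deal with missing rows, and that is why the error occurs: name holds a single value while specs holds several, and pandas requires every column list to have the same length. To get around this, you can create the DataFrame from the dict with orient='index':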
pd.DataFrame.from_dict(temp, orient='index')
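
To see the mismatch without scraping anything, here is a minimal sketch. The values in columns are made-up stand-ins for the scraped title and specs, not the real page data. It reproduces the ValueError and then shows that orient='index' turns each key into a row and pads the shorter one:

```python
import pandas as pd

# Hypothetical stand-in for the scraped values: one title, three specs.
columns = {
    "title": ["Air hose"],
    "Specification": ["Spiral wound", "Includes hang collar", "One bundle"],
}

# Building the frame column-wise fails because the lists differ in length.
try:
    pd.DataFrame(columns)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# orient="index" makes each key a row; the short row is padded with missing values.
wide = pd.DataFrame.from_dict(columns, orient="index")
print(wide)
```

The result has one row per key, which is why it tends to be awkward to work with afterwards.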

But that looks ugly and is awkward to process later, so an alternative is:

data = [{
    'title':temp['product']['prodBean']['name'],
    'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
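
One thing to watch: ','.join raises a TypeError if any entry has no 'value' key, because s.get('value') returns None for it. A small sketch of the same idea that skips such entries, using a hypothetical product dict that only mimics the shape of the parsed JSON:

```python
import pandas as pd

# Hypothetical data shaped like the parsed JSON; the last entry has no "value".
product = {
    "product": {"prodBean": {"name": "Air hose"}},
    "specifications": {
        "MARKETING": [
            {"value": "Spiral wound"},
            {"value": "Includes hang collar"},
            {"label": "entry without a value"},
        ]
    },
}

# Keep only entries that actually carry a non-empty value.
values = [
    s["value"]
    for s in product["specifications"]["MARKETING"]
    if s.get("value")
]

# A list with a single dict gives a single-row DataFrame.
one_row = pd.DataFrame([{
    "title": product["product"]["prodBean"]["name"],
    "specs": ",".join(values),
}])
print(one_row)
```

Whether to skip or flag such entries depends on what you need downstream; skipping is just the simplest choice here.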

If you want each spec on its own row, do the following:

data = {
    'title':temp['product']['prodBean']['name'],
    'specs':[s.get('value') for s in temp['specifications']['MARKETING']]
}

pd.DataFrame.from_dict(data)
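
This works because the title is passed as a plain string rather than a one-element list, so pandas repeats it for every row of the list column. A minimal sketch with made-up values (title and spec_values are illustrative, not the scraped data):

```python
import pandas as pd

# Hypothetical values standing in for the scraped title and specs.
title = "Air hose"
spec_values = ["Spiral wound", "Includes hang collar", "One bundle"]

# The scalar title is broadcast to match the length of the specs list.
long_form = pd.DataFrame({"title": title, "specs": spec_values})
print(long_form)
```

If the title were wrapped in a list, as in the question, the lengths would differ again and the same ValueError would come back.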
Example
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
# Pass the headers so the request actually uses the custom User-Agent.
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html5lib")

# Find the <script> that assigns the product JSON and cut out the object literal.
marker = 'CCRZ.detailData.jsonProductData = {"'
raw_json = ""
for script in soup.find_all("script"):
    if marker in str(script):
        raw_json = '{"' + str(script).split(marker)[-1].split('};')[0] + '}'
        break

temp = json.loads(raw_json)

# One row: the product title plus all marketing spec values joined by commas.
data = [{
    'title': temp['product']['prodBean']['name'],
    'specs': ','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]

print(pd.DataFrame(data))
Output
                                                   title                                                             specs
0  OTR Trailer Air Hose and Electric Cable Assembly, 15'  Spiral wound,Includes hang collar,One bundle for easy management