将数据追加到数据框中
Append data into data frame
他们告诉我这些错误 ValueError: All arrays must be of the same length
我怎样才能解决这些错误 任何给我解决这些问题的人 我正在尝试很多方法但我无法解决这些错误 所以我该如何处理这些错误数组不一样
import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
if('CCRZ.detailData.jsonProductData = {"' in str(table)):
x=str(table).split('CCRZ.detailData.jsonProductData = {"')
raw_json = "{\""+str(x[-1]).split('};')[0]+"}"
break
req_json = json.loads(raw_json)
# with open("text_json.json","w")as file:
# x=json.dump(req_json,file,indent=4)
temp = req_json
name=[]
specs=[]
title=temp['product']['prodBean']['name']
name.append(title)
item=temp['specifications']['MARKETING']
for i in item:
try:
get=i['value']
except:
pass
specs.append(get)
temp={'title':name,'Specification':specs}
df=pd.DataFrame(temp)
print(df)
虽然错误很明显,但问题和预期结果却不是。
您尝试创建 DataFrame 的方式必须处理丢失的行,这就是出现错误的原因。要解决此问题,您可以从 dict:
创建 DataFrame
pd.DataFrame.from_dict(temp, orient='index')
但是看起来很丑而且以后处理不好,所以替代方案是:
data = [{
'title':temp['product']['prodBean']['name'],
'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
如果您希望每个规格都在一个新行中,请执行以下操作:
data = {
'title':temp['product']['prodBean']['name'],
'specs':[s.get('value') for s in temp['specifications']['MARKETING']]
}
pd.DataFrame.from_dict(data)
例子
import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
if('CCRZ.detailData.jsonProductData = {"' in str(table)):
x=str(table).split('CCRZ.detailData.jsonProductData = {"')
raw_json = "{\""+str(x[-1]).split('};')[0]+"}"
break
temp = json.loads(raw_json)
data = [{
'title':temp['product']['prodBean']['name'],
'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
输出
title
specs
0
OTR Trailer Air Hose and Electric Cable Assembly, 15'
Spiral wound,Includes hang collar,One bundle for easy management
他们告诉我这些错误 ValueError: All arrays must be of the same length
我怎样才能解决这些错误 任何给我解决这些问题的人 我正在尝试很多方法但我无法解决这些错误 所以我该如何处理这些错误数组不一样
import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
if('CCRZ.detailData.jsonProductData = {"' in str(table)):
x=str(table).split('CCRZ.detailData.jsonProductData = {"')
raw_json = "{\""+str(x[-1]).split('};')[0]+"}"
break
req_json = json.loads(raw_json)
# with open("text_json.json","w")as file:
# x=json.dump(req_json,file,indent=4)
temp = req_json
name=[]
specs=[]
title=temp['product']['prodBean']['name']
name.append(title)
item=temp['specifications']['MARKETING']
for i in item:
try:
get=i['value']
except:
pass
specs.append(get)
temp={'title':name,'Specification':specs}
df=pd.DataFrame(temp)
print(df)
虽然错误很明显,但问题和预期结果却不是。
您尝试创建 DataFrame 的方式必须处理丢失的行,这就是出现错误的原因。要解决此问题,您可以从 dict:
创建 DataFramepd.DataFrame.from_dict(temp, orient='index')
但是看起来很丑而且以后处理不好,所以替代方案是:
data = [{
'title':temp['product']['prodBean']['name'],
'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
如果您希望每个规格都在一个新行中,请执行以下操作:
data = {
'title':temp['product']['prodBean']['name'],
'specs':[s.get('value') for s in temp['specifications']['MARKETING']]
}
pd.DataFrame.from_dict(data)
例子
import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
if('CCRZ.detailData.jsonProductData = {"' in str(table)):
x=str(table).split('CCRZ.detailData.jsonProductData = {"')
raw_json = "{\""+str(x[-1]).split('};')[0]+"}"
break
temp = json.loads(raw_json)
data = [{
'title':temp['product']['prodBean']['name'],
'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
输出
title | specs | |
---|---|---|
0 | OTR Trailer Air Hose and Electric Cable Assembly, 15' | Spiral wound,Includes hang collar,One bundle for easy management |