How to get a JSON string from a website using Python?
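I am trying to get product image links from a website. I can get the image information for some products, but not for others. In the code below, url1 works but url2 throws "json.decoder.JSONDecodeError". I think the problem is that I am not extracting the JSON string correctly. I am not good with regular expressions. How can I get the JSON string?
Code: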
import re,json,requests
url1 = "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
r = requests.get(url2)
data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', r.text).group(1))
images = ['https://www.trendyol.com' + img for img in data['product']['images']]
print(images)
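The following regular expression works better for the URLs you gave, because it terminates at the end of the nested dictionary, just before the next block begins. The original pattern (.*?); stops at the first semicolon it finds, so it most likely cuts the JSON short whenever a semicolon appears inside the data, which seems to be the case for url2.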
import re, json, requests
url1 = "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
for url in [url1, url2]:
    r = requests.get(url)
    # Match lazily up to the first "}};", i.e. the end of the nested dictionary
    data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', r.text).group(1))
    images = ['https://www.trendyol.com' + img for img in data['product']['images']]
    print(images)
    print("")
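A pattern ending in "}};" can still break if that sequence ever appears earlier in the data. As a rough sketch of an alternative, the standard library's json.JSONDecoder.raw_decode can parse one JSON value starting at a given index and ignore whatever follows it, so no end delimiter has to be guessed. The page fragment below is made up for illustration (it is not Trendyol's real markup), and lazy_regex_capture and extract_state are hypothetical helper names.

```python
import json
import re

# Made-up page fragment (an assumption, not the real site markup):
# note the ';' inside the product name.
html = (
    '<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__='
    '{"product":{"name":"Remote; BN59","images":["/a.jpg","/b.jpg"]}};'
    'window.OTHER={};</script>'
)

MARKER = "PRODUCT_DETAIL_APP_INITIAL_STATE__="


def lazy_regex_capture(text):
    """Reproduce the pattern from the question: stops at the first ';'."""
    return re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', text).group(1)


def extract_state(text, marker=MARKER):
    """Parse the JSON value that starts right after `marker`."""
    start = text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand
    while text[start].isspace():
        start += 1
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


truncated = lazy_regex_capture(html)   # cut off in the middle of the name
state = extract_state(html)            # full dictionary
images = ['https://www.trendyol.com' + img for img in state['product']['images']]
print(truncated)
print(images)
```

With the real page, extract_state(r.text) would take the place of the json.loads(re.search(...)) line, assuming the marker is followed directly by the JSON object.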
You can try this:
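import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0',
}
# Actually send the headers defined above with the request
r = requests.get('https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135', headers=headers)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
for img in soup.find_all('img'):
    # Some <img> tags have no src attribute; skip them instead of raising KeyError
    if img.get('src'):
        print(img['src'])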
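If BeautifulSoup is not installed, the same idea (collect the src of every img tag) can be sketched with the standard library alone. The class name ImgSrcCollector and the sample HTML are hypothetical, made up for this example; urljoin is used so that relative paths are resolved against the site root while absolute URLs are left unchanged.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # skip <img> tags without a usable src
            self.sources.append(urljoin(self.base_url, src))


# Made-up HTML for illustration; a real page would come from r.text
sample_html = (
    '<div><img src="/img/a.jpg"><img alt="no source">'
    '<img src="https://cdn.example.com/b.jpg"/></div>'
)

collector = ImgSrcCollector("https://www.trendyol.com")
collector.feed(sample_html)
print(collector.sources)
```

Note that this only sees images present in the served HTML; images listed solely in the embedded JSON state would still need the JSON approach above.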