Trouble getting the phone number while parsing the data inside a script tag
Code
import requests
from bs4 import BeautifulSoup as bs
my_url = 'https://www.olx.com.pk/item/oppo-f17-pro8128-iid-1034320813'
with requests.session() as s:
    r = s.get(my_url)
    page_html = bs(r.content, 'html.parser')
    # Collect every <script> tag on the page
    safe = page_html.findAll('script')
    print("The Length of Script is {0}:".format(len(safe)))
    # Print only the script tags that contain a +92 phone number
    for i in safe:
        if "+92" in str(i):
            print(i)
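Question
I want to use a Python script to get the phone number, which is actually present in window.state, but I don't know how to parse window.state. I would be very thankful if you could help me solve this problem. Thanks in advance!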
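As I mentioned in the comments, window.state appears inside the 7th <script> tag.
I extracted the contents of that script tag, did a string search for phoneNumber, found its index, and was able to get the data you need.
Extracting the data from JSON would have been easier, but the data is not in JSON format.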
import bs4 as bs
import requests
url = 'https://www.olx.com.pk/item/oppo-f17-pro8128-iid-1034320813'
resp = requests.get(url)
# Convert the response text to HTML soup object
soup = bs.BeautifulSoup(resp.text, 'html.parser')
# Select the 7th script tag (that is where the data you need is present)
s = soup.findAll('script')[6]
# Extract the contents of script. This will be a string type.
f = s.contents[0]
# Find the index of substring "phoneNumber" - the data that you need.
idx = f.index('phoneNumber')
# Since you need the phone number, use string slicing and extract the data.
print(f[idx-1: idx + 28])
# Output
"phoneNumber":"+923077250739"
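The fixed index [6] and the fixed slice width of 28 both depend on the page staying exactly as it is today. Here is a minimal sketch of the same idea that instead looks through every script tag and slices up to the closing quote. The function name find_phone_number and the SAMPLE_HTML string (with a made-up number) are my own, used only so the example runs without hitting the site; I have not checked it against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, with a made-up phone number.
SAMPLE_HTML = """
<html><head>
<script src="app.js"></script>
<script>window.state = {"ad":{"phoneNumber":"+920000000000","title":"demo"}};</script>
</head></html>
"""

def find_phone_number(html):
    """Return the first phoneNumber value found in any <script> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    key = '"phoneNumber":"'
    for script in soup.find_all("script"):
        # .string is None for script tags that have no inline content
        text = script.string or ""
        start = text.find(key)
        if start == -1:
            continue
        value_start = start + len(key)
        # Slice up to the closing quote instead of a fixed number of characters
        value_end = text.find('"', value_start)
        if value_end == -1:
            continue
        return text[value_start:value_end]
    return None

print(find_phone_number(SAMPLE_HTML))
```

For the real page you would pass resp.text from the request above instead of SAMPLE_HTML.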
I would probably just use a simple regular expression to target the string inside the quotes that follow phoneNumber
import requests, re
r = requests.get('https://www.olx.com.pk/item/oppo-f17-pro8128-iid-1034320813')
print(re.search(r'phoneNumber":"(.*?)"', r.text).group(1))
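One caveat with the one-liner: re.search returns None when the pattern is not found, so .group(1) raises AttributeError if the page changes or the request is blocked. A small sketch with a guard, assuming the key still appears as "phoneNumber":"..." in the page source. The names PHONE_PATTERN and extract_phone are mine, and the sample string uses a made-up number.

```python
import re

# Matches "phoneNumber":"<value>" and captures the value between the quotes.
PHONE_PATTERN = re.compile(r'"phoneNumber"\s*:\s*"([^"]*)"')

def extract_phone(page_text):
    """Return the captured phone number, or None if the key is absent."""
    match = PHONE_PATTERN.search(page_text)
    return match.group(1) if match else None

# Hypothetical snippet standing in for r.text
sample = 'window.state = {"phoneNumber":"+920000000000"};'
print(extract_phone(sample))
```

With the real page you would call extract_phone(r.text) and check for None before using the result.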