Python: how to extract "data-bind" HTML elements?
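I'm trying to extract data from a website, but the element is hidden: when I use "view source", the header text is not shown.
<h4 data-bind="text: Name"></h4>
But when I inspect the element, the text is visible.
<h4 data-bind="text: Name">STM1F-1S-HC</h4>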
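A minimal sketch of the symptom described above. The HTML that `urllib` downloads is the unrendered "view source" version, so the `<h4>` is there but empty. The text only appears after the page's JavaScript runs in a browser. The `data-bind` attribute suggests Knockout.js-style bindings, though that is an inference. The sketch uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. `DataBindTextCollector` and `extract_names` are hypothetical helper names.

```python
from html.parser import HTMLParser


class DataBindTextCollector(HTMLParser):
    """Collects the text inside <h4 data-bind="text: Name"> elements."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values keep their original case
        if tag == "h4" and dict(attrs).get("data-bind") == "text: Name":
            self._inside = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "h4":
            self._inside = False

    def handle_data(self, data):
        # Only accumulate text that sits inside a matching <h4>
        if self._inside:
            self.names[-1] += data


def extract_names(html):
    parser = DataBindTextCollector()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


# What urllib receives ("view source"): the binding has not been filled in yet
static_html = '<h4 data-bind="text: Name"></h4>'
# What the browser shows after JavaScript runs ("inspect element")
rendered_html = '<h4 data-bind="text: Name">STM1F-1S-HC</h4>'

print(extract_names(static_html))    # ['']
print(extract_names(rendered_html))  # ['STM1F-1S-HC']
```

BeautifulSoup behaves the same way: on the downloaded markup, a matched `<h4>` has an empty `.text`, so there is no name to print.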
The code I'm using is:
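import urllib.request
from bs4 import BeautifulSoup

def getlink(link):
    try:
        f = urllib.request.urlopen(link)
        soup0 = BeautifulSoup(f)
    except Exception as e:
        print(e)
        return  # nothing to parse if the request failed
    # print the text of every <h4 data-bind="text: Name"> on the page
    for row2 in soup0.findAll("h4", {"data-bind": "text: Name"}):
        Name = row2.text
        print(Name)

# code to find all links to the products for further processing.
# r1 (the parsed listing page) and productcount are defined earlier in the script (not shown)
i = 1
for row in r1.findAll('a', {"class": "col-xs-12 col-sm-6"}):
    link = 'https://www.truemfg.com/USA-Foodservice/' + row['href']
    print(link)
    getlink(link)
print(productcount)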
The output is:
https://www.truemfg.com/USA-Foodservice/Products/Traditional-Reach-Ins
C:\Users\Santosh\Anaconda3\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 193 of the file C:\Users\Santosh\Anaconda3\lib\runpy.py. To get rid of this warning, change code that looks like this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")
markup_type=markup_type))
https://www.truemfg.com/USA-Foodservice/Products/Specification-Series
https://www.truemfg.com/USA-Foodservice/Products/Food-Prep-Tables
https://www.truemfg.com/USA-Foodservice/Products/Undercounters
https://www.truemfg.com/USA-Foodservice/Products/Worktops
https://www.truemfg.com/USA-Foodservice/Products/Chef-Bases
https://www.truemfg.com/USA-Foodservice/Products/Milk-Coolers
https://www.truemfg.com/USA-Foodservice/Products/Glass-Door-Merchandisers
https://www.truemfg.com/USA-Foodservice/Products/Air-Curtains
https://www.truemfg.com/USA-Foodservice/Products/Display-Cases
https://www.truemfg.com/USA-Foodservice/Products/Underbar-Refrigeration
As you can see, no names are printed.
Can anyone tell me how to get the names printed?
Thanks,
Santosh
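The content you need is generated dynamically via XHR. You can try the code below, which requests the data directly instead of parsing the HTML: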
import requests

url = 'https://prodtrueservices.azurewebsites.net/api/products/productline/403/1?skip=0&take=200&unit=Imperial'
r = requests.get(url)

# the JSON response holds a "Products" list; print each product's "Name"
for product in r.json()['Products']:
    print(product['Name'])
This should get you all the names.
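Since the endpoint returns JSON, the name extraction can be kept separate from the network call so it can be checked without hitting the server. A minimal sketch, assuming the payload has the shape the answer's code indexes into: a top-level "Products" list whose items carry a "Name" key. `product_names` is a hypothetical helper, and the sample payload is made up for illustration, not real API data.

```python
import json


def product_names(payload):
    """Return the "Name" of every entry in payload["Products"].

    Assumes the response shape implied by r.json()['Products'][i]['Name'];
    entries without a "Name" key are skipped.
    """
    return [item["Name"] for item in payload.get("Products", []) if "Name" in item]


# Hypothetical sample response, for illustration only
sample = json.loads('{"Products": [{"Name": "STM1F-1S-HC"}, {"Name": "EXAMPLE-2"}]}')
print(product_names(sample))  # ['STM1F-1S-HC', 'EXAMPLE-2']

# With the live endpoint (not run here):
# import requests
# r = requests.get(url)
# print(product_names(r.json()))
```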