Python: how to extract "data-bind" HTML elements?

Question

I am trying to extract data from a website. The element's content is hidden: when I use "view source", the header text is not shown.

<h4 data-bind="text: Name"></h4>

But when I inspect the element in the browser, the text is visible.

<h4 data-bind="text: Name">STM1F-1S-HC</h4>

The code I am using is:

import urllib.request
from bs4 import BeautifulSoup

def getlink(link):
    try:
        f = urllib.request.urlopen(link)
        soup0 = BeautifulSoup(f)  # no parser specified, hence the warning in the output
    except Exception as e:
        print(e)
        return  # nothing to parse if the request failed
    for row2 in soup0.find_all("h4", {"data-bind": "text: Name"}):
        name = row2.text
        print(name)

# Find all links to the products for further processing.
# r1 (the parsed listing page) and productcount come from a part of the script not shown here.
i = 1
for row in r1.find_all('a', {"class": "col-xs-12 col-sm-6"}):
    link = 'https://www.truemfg.com/USA-Foodservice/' + row['href']
    print(link)
    getlink(link)
print(productcount)

The output is:

https://www.truemfg.com/USA-Foodservice/Products/Traditional-Reach-Ins
C:\Users\Santosh\Anaconda3\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 193 of the file C:\Users\Santosh\Anaconda3\lib\runpy.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))

https://www.truemfg.com/USA-Foodservice/Products/Specification-Series

https://www.truemfg.com/USA-Foodservice/Products/Food-Prep-Tables

https://www.truemfg.com/USA-Foodservice/Products/Undercounters

https://www.truemfg.com/USA-Foodservice/Products/Worktops

https://www.truemfg.com/USA-Foodservice/Products/Chef-Bases

https://www.truemfg.com/USA-Foodservice/Products/Milk-Coolers

https://www.truemfg.com/USA-Foodservice/Products/Glass-Door-Merchandisers

https://www.truemfg.com/USA-Foodservice/Products/Air-Curtains

https://www.truemfg.com/USA-Foodservice/Products/Display-Cases

https://www.truemfg.com/USA-Foodservice/Products/Underbar-Refrigeration

As you can see, no names are printed.

Can someone tell me how to get the names printed?

Thanks, Santosh

Answer 1

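The content you need is generated dynamically by an XHR. That is why the <h4> is empty in "view source" but filled in when you inspect it: urllib only receives the initial HTML, before any JavaScript has run. You can try the code below to request the data directly and avoid parsing HTML: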

import requests

# JSON endpoint that the page queries via XHR to fill in the product data
url = 'https://prodtrueservices.azurewebsites.net/api/products/productline/403/1?skip=0&take=200&unit=Imperial'
r = requests.get(url)

# Decode the response once, then print the name of every product
for product in r.json()['Products']:
    print(product['Name'])

This should let you get all the names.
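To see the difference without hitting the network, here is a minimal offline sketch using only the standard library. It first parses the static markup from the question and shows that the matching <h4> is found but holds no text, then reads names from a JSON payload. The helper names (BoundTextParser, bound_texts, product_names) and the sample payload are made up for illustration; the payload only assumes the 'Products' and 'Name' keys that the code above reads, and the real response will contain more fields.

```python
import json
from html.parser import HTMLParser


class BoundTextParser(HTMLParser):
    """Collect the text inside every <h4 data-bind="text: Name"> element."""

    def __init__(self):
        super().__init__()
        self.texts = []        # one entry per matching <h4>
        self._inside = False   # True while inside a matching <h4>

    def handle_starttag(self, tag, attrs):
        if tag == "h4" and dict(attrs).get("data-bind") == "text: Name":
            self._inside = True
            self.texts.append("")

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        if tag == "h4":
            self._inside = False


def bound_texts(html):
    """Return the text of each matching <h4> found in an HTML string."""
    parser = BoundTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


def product_names(payload):
    """Return the Name of every entry under 'Products' in a decoded JSON payload."""
    return [p["Name"] for p in payload.get("Products", []) if "Name" in p]


# What a plain HTTP request receives: the placeholder has not been filled in yet.
static_html = '<h4 data-bind="text: Name"></h4>'
# Hypothetical payload, shaped like the keys the answer's code reads.
sample_json = '{"Products": [{"Name": "STM1F-1S-HC"}]}'

print(bound_texts(static_html))                # [''] : element found, text empty
print(product_names(json.loads(sample_json)))  # ['STM1F-1S-HC']
```

With the real endpoint you would pass r.json() to product_names instead of the sample payload.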

