How can I get the text from this HTML code with BeautifulSoup?

I have the following code:

from bs4 import BeautifulSoup
#https://nl.indeed.com/vacatures?q=python%20developer%20%E2%82%AC30.000&vjk=7074ca9832f39591
with open("INDEED.html", encoding="utf8") as f:
    source = BeautifulSoup(f, "html.parser")
box1 = source.find_all("div", class_="job_seen_beacon")
for box2 in box1:
    title = box2.find("h2", class_="jobTitle")
    company = box2.find("span", class_="companyName")
    salary = box2.find("div", class_="metadata salary-snippet-container")
    print(title.text)
    print(company.text)
    print(salary)

I can get the [title] and [company] information as text, but I can't do the same for [salary]. How can I print [salary] as text from this code/content? (The salary text is at the end of the printed markup.)

Full Stack Software Developer
Hello Energy
<div class="metadata salary-snippet-container"><div class="attribute_snippet"><svg aria-hidden="true" aria-label="Salary" fill="none" role="presentation" viewbox="0 0 16 13" xmlns="http://www.w3.org/2000/svg"><defs></defs><path clip-rule="evenodd" d="M2.45168 6.10292c-.30177-.125-.62509-.18964-.95168-.1903V4.08678c.32693-.00053.6506-.06518.95267-.1903.30331-.12564.57891-.30979.81105-.54193.23215-.23215.4163-.50775.54194-.81106.12524-.30237.18989-.62638.19029-.95365H9.0902c0 .3283.06466.65339.1903.9567.12564.30331.30978.57891.54193.81106.23217.23215.50777.41629.81107.54193.3032.12558.6281.19024.9562.1903v1.83556c-.3242.00155-.6451.06616-.9448.19028-.3033.12563-.5789.30978-.81102.54193-.23215.23214-.4163.50774-.54193.81106-.12332.2977-.18789.61638-.19024.93849H3.99496c-.00071-.32645-.06535-.64961-.19029-.95124-.12564-.30332-.30979-.57891-.54193-.81106-.23215-.23215-.50775-.4163-.81106-.54193zM0 .589843C0 .313701.223858.0898438.5.0898438h12.0897c.2762 0 .5.2238572.5.5000002V9.40715c0 .27614-.2238.5-.5.5H.5c-.276143 0-.5-.22386-.5-.5V.589843zM6.54427 6.99849c1.10457 0 2-.89543 2-2s-.89543-2-2-2-2 .89543-2 2 .89543 2 2 2zm8.05523-2.69917v7.10958H2.75977c-.27615 0-.5.2238-.5.5v.5c0 .2761.22385.5.5.5H15.422c.4419 0 .6775-.2211.6775-.6629V4.29932c0-.27615-.2239-.5-.5-.5h-.5c-.2761 0-.5.22385-.5.5z" fill="#595959" fill-rule="evenodd"></path></svg>€4.000 - €5.000 per maand</div></div>

You can use the get_text() method or a CSS selector:

print(salary.get_text(strip=True))

# OR

salary = box2.select_one(".metadata.salary-snippet-container div")
salary = salary.text if salary else None
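
As a minimal, self-contained sketch of both approaches, the snippet below runs them against a shortened stand-in for the salary markup from the question (the SVG path data is left out, and the variable names are only illustrative). It assumes the class attribute is exactly "metadata salary-snippet-container", as in the printed output above.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the salary markup in the question (SVG path omitted).
html = """
<div class="job_seen_beacon">
  <div class="metadata salary-snippet-container">
    <div class="attribute_snippet"><svg aria-label="Salary"></svg>€4.000 - €5.000 per maand</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
box = soup.find("div", class_="job_seen_beacon")

# Approach 1: find() the container, then get_text(strip=True) to drop
# the surrounding whitespace and return only the text nodes.
salary_tag = box.find("div", class_="metadata salary-snippet-container")
salary_via_get_text = salary_tag.get_text(strip=True)

# Approach 2: a CSS selector; select_one() returns None when nothing matches,
# so guard before reading the text.
salary_node = box.select_one(".metadata.salary-snippet-container div")
salary_via_select = salary_node.get_text(strip=True) if salary_node else None

print(salary_via_get_text)
print(salary_via_select)
```

Both variables should end up holding the same salary string. The SVG icon contributes no text, so it does not show up in the result.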

Not all entries have a salary entry, so you should test for that. For example:

from bs4 import BeautifulSoup
import requests

req = requests.get("https://nl.indeed.com/vacatures?q=python%20developer%20%E2%82%AC30.000&vjk=7074ca9832f39591")
soup = BeautifulSoup(req.content, "html.parser")

for div_job_seen in soup.find_all("div", class_="job_seen_beacon"):
    title = div_job_seen.find("h2", class_="jobTitle")
    company = div_job_seen.find("span", class_="companyName")
    salary = div_job_seen.find("div", class_="salary-snippet")
    
    print(title.text)
    print(company.text)
    
    if salary:
        print(salary.text)
        
    print('---------')

This will give you the following entries:

Python Developer
xxllnc
---------
Senior Python Developer
Shopping Minds
---------
Senior python developer
Movares
€3.500 - €5.000 per maand
---------
Python Developer
Zaaksysteem.nl
---------
Backend Software Developer
Stream
€28 per uur
---------
Software Developer
Kasto Service
---------
Software Developer
NKI-AVL
€36 per uur
---------
Python Developer Crypto
Search X Recruitment
€90.000 - €110.000 per jaar
---------
Python Developer
The NextGen
---------
Python Developer
Techonomy
---------
Python Developer
Veneficus
€3.500 per maand
---------
Senior Python Software Developer
Flora Logistics
---------
Python Developer
Python People
---------
nieuwPython Ontwikkelaar
BeNext
€2.500 per maand
---------
Python developer
Prosafco