Can't narrow down the search criteria in a web scraper to search "job titles" and count the frequency of each one

Question

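For some work I'm doing, I need to collect data on job titles and how often each one appears in the search results, so I decided to enlist Python to help. The only problem is that I can't figure out why this code snippet I found doesn't give me the information I need. This is what I have so far: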

import requests
from bs4 import BeautifulSoup
from collections import Counter
from string import punctuation

# We get the url
r = requests.get("https://www.usajobs.gov/Search/Results?j=0602&d=VA&p=1")
soup = BeautifulSoup(r.content, "html.parser")


# We get the words within divs
text_div = (''.join(s.findAll(text=True))for s in soup.findAll('div'))
c_div = Counter((x.rstrip(punctuation).lower() for y in text_div for x in y.split()))


total = c_div
print(total)

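I know this part involves inspecting the page source, but I don't know what I need to enter for the scraper to narrow things down to these titles: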

<a id="usajobs-search-result-0" class="usajobs-search-result--core__title search-joa-link" href="/GetJob/ViewDetails/568337700" itemprop="title" data-document-id="568337700">

Any help is greatly appreciated.

Answer 1

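The data is loaded dynamically by sending a POST request, so the HTML returned by requests.get does not contain the job titles. The request goes to: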

https://www.usajobs.gov/Search/ExecuteSearch

See this example to get the correct job titles. (You can change the "Page" key to specify the page number.)

import requests


data = {
    "JobTitle": [],
    "GradeBucket": [],
    "JobCategoryCode": ["0602"],
    "JobCategoryFamily": [],
    "LocationName": [],
    "PostingChannel": [],
    "Department": ["VA"],
    "Agency": [],
    "PositionOfferingTypeCode": [],
    "TravelPercentage": [],
    "PositionScheduleTypeCode": [],
    "SecurityClearanceRequired": [],
    "PositionSensitivity": [],
    "ShowAllFilters": [],
    "HiringPath": [],
    "SocTitle": [],
    "MCOTags": [],
    "CyberWorkRole": [],
    "CyberWorkGrouping": [],
    "Page": "1",  # <-- Change page number here
    "UniqueSearchID": "9d417c5e-adc2-469c-af1d-e786cc41bc97",
    "IsAuthenticated": "false",
}


response = requests.post(
    "https://www.usajobs.gov/Search/ExecuteSearch", json=data
).json()

job_titles = [job["Title"] for job in response["Jobs"]]
print(job_titles)

Output:

['Psychiatrist - OCA', 'Physician - Electromyography (Temporary)', 'Physician Owensboro CBOC PC', 'Physician-Primary Care', 'OPHTHALMOLOGIST', 'UROLOGIST', 'PHYSICIAN (OTOLARYNGOLOGIST', 'Physician-Hospitalist', 'Physician - Hemotology/Oncology', 'Academic Gastroenterologist', 'Physician - Gastroenterologist', 'Physician - Orthopedic Surgeon', 'Physician (Internal Medicine or Family Practice)', 'Physician (Regular Ft)- Hematologist/Oncologist', 'Physician- Hematologist/Oncologist', 'Physician - Diagnostic Radiologist', 'Physician (Psychiatrist)', 'Physician (Endocrinologist)', 'Physician (Cardiologist)', 'Physician (Neurologist)', 'Physician (Chief Hospitalist)', 'Physician (Hospitalist)', 'Physician (Medical Director of Extended Care/Chief of Geriatrics)', 'Physician (Primary Care)', 'Physician (Hematologist/Oncologist)']
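
Since the "Page" key selects the page, titles from several pages can be gathered in a loop. The following is only a sketch: collect_titles and fetch_page are hypothetical names, the fake_pages data is made up so the example runs offline, and stopping at the first empty "Jobs" list is an assumption about how the endpoint behaves past the last page.

```python
def collect_titles(fetch_page, max_pages=3):
    """Gather job titles from pages 1..max_pages.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON dict for that page (with a "Jobs" list, as in the answer).
    """
    titles = []
    for page in range(1, max_pages + 1):
        jobs = fetch_page(page).get("Jobs") or []
        if not jobs:  # assumption: an empty page means there are no more results
            break
        titles.extend(job["Title"] for job in jobs)
    return titles


# Stand-in for the real request so the sketch runs offline (made-up data).
fake_pages = {
    1: {"Jobs": [{"Title": "UROLOGIST"}, {"Title": "OPHTHALMOLOGIST"}]},
    2: {"Jobs": [{"Title": "Physician (Hospitalist)"}]},
}
all_titles = collect_titles(
    lambda page: fake_pages.get(page, {"Jobs": []}), max_pages=5
)
print(all_titles)
```

Against the real endpoint, fetch_page could be something like lambda page: requests.post("https://www.usajobs.gov/Search/ExecuteSearch", json={**data, "Page": str(page)}).json(), reusing the data dict from the code above.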

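The question also asks how often each title occurs, which collections.Counter can handle once the titles are in a list. This is a minimal sketch: count_titles is a hypothetical helper, the lowercasing and whitespace stripping are an assumed normalization, and sample_titles stands in for the job_titles list built in the answer.

```python
from collections import Counter


def count_titles(titles):
    """Count how often each job title occurs, ignoring case and outer whitespace."""
    normalized = (title.strip().lower() for title in titles)
    return Counter(normalized)


# A few titles copied from the output above; in practice pass in job_titles.
sample_titles = [
    "Physician (Hospitalist)",
    "Physician-Hospitalist",
    "UROLOGIST",
    "Physician (Primary Care)",
    "physician (primary care)",  # hypothetical duplicate differing only in case
]

title_counts = count_titles(sample_titles)
for title, count in title_counts.most_common():
    print(f"{count:>3}  {title}")
```

Note that differently punctuated variants such as "Physician-Hospitalist" and "Physician (Hospitalist)" still count as separate titles here; merging them would need extra normalization rules.
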
python

parsing

beautifulsoup

python-requests