Python Selenium: Looping over the same element while scraping
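Context:
I'm trying to scrape the video title, view count, and upload time from a YouTube channel, but every iteration of the loop scrapes the same element.
Code trials: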
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.youtube.com/c/JohnWatsonRooney/videos?view=0&sort=p&flow=grid'
driver = webdriver.Chrome()
driver.get(url)
videos = driver.find_elements(by=By.CLASS_NAME, value='style-scope ytd-grid-video-renderer')
for video in videos:
    title = driver.find_element(by=By.XPATH, value='.//*[@id="video-title"]').text
    views = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[1]').text
    when = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[2]').text
    print(f"""Video Title: {title}\nViews: {views}\nUploaded: {when}\n -----------""")
Output:
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago..
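Each iteration calls driver.find_element(), which searches from the top of the page rather than inside video, so it returns the first match every time. To print the title, views, and when texts from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategies:
Code block: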
driver.get("https://www.youtube.com/c/JohnWatsonRooney/videos")
titles = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a#video-title")))]
views = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:first-child")))]
when = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:nth-child(2)")))]
for i, j, k in zip(titles, views, when):
    print(f"{i} had {j} since posted {k}")
Console output:
The Python Package I Wish I'd Learned Earlier had 4.5K views since posted 4 days ago
Rotate User Agents in Scrapy using custom Middleware had 1.2K views since posted 11 days ago
GO for Beginners - Web Scraping with Golang Tutorial had 2K views since posted 13 days ago
How I Scraped This HTML Table to a Python Dictionary had 3.6K views since posted 3 weeks ago
Python TYPE HINTS Explained with Examples had 2.3K views since posted 3 weeks ago
Use THIS Algorithm To Find KEYWORDS in Text - A Short Python Project had 4.2K views since posted 1 month ago
SQLModel is the Pydantic inspired Python ORM we’ve been waiting for had 3.4K views since posted 1 month ago
How to use Enumerate in Python to have a Counter in your loops had 4.6K views since posted 1 month ago
How to HIDE Your API Keys in Python Projects had 6.6K views since posted 1 month ago
How to Make 2500 HTTP Requests in 2 Seconds with Async & Await had 4.8K views since posted 2 months ago
Are You Still Using Excel? AUTOMATE it with PYTHON had 10K views since posted 2 months ago
How To Parse Data from HTML Tables Using Requests-HTML had 3.5K views since posted 2 months ago
THIS is the most common ERROR when learning Web Scraping had 3.6K views since posted 2 months ago
Learn Web Scraping With Python: Full Project - HTML, Save to CSV, Pagination had 10K views since posted 3 months ago
Turn Websites into Real Time API's with ScrapyRT had 4.4K views since posted 3 months ago
HTTPX is the ASYNC Requests I was Looking For had 5.8K views since posted 3 months ago
Research Amazon Products by Extracting Review Data had 4K views since posted 4 months ago
Web Scraping 101 - My (in)complete guide, methods, tools, how to had 5.4K views since posted 4 months ago
How to Scrape JavaScript Websites with Scrapy and Playwright had 9.8K views since posted 5 months ago
Web Scraping with Node js? Python Expert Opinion and demo had 3.3K views since posted 5 months ago
Automate Buying online using Playwright’s Codegen feature had 4.6K views since posted 5 months ago
Login and Scrape Data with Playwright and Python had 16K views since posted 5 months ago
Parse HTML with BeautifulSoup AND Scrapy had 2.6K views since posted 5 months ago
HIDING Data with JavaScript? Web Scraping Obfuscation had 3.9K views since posted 5 months ago
Following LINKS Automatically with Scrapy CrawlSpider had 6.2K views since posted 6 months ago
Web Scraping Weather Data with Python had 13K views since posted 6 months ago
How I Try My CODE and Test My SELECTORS in SCRAPY had 1.8K views since posted 6 months ago
How To Handle Errors & Exceptions with Requests and Python had 2.8K views since posted 6 months ago
A Short and SIMPLE HTML Web Scraper in 6 lines of CODE had 1.9K views since posted 7 months ago
Failed Requests? Try this RETRY Decorator for your Web Scraper had 3.5K views since posted 7 months ago
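One thing to keep in mind with the three separate lists: zip() stops at the shortest input, so if one video is missing a metadata span, the remaining rows could be silently shifted or dropped. A minimal sketch of a length check before pairing is shown here. The helper name pair_columns and the sample values are made up for illustration and are not part of the original answer.

```python
def pair_columns(titles, views, when):
    """Zip three parallel lists into rows, raising if their lengths differ."""
    lengths = (len(titles), len(views), len(when))
    if len(set(lengths)) != 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(titles, views, when))


# Made-up sample values standing in for the scraped lists.
rows = pair_columns(
    ["Video A", "Video B"],
    ["1K views", "2K views"],
    ["1 day ago", "2 days ago"],
)
for title, view_count, age in rows:
    print(f"{title} had {view_count} since posted {age}")
```

In the answer's code, the same check would go between building the three lists and the for loop.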
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
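As an alternative that stays closer to the loop in the question, the lookups can be scoped to each video element instead of the driver. The following is a sketch, not a tested scraper: the helper names read_card and scrape_cards are hypothetical, the ytd-grid-video-renderer tag is taken from the class name in the question, and YouTube's markup may have changed since. The locator strings "xpath" and "css selector" are the values that selenium's By.XPATH and By.CSS_SELECTOR hold.

```python
# Locator tuples (strategy, value), relative to a single video card.
# "xpath" is the string value of selenium's By.XPATH constant.
TITLE = ("xpath", './/*[@id="video-title"]')
VIEWS = ("xpath", './/*[@id="metadata-line"]/span[1]')
WHEN = ("xpath", './/*[@id="metadata-line"]/span[2]')


def read_card(card):
    """Read one video card; every lookup starts from `card`, not the driver."""
    return {
        "title": card.find_element(*TITLE).text,
        "views": card.find_element(*VIEWS).text,
        "when": card.find_element(*WHEN).text,
    }


def scrape_cards(driver, card_locator=("css selector", "ytd-grid-video-renderer")):
    """Return one dict per video card currently present on the page."""
    return [read_card(card) for card in driver.find_elements(*card_locator)]


# Usage with a live browser (not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://www.youtube.com/c/JohnWatsonRooney/videos")
#   for row in scrape_cards(driver):
#       print(f"{row['title']} had {row['views']} since posted {row['when']}")
```

Because the XPath expressions start with a dot and are evaluated on the card element, each one matches only inside that card. The explicit wait from the code block above would still be needed before calling scrape_cards, since the grid is rendered by JavaScript.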