How to open a new tab using Python Playwright by feeding it a list of URLs?
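According to the Playwright documentation, the way to open a new tab in the browser is as shown in the `scrap_post_info()` function below. However, it does not open a new tab.
What I'm currently trying to do is loop through each URL in the `posts` list variable and open each one in a new tab to scrape the post details. Once a post has been scraped, that tab should close and the next URL should open in a new tab to scrape its details, and so on until the last URL in the `posts` list has been processed.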
def scrap_post_info(context, post):
    with context.expect_page() as new_page_info:
        page.click('a[target="_blank"]')  # Opens a new tab
    new_page = new_page_info.value
    new_page.wait_for_load_state()
    print(new_page.title())

# Loop through each URL from the `posts` list variable that contains many posts' URLs
for post in posts:
    scrap_post_info(context, post)
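For comparison, the documented `expect_page()` pattern is meant for the case where an action on an already-open page (here, clicking a `target="_blank"` link) creates the new tab; `expect_page()` itself only waits for that tab to appear. A minimal sketch of that pattern, where the helper name `open_link_in_new_tab` is hypothetical and `page` is assumed to be an existing Playwright page that already shows such a link:

```python
def open_link_in_new_tab(context, page, selector='a[target="_blank"]'):
    """Click a link on an already-open `page` and return the tab it opens.

    Hypothetical helper: `page` is assumed to be an existing Playwright Page
    (sync API) that currently shows a link matching `selector`.
    """
    # expect_page() only waits for a new page to appear in `context`;
    # the click inside the block is what actually opens it.
    with context.expect_page() as new_page_info:
        page.click(selector)
    new_page = new_page_info.value  # the Page object for the new tab
    new_page.wait_for_load_state()  # by default waits for the "load" state
    return new_page
```

Because this needs a page that already contains the link to click, it does not by itself open tabs from a plain list of URLs.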
I'm doing something similar for my project, and this is how I would do it:
from playwright.sync_api import sync_playwright

posts = ['https://playwright.dev/', 'https://playwright.dev/python/']

def scrap_post_info(context, post):
    page = context.new_page()
    page.goto(post)
    print(page.title())
    # do whatever scraping you need to
    page.close()

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    for post in posts:
        scrap_post_info(context, post)
        # some time delay
    browser.close()
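The code snippet in the Playwright documentation is about opening a new page after clicking a link on an existing page. Since you already have the URLs, you can simply open each one in a new page (tab), scrape it, and close it before moving on to the next.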
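A slightly more defensive version of the same loop, offered as a sketch: the helper name `scrape_in_tabs` and its `scrape` callback are hypothetical, and `context` is assumed to be a sync-API `BrowserContext` like the one created above. The `try`/`finally` ensures each tab is closed even when navigation or scraping raises, so a failing URL does not leave its tab open.

```python
from typing import Any, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def scrape_in_tabs(context: Any, urls: Iterable[str],
                   scrape: Callable[[Any], T]) -> Dict[str, T]:
    """Open each URL in its own tab of `context`, scrape it, then close the tab.

    `context` is assumed to be a Playwright BrowserContext (sync API);
    `scrape` is a hypothetical callback that receives the Page and returns
    whatever data you extract from it.
    """
    results: Dict[str, T] = {}
    for url in urls:
        page = context.new_page()  # new tab in the same browser context
        try:
            page.goto(url)
            results[url] = scrape(page)
        finally:
            page.close()  # close the tab even if goto() or scrape() raised
    return results
```

With the `context` from the code above, `scrape_in_tabs(context, posts, lambda page: page.title())` would return a dict mapping each URL to its page title. Keeping the extraction logic in a callback leaves the open/close handling in one place.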