How to loop through the body of a website using Python and Selenium

Question

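First off, my Python knowledge is very basic, so I apologize if this is a really stupid question, but here goes.

I'm trying to use Selenium to read a message board (specifically the catalog of /biz/ on 4chan) to track keywords for projects I've invested in, and to notify me when there is a thread discussing one of my projects.

So far I've managed to open the page and find the element I want to search by doing: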

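from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(PATH))

driver.get('https://boards.4channel.org/biz/catalog')

# Text of the element that holds every thread in the catalog
threads = driver.find_element(By.ID, 'threads').text

print(threads)
driver.quit()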

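This successfully prints all the threads as text, but now I want to loop through them and return only the lines that contain the keywords "NFY" and "CORX". I've been testing with the keyword "DOGE", since mine are rarely mentioned. What is the best way to loop through this text and return only the lines that contain my keywords?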

Answer 1

If you want to return the threads, this should work.

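from selenium.webdriver.common.by import By

# "Path to individual threads" is a placeholder for the XPath of each thread element
threads = driver.find_elements(By.XPATH, "Path to individual threads")

searchText = ["DOGE", "NFY", "CORX"]

for t in searchText:
    for thread in threads:
        # A list has no .lower(), so lowercase each keyword for a case-insensitive match
        if t.lower() in thread.text.lower():
            print(f"Thread: {thread.text}")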
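
If you would rather filter the single block of text from the question line by line, instead of locating each thread element, something along these lines might work. This is only a sketch: matching_lines is a hypothetical helper name, and the sample string is made up to stand in for the .text value returned by Selenium.

```python
def matching_lines(text, keywords):
    """Return the lines of `text` that contain any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [
        line
        for line in text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


# Made-up stand-in for the text Selenium returns for the 'threads' element
sample = "DOGE to the moon\nGeneral market thread\nAnyone holding corx?"

for line in matching_lines(sample, ["DOGE", "NFY", "CORX"]):
    print(line)
```

Note that this is a plain substring match, so a short keyword can also match inside a longer word. If that produces too many false hits, matching on whole words would be a reasonable next step.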

python

iteration

selenium

screen-scraping