The find_elements() function in Selenium consumes a lot of RAM

Question

Situation: I have a script that scrolls inside a frame to extract information.

<ul>

<li> </li>
<li> </li>
<li> </li>
<li> </li>
<li> </li>
...
</ul>

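The list is about 30 items long. While scrolling, no new <li> </li> items are added; the existing ones are only updated. The DOM structure does not grow.

Problem description: while the script scrolls, it has to re-extract all of the <li> </li> elements on every iteration, because they are updated.

Here is the scrolling and extraction logic. The code I use: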

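import time
from typing import List

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

SCROLL_PAUSE_TIME = 5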

# Get scroll height
last_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")

all_msgs_loaded = False

while not all_msgs_loaded:

    # Re-query the items on every pass, because they are updated in place
    li_elements: List[WebElement] = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")

    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height

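On each iteration, li_elements receives about 30 WebElements. If I comment out the line with find_elements, the script can run for hours without any increase in RAM consumption. Note that I am not saving anything at runtime, and nothing else in the script increases consumption.

Another way I tried to get li_elements is through self._driver.execute_script().

Example: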

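# The WebDriverWait result is passed as an extra (unused) script argument;
# it only serves to block until at least one item is visible.
li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._edge_driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))))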

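Both methods return the same result, and the RAM increase is the same with both. RAM grows without bound until the Task Manager kills the process for safety.

I looked through the internals of these functions but found nothing that would fill up RAM. Another option is find_elements_by_css_selector(), but internally it just calls find_elements().

I also tried different combinations with sleep(), but nothing helped; RAM usage did not go down.

Can you explain what is actually happening? I don't understand why memory consumption keeps increasing.

Can you tell me whether there is another way to extract the elements that does not consume RAM?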

Answer 1

In any case, the find_elements() method shouldn't be consuming so much RAM. Most likely it is the browsing context (e.g. google-chrome) that consumes more RAM while you scroll, in case the <li> items get updated through JavaScript or AJAX.

Without any visibility into the DOM tree, it is difficult to predict the actual reason or any remediation. However, a similar discussion suggests adding some wait time in the form of time.sleep(n).
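
One way to check whether the growth is on the Python side at all is to measure the Python heap across many iterations. The sketch below uses the standard-library tracemalloc module; the helper name python_heap_growth and its step parameter are hypothetical and stand in for one pass of the scroll loop. It only sees allocations made by the Python process, so if it reports a flat heap while overall RAM still climbs, the growth is probably in the browser or driver process, which would be consistent with the guess above.

```python
import tracemalloc
from typing import Callable


def python_heap_growth(step: Callable[[], object], iterations: int = 50) -> int:
    """Run `step` repeatedly and return the net Python heap growth in bytes.

    `step` is a hypothetical stand-in for one pass of the scroll loop, e.g.
    lambda: driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    """
    tracemalloc.start()
    try:
        step()  # warm-up, so one-time allocations are not counted as growth
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()  # the result is discarded, as in the question's loop
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A result close to zero means the Python objects are being released between iterations; a result that scales with the number of iterations points at references being kept somewhere in the script.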

Answer 2

Try fetching only the data you need instead of the full elements:

lis = driver.execute_script("""
  return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
""")

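I don't know what you are doing with them, but if you keep adding elements to one big array and there are enough of them, you will hit the RAM limit.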
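
As a rough sketch of how the strings could be consumed without building up a large array, the generator below yields only item texts that have not been seen before and keeps a 16-byte digest per item instead of the text itself. The names new_item_texts, JS_ITEM_TEXTS, and seen are hypothetical, and it assumes that two items with identical text can be treated as the same item, which may not hold for your page.

```python
import hashlib
from typing import Iterator, Set

# Same query as above: return plain strings rather than WebElement handles.
JS_ITEM_TEXTS = """
  return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
"""


def new_item_texts(driver, seen: Set[bytes]) -> Iterator[str]:
    """Yield the text of each list item that has not been seen before.

    `driver` is any object with an execute_script(script) method, such as a
    Selenium WebDriver. Only a 16-byte digest per item is stored in `seen`.
    """
    for text in driver.execute_script(JS_ITEM_TEXTS):
        digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield text
```

Inside the scroll loop this could be called once per iteration, handling each yielded string immediately (writing it to a file, for example) rather than appending it to a list. The seen set still grows, but only by a small fixed amount per distinct item.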


python

ram

selenium

memory-management