How can we loop through items, and download text files from a web site in bulk?

Question

I am trying to figure out how to loop through the items in a ListBox and download text files in bulk.

This is the link I am looking at.

https://cdr.ffiec.gov/public/PWS/DownloadBulkData.aspx

I want to select this product.

'Call Reports -- Balance Sheet, Income Statement, Past Due -- Four Periods'

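Then I want to loop through the years 2020-2012 and download these files in bulk to my local hard drive.

I pressed F12 in my browser, and it was easy to find the button and the 'DatesDropDownList', but I don't see any URL pointing to the text files that would be useful for the downloads. Do you need Selenium for this?

As an alternative to selecting items in a list and clicking a button, is there some kind of web service that would simplify this process?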

Answer 1

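I would use Selenium. It is a straightforward way to do in Python what you would otherwise do by hand in a web browser.

Here is an example built from what you provided.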

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Using Google Chrome here; any browser supported by Selenium works.
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service('PATH/TO/chromedriver.exe'))

url = 'https://cdr.ffiec.gov/public/PWS/DownloadBulkData.aspx'
driver.get(url)

Wait for the available products to load, then select the value.

path = "//select[@id='ListBox1']"
products = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, path)
    )
)

select = 'Call Reports -- Balance Sheet, Income Statement, Past Due -- Four Periods'
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
driver.find_element(By.XPATH, path + "/option[text()='" + select + "']").click()

Wait for the dates to load. Get the list of available dates, then select one of them as an example.

path = "//select[@id='DatesDropDownList']"
dropdown = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, path)
    )
)

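# find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
dates = driver.find_elements(By.XPATH, path + '/option')

# An example that picks a single entry; you can loop through dates instead.
driver.find_element(By.XPATH, path + "/option[text()='" + dates[10].text + "']").click()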
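
The example above selects a single date. To cover only 2012 through 2020, as the question asks, the option labels can be filtered by year before looping. This is a minimal sketch, not part of the original answer: the helper name `labels_in_year_range` is hypothetical, and it assumes each label ends in a four-digit year (for example 12/31/2020), which should be checked against the actual dropdown text.

```python
import re


def labels_in_year_range(labels, first_year, last_year):
    """Return the labels whose trailing four-digit year is in [first_year, last_year].

    Assumption: each label ends with a four-digit year, e.g. "12/31/2020".
    Labels that do not match this pattern are skipped.
    """
    selected = []
    for label in labels:
        match = re.search(r"(\d{4})\s*$", label)
        if match and first_year <= int(match.group(1)) <= last_year:
            selected.append(label)
    return selected
```

With Selenium, the labels could come from `[option.text for option in dates]`, and each kept label could then be chosen with `Select(dropdown).select_by_visible_text(label)` from `selenium.webdriver.support.ui`. If the page reloads after a selection, the dropdown element may need to be located again before the next iteration.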

From here you can unzip the files, load them into Pandas DataFrames, and then store them in Excel files, a database, and so on.
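
The unzip step can be sketched with the standard library alone. This assumes each download is a ZIP archive containing tab-delimited `.txt` files encoded as UTF-8; the delimiter, the encoding, and the helper name `read_tab_delimited_zip` are assumptions for illustration, not something the thread confirms.

```python
import csv
import io
import zipfile


def read_tab_delimited_zip(zip_path):
    """Read every .txt member of a ZIP archive as tab-delimited rows.

    Returns a dict mapping member name -> list of rows (each row a list of str).
    Assumption: members are UTF-8 text with tab-separated columns.
    """
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip members that are not .txt files
            with archive.open(name) as raw:
                # Wrap the binary member so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text, delimiter="\t"))
    return tables
```

If pandas is installed, passing the opened member to `pandas.read_csv(..., sep="\t")` would give a DataFrame directly instead of lists of rows.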


python

selenium

web-services

python-3.x