How to grab quarterly data and specific dates from Yahoo financial data with Python?
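I can download the annual data from this link with the code below, but it differs from what the website shows, because it is the June data:
Now I have two questions:
- How do I specify the date so that the annual data matches the picture below (September instead of June, as shown in the red rectangle)?
- Clicking Quarterly, as shown in the orange rectangle, does not change the link. How do I grab the quarterly data?
Thanks.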
Just curious: why write the HTML to a file first and then read it with pandas? pandas can take the URL and request the HTML directly:
import pandas as pd
symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' %(symbol, symbol)
dfs = pd.read_html(url)
print(dfs[0])
Secondly, I'm not sure why your dates come out that way. Doing it the way I have above shows September.
print(dfs[0])
0 ... 4
0 Revenue ... 9/26/2015
1 Total Revenue ... 233715000
2 Cost of Revenue ... 140089000
3 Gross Profit ... 93626000
4 Operating Expenses ... Operating Expenses
5 Research Development ... 8067000
6 Selling General and Administrative ... 14329000
7 Non Recurring ... -
8 Others ... -
9 Total Operating Expenses ... 162485000
10 Operating Income or Loss ... 71230000
11 Income from Continuing Operations ... Income from Continuing Operations
12 Total Other Income/Expenses Net ... 1285000
13 Earnings Before Interest and Taxes ... 71230000
14 Interest Expense ... -733000
15 Income Before Tax ... 72515000
16 Income Tax Expense ... 19121000
17 Minority Interest ... -
18 Net Income From Continuing Ops ... 53394000
19 Non-recurring Events ... Non-recurring Events
20 Discontinued Operations ... -
21 Extraordinary Items ... -
22 Effect Of Accounting Changes ... -
23 Other Items ... -
24 Net Income ... Net Income
25 Net Income ... 53394000
26 Preferred Stock And Other Adjustments ... -
27 Net Income Applicable To Common Shares ... 53394000
[28 rows x 5 columns]
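In the output above, the dates sit in row 0 rather than in the column headers, and section titles such as "Operating Expenses" repeat their label in every column. A minimal sketch of one way to tidy such a table follows. The helper name `tidy_statement` is hypothetical, and the small `raw` frame is hand-built from a few rows of the printed output, keeping only the one period column that is visible there.

```python
import pandas as pd


def tidy_statement(raw: pd.DataFrame) -> pd.DataFrame:
    """Promote row 0 to headers, drop section-title rows, make values numeric."""
    header = raw.iloc[0].tolist()
    body = raw.iloc[1:].copy()
    body.columns = header
    # A section-title row repeats its own label in every value column.
    labels = body.iloc[:, 0]
    values = body.iloc[:, 1:]
    is_title = values.eq(labels, axis=0).all(axis=1)
    body = body[~is_title].set_index(header[0])
    # "-" (missing value) and any other non-numeric text become NaN.
    return body.apply(pd.to_numeric, errors="coerce")


# Hand-built stand-in for dfs[0], reduced to the label and one period column.
raw = pd.DataFrame([
    ["Revenue", "9/26/2015"],
    ["Total Revenue", "233715000"],
    ["Cost of Revenue", "140089000"],
    ["Gross Profit", "93626000"],
    ["Operating Expenses", "Operating Expenses"],
    ["Non Recurring", "-"],
])

tidy = tidy_statement(raw)
print(tidy)
```

With real data you would pass `dfs[0]` instead of `raw`. Line items that appear twice in the page (for example "Net Income") would produce duplicate index labels, so select those by position rather than by label.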
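For the second part, you could try to find the data in one of a few ways:
1) Check the XHR requests and get the data you want by including the parameters in the request URL that generates that data; it may come back in JSON format (I could not find it right away when I looked, so I moved on to the next option).
2) Search through the <script> tags, as the JSON can sometimes be inside those tags (I didn't search very thoroughly, and figured Selenium would simply be the direct route, since pandas can read in the tables).
3) Use Selenium to simulate opening the browser, get the table, click "Quarterly", and get the table again.
I went with option 3: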
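from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' % (symbol, symbol)
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get(url)
# Get the table shown in the browser (the annual view)
dfs_annual = pd.read_html(StringIO(driver.page_source))
print(dfs_annual[0])
# Click "Quarterly" (find_element_by_xpath was removed in Selenium 4)
driver.find_element(By.XPATH, "//span[text()='Quarterly']").click()
# Get the table shown in the browser after the click
dfs_quarter = pd.read_html(StringIO(driver.page_source))
print(dfs_quarter[0])
# Shut down the browser and the driver process
driver.quit()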
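For comparison, option 2 from the list above (looking for JSON inside the <script> tags) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the variable name `window.pageData`, the JSON layout, and the helper `extract_embedded_json` are all hypothetical, and the sample HTML is hand-written rather than taken from Yahoo. The real page would have to be inspected to find the actual variable, if there is one.

```python
import json
import re

SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_embedded_json(html, variable):
    """Return the JSON object assigned to `variable` in a <script> tag, or None."""
    decoder = json.JSONDecoder()
    marker = variable + " ="
    for script in SCRIPT_RE.findall(html):
        pos = script.find(marker)
        if pos == -1:
            continue
        start = script.find("{", pos)
        if start == -1:
            continue
        # raw_decode parses one JSON value and ignores whatever follows it,
        # such as the trailing semicolon of the assignment.
        obj, _ = decoder.raw_decode(script[start:])
        return obj
    return None


# Hand-written sample page; the variable name and structure are made up.
sample_html = """
<html><body>
<script>var unrelated = 1;</script>
<script>window.pageData = {"annual": [{"endDate": "9/26/2015", "totalRevenue": 233715000}]};</script>
</body></html>
"""

data = extract_embedded_json(sample_html, "window.pageData")
print(data["annual"][0]["totalRevenue"])
```

If the page does embed its data this way, this approach avoids launching a browser, at the cost of depending on a page-internal format that can change without notice.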