How can I select only the second dataframe from a webpage with two dataframes, using pandas in Python?

Question

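I am trying to use only the second dataframe from this URL, but I can't figure out how to specify that only one dataframe should be fetched. It prints out both the dataframe for the chart and the table, but I only want to print out the table.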

`import pandas as pd
import urllib.request

page = pd.read_html('https://www.google.com/finance/historical?q=a&startdate=Jan%201%2C%202000&enddate=Feb%2028%2C%202017&num=200&ei=_nm3WKGHCIf7jAG74ar4Cw&start=200', header=0)

for df in page:
    print(df)`

Answer 1

The table you are looking for has a class attribute of gf-table. Pass that to pd.read_html through attrs and it will read only the second table:

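import pandas as pd

# attrs restricts parsing to tables whose class attribute is gf-table
page = pd.read_html('https://www.google.com/finance/historical?q=a&startdate=Jan%201%2C%202000&enddate=Feb%2028%2C%202017&num=200&ei=_nm3WKGHCIf7jAG74ar4Cw&start=200',
                    attrs={'class': 'gf-table'},
                    header=0)

# read_html always returns a list, so take its only element
page[0]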

#             Date   Open   High    Low  Close   Volume
#0    May 12, 2016  42.59  42.94  42.42  42.73  2224506
#1    May 11, 2016  42.19  43.20  42.12  42.46  3325515
#2    May 10, 2016  41.50  42.00  41.35  42.00  2094305
#3     May 9, 2016  41.51  41.78  41.29  41.33  1741539
#4     May 6, 2016  40.86  41.62  40.72  41.43  1403476
#5     May 5, 2016  40.64  41.03  40.51  40.96  1083956
#...
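
To try this without hitting the live page, here is a minimal sketch that runs read_html against an inline HTML string. The HTML snippet and the `chart` class name are made up for illustration; only `gf-table` comes from the answer above. It assumes an HTML parser such as lxml (or bs4 with html5lib) is installed. It shows two ways to end up with just the table: indexing into the returned list, or filtering with attrs.

```python
import io

import pandas as pd

# Hypothetical stand-in for the page: a chart table followed by the price table.
html = """
<table class="chart">
  <tr><th>x</th><th>y</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
<table class="gf-table">
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>May 12, 2016</td><td>42.73</td></tr>
  <tr><td>May 11, 2016</td><td>42.46</td></tr>
</table>
"""

# Option 1: read every table, then pick the second one by position.
all_tables = pd.read_html(io.StringIO(html), header=0)
by_position = all_tables[1]

# Option 2: let read_html keep only tables whose class is gf-table.
filtered = pd.read_html(io.StringIO(html), attrs={"class": "gf-table"}, header=0)
by_class = filtered[0]

print(by_class)
```

Indexing by position is shorter, but it breaks if the page adds or reorders tables. Filtering on an attribute is usually the more robust choice when the table carries a distinctive class or id.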

Answer 2

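Using data-reader (the pandas_datareader package) is the better way to deal with Google or Yahoo finance data. That said, @Psidom has provided the correct answer.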

Here is an example of how you can use data-reader to get the Agilent stock data directly as a DataFrame instead of a list of dataframes.

import pandas as pd
import pandas_datareader.data as web
import datetime

start = datetime.datetime(2000, 1, 1)
end = datetime.datetime(2017, 2, 27)

data = web.DataReader('A', 'google', start, end)

data.head()

This returns a pandas DataFrame directly, which lets you do things like

data.loc['2010-01-04']

to get the data for a specific date, for example.
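
Label-based date lookups go through .loc in current pandas (the older .ix indexer was removed in pandas 1.0). As a rough sketch of how that works on a date-indexed frame, the example below builds a small DataFrame by hand from a few of the rows shown in Answer 1 rather than downloading anything. The variable names are illustrative only.

```python
import pandas as pd

# Hand-built sample using a few rows from the output in Answer 1.
data = pd.DataFrame({
    "Date": ["May 12, 2016", "May 11, 2016", "May 10, 2016", "May 9, 2016"],
    "Close": [42.73, 42.46, 42.00, 41.33],
})

# Parse the date strings and use them as a sorted DatetimeIndex.
data["Date"] = pd.to_datetime(data["Date"], format="%b %d, %Y")
data = data.set_index("Date").sort_index()

# Single day: returns that row as a Series.
one_day = data.loc[pd.Timestamp("2016-05-12")]

# Date range: label slices on a DatetimeIndex include both endpoints.
date_range = data.loc["2016-05-09":"2016-05-11"]

print(one_day)
print(date_range)
```

Sorting the index first matters here because the page lists the newest rows first, and label slicing is only reliable on a monotonic index.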

python

urllib2

pandas