Read a NASDAQ HTML table into a DataFrame

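I used this code to get the latest list of companies traded on NASDAQ, but I would like the result in a DataFrame rather than just a printed list that includes a lot of other information I may not need.

Any ideas how to achieve this? Thanks.

Parsing the latest NASDAQ companies: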

    from bs4 import BeautifulSoup
    import requests

    r = requests.get(
        'https://www.nasdaq.com/screening/companies-by-industry.aspx'
        '?exchange=NASDAQ&sortname=marketcap&sorttype=1&pagesize=4000'
    )
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    table = soup.find("table", {"id": "CompanylistResults"})
    for row in table.find_all("tr"):
        for cell in row("td"):
            print(cell.get_text().strip())

It looks like you are after the aptly named read_html, though you will need to experiment until you get what you want. In your case:

>>> import pandas as pd
>>> df=pd.read_html(table.prettify(),flavor='bs4')[0]
>>> df.columns = [c.strip() for c in df.columns]
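To try read_html without hitting the NASDAQ site, here is a minimal sketch on an inline HTML snippet. The snippet, the `demo` name and the three columns are made up for illustration; only the table id is taken from the question. It leaves `flavor` at its default, so it assumes a parser such as lxml (or bs4 together with html5lib) is installed, and it wraps the markup in `StringIO` because recent pandas versions prefer a file-like object over a raw HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-company snippet standing in for the real NASDAQ table.
html = """
<table id="CompanylistResults">
  <thead>
    <tr><th> Name </th><th> Symbol </th><th> Country </th></tr>
  </thead>
  <tbody>
    <tr><td>Amazon.com, Inc.</td><td>AMZN</td><td>United States</td></tr>
    <tr><td>Microsoft Corporation</td><td>MSFT</td><td>United States</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds,
# so the first (and here only) table is at index 0.
tables = pd.read_html(StringIO(html))
demo = tables[0]

# Same header cleanup as in the answer: strip stray whitespace from the names.
demo.columns = [c.strip() for c in demo.columns]

print(demo)
```

If the page holds several tables, inspecting `len(tables)` and each frame's columns is usually the quickest way to find the right one.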

The resulting output is shown further below.

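The first line does the work; the second just strips all those pesky spaces and newlines from the header. There also seems to be a hidden ADR TSO column that looks useless, so you can drop it if you don't know what it is for. It may also make sense to drop every second row (index 1, 3, 5, ...), since those rows are just continuations of the rows above them and, as far as I can tell, contain only useless links. In code: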

>>> df = df.drop(['ADR TSO'], axis=1)  # Drop the useless column
>>> df1 = df[::2]  # Keep every second row, dropping the link rows in between
>>> df2 = df[~df['Name'].str.contains('Stock Quote')].head()  # Or filter by string if we are not sure about the odd/even pattern
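The two row filters can be compared on a small hand-built frame. This is only a sketch: `raw`, `cleaned`, `by_position`, `by_content` and `result` are illustrative names, and the rows imitate the shape of the real table rather than reproduce it. The final `reset_index` is an optional extra step, not part of the answer above, that renumbers the surviving rows from 0.

```python
import pandas as pd

# Hand-built stand-in for the raw read_html result (illustrative values only).
raw = pd.DataFrame(
    {
        "Name": [
            "Amazon.com, Inc.",
            "AMZN Stock Quote  AMZN Ratings  AMZN Stock Report",
            "Microsoft Corporation",
            "MSFT Stock Quote  MSFT Ratings  MSFT Stock Report",
        ],
        "Symbol": ["AMZN", None, "MSFT", None],
        "ADR TSO": [None, None, None, None],
    }
)

# Drop the empty column first.
cleaned = raw.drop(columns=["ADR TSO"])

# Option 1: rely on position and keep rows 0, 2, 4, ...
by_position = cleaned[::2]

# Option 2: rely on content and drop rows whose Name holds the link text.
by_content = cleaned[~cleaned["Name"].str.contains("Stock Quote")]

# Optional: renumber the remaining rows 0, 1, 2, ...
result = by_content.reset_index(drop=True)

print(by_position.equals(by_content))
print(result)
```

Filtering by content is the safer choice if the page ever stops alternating company rows and link rows strictly.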

Output of the original head, for illustration only:

>>> df.head()
                                                Name Symbol Market Cap  \
0                                   Amazon.com, Inc.   AMZN   2.18B
1  AMZN Stock Quote  AMZN Ratings  AMZN Stock Report    NaN        NaN
2                              Microsoft Corporation   MSFT   9.12B
3  MSFT Stock Quote  MSFT Ratings  MSFT Stock Report    NaN        NaN
4                                      Alphabet Inc.  GOOGL    0.3B

   ADR TSO        Country IPO Year  \
0      NaN  United States     1997
1      NaN            NaN      NaN
2      NaN  United States     1986
3      NaN            NaN      NaN
4      NaN  United States      n/a

                                         Subsector
0                   Catalog/Specialty Distribution
1                                              NaN
2          Computer Software: Prepackaged Software
3                                              NaN
4  Computer Software: Programming, Data Processing

Output of df.head() after the cleanup:

                    Name Symbol Market Cap        Country IPO Year  \
0       Amazon.com, Inc.   AMZN   2.18B  United States     1997
2  Microsoft Corporation   MSFT   9.12B  United States     1986
4          Alphabet Inc.  GOOGL    0.3B  United States      n/a
6          Alphabet Inc.   GOOG   5.24B  United States     2004
8             Apple Inc.   AAPL    0.3B  United States     1980

                                         Subsector
0                   Catalog/Specialty Distribution
2          Computer Software: Prepackaged Software
4  Computer Software: Programming, Data Processing
6  Computer Software: Programming, Data Processing
8                           Computer Manufacturing