How to ignore embedded JPEG image data when using BeautifulSoup's getText() on SEC filings

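I'm downloading 8-K filings from the SEC website and trying to extract all of the text for sentiment analysis. The problem is that getText() also picks up all of the embedded JPEG image data and treats it as text.

Here is the URL of the submission; if you save the file as .html you can view it in a browser: https://www.sec.gov/Archives/edgar/data/2488/0000002488-18-000043.txt

The only solution I've come up with so far is a multi-pass one: I call soup.findAll('html') to get the separate HTML blocks and run getText() on each block, and I have to iterate several times to capture all of the HTML. However, this approach ignores the data below, which I also need, so I first have to pull that data out before running soup.getText(). Is there a simpler/cleaner way to do this?

Thanks!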
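A single-pass alternative, sketched under the assumption that the images in an EDGAR full-submission .txt file are stored as uuencoded text (each block running from a `begin <mode> <filename>` line to a line containing only `end`): remove those blocks from the raw string first, then run one BeautifulSoup(...).get_text() call over the cleaned text, which also keeps the header. `strip_uuencoded` and `UUENCODED_BLOCK` are hypothetical names, and this is not a tested recipe for every filing.

```python
import re

# Assumption: each binary attachment (JPEG, GIF, ...) is uuencoded, starting
# with a "begin <mode> <filename>" line and ending with a line that is "end".
UUENCODED_BLOCK = re.compile(
    r"^begin [0-7]{3,4} \S.*?^end[ \t]*$", re.MULTILINE | re.DOTALL
)


def strip_uuencoded(raw: str) -> str:
    """Hypothetical helper: remove every uuencoded block from the raw text."""
    return UUENCODED_BLOCK.sub("", raw)


if __name__ == "__main__":
    sample = (
        "<TYPE>8-K\n<TEXT>\n<html>Item 2.02</html>\n</TEXT>\n"
        "<TYPE>GRAPHIC\n<TEXT>\nbegin 644 logo.jpg\nM_]C_X  02D9)1@\n`\nend\n</TEXT>\n"
    )
    # The cleaned string can then be passed to BeautifulSoup for get_text().
    print(strip_uuencoded(sample))
```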

<SEC-DOCUMENT>0000002488-18-000043.txt : 20180227
<SEC-HEADER>0000002488-18-000043.hdr.sgml : 20180227
<ACCEPTANCE-DATETIME>20180227163108
ACCESSION NUMBER:       0000002488-18-000043
CONFORMED SUBMISSION TYPE:  8-K
PUBLIC DOCUMENT COUNT:      19
CONFORMED PERIOD OF REPORT: 20180227
ITEM INFORMATION:       Results of Operations and Financial Condition
ITEM INFORMATION:       Regulation FD Disclosure
ITEM INFORMATION:       Financial Statements and Exhibits
FILED AS OF DATE:       20180227
DATE AS OF CHANGE:      20180227

FILER:

    COMPANY DATA:   
        COMPANY CONFORMED NAME:         ADVANCED MICRO DEVICES INC
        CENTRAL INDEX KEY:          0000002488
        STANDARD INDUSTRIAL CLASSIFICATION: SEMICONDUCTORS & RELATED DEVICES [3674]
        IRS NUMBER:             941692300
        STATE OF INCORPORATION:         DE
        FISCAL YEAR END:            1227

    FILING VALUES:
        FORM TYPE:      8-K
        SEC ACT:        1934 Act
        SEC FILE NUMBER:    001-07882
        FILM NUMBER:        18645526

    BUSINESS ADDRESS:   
        STREET 1:       2485 AUGUSTINE DRIVE
        CITY:           SANTA CLARA
        STATE:          CA
        ZIP:            95054
        BUSINESS PHONE:     (408) 749-4000

    MAIL ADDRESS:   
        STREET 1:       2485 AUGUSTINE DRIVE
        CITY:           SANTA CLARA
        STATE:          CA
        ZIP:            95054
</SEC-HEADER>
<DOCUMENT>
<TYPE>8-K
<SEQUENCE>1
<FILENAME>a6form8-kasc606disclosurev.htm
<DESCRIPTION>8-K
<TEXT>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
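The header above sits between `<SEC-HEADER>` and `</SEC-HEADER>`, outside every `<html>` element, which is why a pass over the `<html>` blocks alone never sees it. Since it is plain `KEY: value` text rather than HTML, a minimal stdlib sketch can pull it out with a regular expression before any HTML parsing. It assumes the markers appear exactly as in this file; `extract_sec_header` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: the header is delimited by literal <SEC-HEADER> ... </SEC-HEADER>
# markers, as in the raw submission file shown above.
SEC_HEADER_RE = re.compile(r"<SEC-HEADER>(.*?)</SEC-HEADER>", re.DOTALL)


def extract_sec_header(raw: str) -> Optional[str]:
    """Hypothetical helper: return the header text, or None if absent."""
    match = SEC_HEADER_RE.search(raw)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    sample = (
        "<SEC-DOCUMENT>0000002488-18-000043.txt : 20180227\n"
        "<SEC-HEADER>0000002488-18-000043.hdr.sgml : 20180227\n"
        "CONFORMED SUBMISSION TYPE:  8-K\n"
        "</SEC-HEADER>\n"
        "<DOCUMENT>\n<TYPE>8-K\n"
    )
    print(extract_sec_header(sample))
```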
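import requests
from bs4 import BeautifulSoup

# SEC EDGAR asks automated clients to declare a User-Agent with a name and
# contact email; replace this placeholder with your own details.
headers = {'User-Agent': 'Your Name your.email@example.com'}

r = requests.get(
    'https://www.sec.gov/Archives/edgar/data/2488/0000002488-18-000043.txt',
    headers=headers)
r.raise_for_status()

soup = BeautifulSoup(r.text, 'html.parser')

# The embedded images are uuencoded text outside the <html> elements,
# so extracting text only from the <html> elements skips them.
for item in soup.find_all('html'):
    print(item.get_text("\n", strip=True))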

Output:

Document
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
____________________
FORM 8-K
CURRENT REPORT
Pursuant to Section 13 or 15(d) of the Securities Exchange Act of 1934        
February 27, 2018
Date of Report (Date of earliest event reported)
ADVANCED MICRO DEVICES, INC.
(Exact name of registrant as specified in its charter)
Delaware
001-07882
94-1692300
(State of Incorporation)
(Commission File Number)
(IRS Employer
Identification Number)
2485 Augustine Drive
Santa Clara, California  95054
(Address of principal executive offices)  (Zip Code)
(408) 749-4000
(Registrant’s telephone number, including area code)
N/A
(Former Name or Former Address, if Changed Since Last Report)
Check the appropriate box below if the Form 8-K filing is intended to simultaneously satisfy the filing obligation of the registrant under any of the following provisions:
¨
Written communications pursuant to Rule 425 under the Securities Act (17 CFR 230.425)
¨
Soliciting material pursuant to Rule 14a-12 under the Exchange Act (17 CFR 240.14a-12)
¨
Pre-commencement communications pursuant to Rule 14d-2(b) under the Exchange Act (17 CFR 240.14d-2(b))
¨
Pre-commencement communications pursuant to Rule 13e-4(c) under the Exchange Act (17 CFR 240.13e-4(c))
Indicate by check mark whether the registrant is an emerging growth company as defined in Rule 405 of the Securities Act of 1933 (§230.405 of this chapter) 
or Rule 12b-2 of the Securities Exchange Act of 1934 (§240.12b-2 of this chapter).  Emerging growth company
¨
If an emerging growth company, indicate by check mark if the registrant has elected not to use the extended transition period for complying with any new or 
revised financial accounting standards provided pursuant to Section 13(a) of the Exchange Act.
¨
Item 2.02. Results of Operation and Financial Condition.
Advanced Micro Devices, Inc. (the “Company”) is furnishing in Exhibit 99.1 consolidated statements of operations for 2016 and 2017, quarterly consolidated statements of operations for 2017, segment information for 2016 and 2017, quarterly segment information for 2017, consolidated balance sheets for 2016 and 2017, quarterly consolidated balance sheets for 2017, consolidated statements of cash flows - operating activities for 2016 and 2017, and quarterly consolidated statements of cash flows - operating activities for 2017, associated with the new accounting standard for revenue recognition, ASU No. 2014-09,
Revenue from Contracts with Customers: Topic 606
(“ASC 606”).
Item 7.01 Regulation FD Disclosure.
The Company adopted ASC 606 in the first quarter of 2018. The Company is furnishing Exhibit 99.1 as supplemental information regarding ASC 606.
To supplement the Company’s financial results presented on a U.S. Generally Accepted Accounting Principles (“GAAP”) basis, the Company’s Exhibit 99.1 contains non-GAAP financial measures, including non-GAAP gross margin, non-GAAP operating expenses, non-GAAP research and development and marketing, general and administrative expenses, non-GAAP operating income (loss), non-GAAP interest expense, non-GAAP other income (expense), non-GAAP provision (benefit) for income taxes, non-GAAP net income (loss), non-GAAP earnings (loss) per share and free cash flow. The Company believes that the supplemental non-GAAP financial measures assist investors in comparing the Company's core performance by excluding items that it believes are not indicative of the Company’s underlying operating performance. The Company cautions investors to carefully evaluate the financial results calculated in accordance with GAAP and the supplemental non-GAAP financial measures and reconciliations. The Company’s non-GAAP financial measures are not intended to be considered in isolation and are not a substitute 
for, or superior to, financial measures calculated in accordance with GAAP.   
The information in this report furnished pursuant to Items 2.02 and 7.01, including Exhibit 99.1 attached hereto, shall not be deemed “filed” for the purposes of Section 18 of the Securities Exchange Act of 1934, as amended (the “Exchange Act”), or otherwise subject to the liabilities of that section. It may only be incorporated by reference in another filing under the Exchange Act or the Securities Act of 1933, as amended (the "Securities Act"), if such subsequent filing specifically references the information furnished pursuant to Items 2.02 and 7.01 of this Current Report on Form 8-K.
Forward Looking Statements.
This Current Report on Form 8-K, including its exhibits, contains “forward-looking” statements within the meaning of Section 21E of the Exchange Act and Section 27A of the Securities Act. Forward-looking statements reflect current expectations and projections about future events, including AMD’s expectations regarding its financial outlook for fiscal 2018, AMD’s focus on growing revenue 
and increasing profitability in fiscal 2018, and AMD's expected timing of the 
completion of deliverables for a development and intellectual property licensing agreement and the ability of AMD to recognize revenue under such agreement 
at the expected time, and thus involve uncertainty and risk. It is possible that future events may differ from expectations due to a variety of risks and other factors such as those described in AMD’s Annual Report on Form 10-K for the fiscal year ended December 30, 2017, as filed with the U.S. Securities and Exchange Commission. It is not possible to foresee or identify all such factors. Any forward-looking statements in this Current Report on Form 8-K, including its exhibits, are based on certain assumptions and an analyses made in light 
of AMD’s experience and perception of historical trends, current conditions, expected future developments, and other factors it believes are appropriate in 
the circumstances. Forward-looking statements are not a guarantee of future performance and actual results or developments may differ materially from expectations. AMD does not intend to update any particular forward-looking statements contained in this Current Report on Form 8-K and its exhibits.
Item 9.01 Financial Statements and Exhibits.
(d) Exhibits.
EXHIBIT INDEX
Exhibit No.
Description
99.1
AMD Adoption of ASC 606 Revenue Recognition Accounting Standard - February 27, 2018
SIGNATURES
Pursuant to the requirements of the Securities Exchange Act of 1934, as amended, the registrant has duly caused this report to be signed on its behalf by the undersigned hereunto duly authorized.
Date: February 27, 2018
ADVANCED MICRO DEVICES, INC.
By:
/s/ Harry A. Wolin
Name:
Harry A. Wolin
Title:
Senior Vice President, General Counsel and
Corporate Secretary
Exhibit
AMD Adoption of ASC 606 Revenue Recognition Accounting Standard
February 27, 2018
Reconciliation for all non-GAAP financial measures discussed in this document 
to the most directly comparable GAAP financial measures is included below     
.
New Revenue Recognition Accounting Standard
AMD adopted the new revenue recognition accounting standard, ASC 606, effective Q1 2018.  ASC 606 is effective for all public companies for annual reporting periods beginning after December 15, 2017.
We adopted the new revenue recognition accounting standard under the “full retrospective” method, meaning that adjusted financials for 2016 and 2017 are being provided as though ASC 606 was effective in those prior periods.  This method of adoption makes it easier for investors as we provide 2018 guidance, actual results going forward and comparative prior results under one consistent standard.  There is no change to our underlying business guidance under the new 
standard and we remain focused on growing revenue and increasing profitability in 2018.  From Q1 2018 onwards all AMD financial results will be reported under the new revenue recognition accounting standard with prior period financial results adjusted for ASC 606 as provided in this document.
The new revenue accounting standard primarily impacts AMD revenue recognition 
for:
•
channel shipments on a sell-in basis (CPUs and GPUs),
•
inventory of custom products with a non-cancellable purchase order (semi-custom products), and
•
transactions that involve development and licensing agreements.
AMD Adoption of ASC 606 Revenue Recognition Standard
Page
1
February 27, 2018
Under the new standard, revenue from sales to distributors will be recognized 
as revenue upon the shipment of the product to the distributors (sell-in).    
•
Previously revenue recognition of sales to distributors was upon reported resale of the product by the distributors to their customers (sell-through).      
Semi-custom products under non-cancellable purchases orders will be recognized as revenue based on the value of the inventory and expected margin.
•
Previously semi-custom product revenue was recognized upon shipment.
Revenue associated with certain development and intellectual property licensing agreements will be recognized upon transfer of control of the intellectual property license.
•
Previously the fair value of these agreements was divided into an R&D credit for specific development work as the expenses were incurred and licensing revenue upon completion of the deliverables.
Revenue recognition related to all other revenue streams remains substantially unchanged under the new standard.
Summary of ASC 606 Impact to 2016 Financials
•
2016 GAAP and Non-GAAP Results:
•
2016
revenue
is  million higher driven by a net build in channel and semi-custom product inventory.
•
2016
gross margin
percentage does not change and gross margin
dollars increase by  million due to higher revenue.
•
There is no impact to
net loss per share
.
AMD Adoption of ASC 606 Revenue Recognition Standard
Page
2
February 27, 2018
Summary of ASC 606 Impact to 2017 Financials
•
2017 GAAP and non-GAAP Results:
•
2017
revenue
is  million lower driven by a net drain in channel and semi-custom product 
inventory.
•
Revenue in each of the quarters in 2017 is adjusted based on whether there is 
a net drain or net build of channel and semi-custom product inventory.        
•
2017
gross margin
percentage does not change and gross margin
dollars decrease by  million due primarily to lower channel revenue.       
•
Gross margin dollars for each quarter in 2017 are adjusted based on higher or 
lower channel and semi-custom product revenue
.
•
Operating expenses
(OPEX) for 2017 are higher by  million primarily due to the absence of  
million of R&D credits related to a development and intellectual property licensing agreement signed in 2017.  It is expected that the deliverables for this agreement will be completed in 2018 and revenue will be recognized upon transfer of the license.  Marketing, general and administrative expenses increase slightly due to a shift in the timing of recognition of marketing fund expenses.
•
OPEX for each quarter in 2017
increases primarily due to the absence of R&D credits.
•
Provision (benefit) for income taxes
adjustment for 2017 relates to the reduction of withholding tax expense associated with the absence of R&D credits.
•
Q3 and Q4 2017 taxes were also impacted by ASC 606 adjustments.
•
Earnings (loss) per share
for 2017 is lower by $0.07 due to the impact of lower gross margin dollars of 
approximately $(0.035) as a result of lower revenue and the impact of the absence of  million of R&D credits of approximately $(0.035).
AMD Adoption of ASC 606 Revenue Recognition Standard
Page
3
February 27, 2018
•
Earnings (loss) per share for each quarter in 2017 is adjusted based primarily on changes to operating income (loss).
Summary of the impact of ASC 606 on Reportable Segments in 2016 and 2017      
Computing and Graphics:
•
Revenue
increases in 2016 by  million due to a net build in channel inventory.  Revenue is lower in 2017 by  million due to a net drain in channel inventory. 
•
Revenue in each of the quarters for 2017 is adjusted based on whether there is a net increase or decrease in channel revenue
.
•
Operating Income (Loss)
decreases  million in 2016 primarily due to slightly higher operating expenses.  Operating income (loss) decreases  million in 2017 primarily due to lower revenue and the absence of R&D credits.
•
Operating income (loss) in each of the quarters for 2017 is adjusted based on 
the impact of revenue and operating expenses and by the absence of R&D credits
.
Enterprise, Embedded and Semi-Custom:
•
Revenue
increases  million in 2016 due primarily to an increase in semi-custom product revenue and decreases  million in 2017 due primarily to a decrease in semi-custom product revenue.
•
Revenue in each of the quarters for 2017 is adjusted based on whether there is a net increase or decrease in semi-custom product revenue
.
•
Operating Income (Loss)
increases  million in 2016 and decreases  million in 2017 primarily due to the impact of semi-custom product revenue recognition.  In addition, 2017 is impacted by the absence of R&D credits.
AMD Adoption of ASC 606 Revenue Recognition Standard
Page
4
February 27, 2018
•
Operating income (loss) in each of the quarters for 2017 is adjusted based on 
the impact of revenue and the absence of R&D credits
.
Summary of the key impact on Balance Sheet items under ASC 606 for Annual 2016 & Annual and Quarterly 2017
•
Accounts receivable
increases in all periods primarily due to the acceleration in timing of semi-custom product revenue
.
•
Inventory
decreases in all periods primarily due to the acceleration in timing of semi-custom product revenue
.
•
Other current liabilities
increases throughout 2017 due to the reclassification of R&D credits to the balance sheet as deferred revenue.  There is no change to 2016.
•
Deferred income on shipments to distributors
line item, which represented the deferral of income for shipments to distributors previously recognized as revenue upon reported sale by our distributors (sell
-
through), goes away under ASC 606 as channel revenue is recognized upon shipment (sell-in) under ASC 606.
2016 and 2017 Cash Flow Statements
There is no impact on cash flow during any period from the adoption of ASC 606.
In summary the ASC 606 adjusted 2016 and 2017 financial results, provided in this document, reflect the effects of this new revenue recognition accounting standard. There is no change to our underlying business guidance under the new 
standard and we remain focused on growing revenue and increasing profitability in 2018.
Investor Contacts:
Ruth Cotter
Laura Graves
Alina Ostrovsky
408-749-3887
408-749-5467
408-749-6688
ruth.cotter@amd.com
laura.graves@amd.com
alina.ostrovsky@amd.com
AMD Adoption of ASC 606 Revenue Recognition Standard
Page
5
February 27, 2018
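The answer above keeps only the `<html>` elements, which skips the images but also drops any text document in the submission that is not wrapped in `<html>`. If you want every textual document while still skipping the pictures, one stdlib-only approach is to split the raw file into its `<DOCUMENT>` blocks and drop the ones whose `<TEXT>` body is uuencoded. This sketch assumes each document carries `<TYPE>`, `<FILENAME>` and a `<TEXT>...</TEXT>` body and that binary attachments start with a `begin <mode> <filename>` line; `iter_text_documents` is a hypothetical name. Each kept `body` can then go through `BeautifulSoup(body, 'html.parser').get_text("\n", strip=True)` as in the answer.

```python
import re
from typing import Iterator, Tuple

# Assumed layout of an EDGAR full-submission file: <DOCUMENT> blocks, each
# with <TYPE>, <FILENAME> and a <TEXT> ... </TEXT> body.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_RE = re.compile(r"<TYPE>([^\n<]*)")
FILENAME_RE = re.compile(r"<FILENAME>([^\n<]*)")
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
# Assumption: uuencoded binaries begin with "begin <mode> <filename>".
UUENCODE_RE = re.compile(r"\s*begin [0-7]{3,4} \S")


def iter_text_documents(raw: str) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (type, filename, body) for every
    document whose body is not uuencoded binary data."""
    for doc in DOC_RE.finditer(raw):
        block = doc.group(1)
        text = TEXT_RE.search(block)
        if text is None:
            continue
        body = text.group(1)
        if UUENCODE_RE.match(body):
            continue  # skip embedded images and other binaries
        doc_type = TYPE_RE.search(block)
        filename = FILENAME_RE.search(block)
        yield (
            doc_type.group(1).strip() if doc_type else "",
            filename.group(1).strip() if filename else "",
            body,
        )


if __name__ == "__main__":
    sample = (
        "<DOCUMENT>\n<TYPE>8-K\n<SEQUENCE>1\n<FILENAME>a.htm\n"
        "<TEXT>\n<html><body>Item 2.02</body></html>\n</TEXT>\n</DOCUMENT>\n"
        "<DOCUMENT>\n<TYPE>GRAPHIC\n<SEQUENCE>2\n<FILENAME>logo.jpg\n"
        "<TEXT>\nbegin 644 logo.jpg\nM_]C_X  02D9)1@\nend\n</TEXT>\n</DOCUMENT>\n"
    )
    for doc_type, filename, body in iter_text_documents(sample):
        print(doc_type, filename)
```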