Why isn't this Beautiful Soup code getting the targeted data?
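I am trying to use Beautiful Soup to scrape the text in the Properties section of an SEC 10-K filing on EDGAR.
I can get the Properties section header fine and then walk up through its parent nodes, but from there next_sibling does not return the next sibling element (which in this case I believe contains the first paragraph of the section). Can anyone tell me why this isn't working and how to fix it?
Code: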
import requests
from bs4 import BeautifulSoup
url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
properties_header = soup.find_all('p', text="PROPERTIES")[0]
print(properties_header.parent.parent.parent.parent.next_sibling)
Expected result:
<p style="margin-top:4pt;margin-bottom:0pt;text-indent:5.24%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">We are headquartered in Palo Alto, California. Our principal facilities include a large number of properties in North America, Europe and Asia utilized for manufacturing and assembly, warehousing, engineering, retail and service locations, Supercharger sites, and administrative and sales offices. Our facilities are used to support both of our reporting segments, and are suitable and adequate for the conduct of our business. We primarily lease such facilities with the exception of some manufacturing facilities. The following table sets forth the location of our primary owned and leased manufacturing facilities.</p>
The first next_sibling is a NavigableString (typically the whitespace between the two tags), not an element. Call next_sibling twice to reach the following p.
print(properties_header.parent.parent.parent.parent.next_sibling.next_sibling)
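Chaining next_sibling twice works only while exactly one text node sits between the two tags. As a minimal sketch of a less brittle alternative, find_next_sibling skips text nodes and returns the next matching tag. The HTML below is a made-up miniature of the filing's layout, not the real document, and it uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical miniature of the filing's layout: a newline separates the
# header's wrapper <div> from the paragraph that follows it.
html = """<body><div><p>PROPERTIES</p></div>
<p>We are headquartered in Palo Alto, California.</p></body>"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("p", string="PROPERTIES").parent

first = wrapper.next_sibling             # the text node between the two tags
second = wrapper.find_next_sibling("p")  # skips text nodes, returns the next <p>

print(type(first).__name__, repr(str(first)))
print(second.get_text())
```

On the real page the same idea would be properties_header.parent.parent.parent.parent.find_next_sibling('p'), assuming the target paragraph is a sibling of that ancestor as the question describes.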