Why is .txt empty for the scraped HTML content?

Question

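The program's goal is simply to fetch the headlines from tagesschau.de. At first it worked fine, but after a few runs it stopped returning anything.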

import requests
from bs4 import BeautifulSoup

headers = {
    # Adjacent string literals are concatenated, so each piece needs a trailing space.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/86.0.4240.111 Safari/537.36',
    'Host': 'www.tagesschau.de',
    'Referer': 'https://www.tagesschau.de/'
}

# get and parse the HTML of tagesschau.de
URL = 'https://www.tagesschau.de/'
html = requests.get(URL, headers=headers)
html_parse = BeautifulSoup(html.content, 'lxml')

# find all headlines on the homepage
elements = html_parse.find_all('h4', {'class': 'headline'})
for element in elements:
    print(element.txt)

I get nothing:

None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

But when I print element instead of element.txt, I get some correct output:

<h4 class="headline"><a href="/multimedia/livestreams/livestream3/">Live: tagesschau24</a></h4>
<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>
<h4 class="headline"><a href="/multimedia/sendung/ts-39833.html">tagesschau 20 Uhr</a></h4>
<h4 class="headline"><a href="/multimedia/sendung/ts-39841.html">Letzte Sendung</a></h4>
<h4 class="headline">++ Fauci warnt vor "einer Menge Leid" ++</h4>
<h4 class="headline">Weniger Party, mehr Wellness</h4>
<h4 class="headline">November-Lockdown kostet 19 Milliarden</h4>

This confuses me. Why?

Answer 1

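If you want the inner text of an element, try .text. BeautifulSoup does not define a .txt attribute; unknown attribute names on a Tag are treated as a search for a child tag, so element.txt is the same as element.find('txt'), which returns None because the headlines contain no <txt> tag: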

for element in elements:
    print(element.text)
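A minimal sketch of the difference, assuming two of the headline elements from the question as static HTML in place of the live page, and using Python's built-in 'html.parser' instead of lxml. The helper name compare_lookups is hypothetical; it collects element.txt, element.find('txt') and element.text side by side for each headline.

```python
from bs4 import BeautifulSoup

# Static stand-in for the live page: two headlines copied from the question's output.
SAMPLE = '''
<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>
<h4 class="headline">Weniger Party, mehr Wellness</h4>
'''


def compare_lookups(html):
    """Return (element.txt, element.find('txt'), element.text) for each headline."""
    soup = BeautifulSoup(html, 'html.parser')  # built-in parser, no lxml needed
    rows = []
    for element in soup.find_all('h4', {'class': 'headline'}):
        # element.txt is a child-tag search, identical to element.find('txt');
        # there is no <txt> tag, so both are None.
        # element.text joins all text inside the tag.
        rows.append((element.txt, element.find('txt'), element.text))
    return rows


for txt, found, text in compare_lookups(SAMPLE):
    print(txt, found, repr(text))
```

Printing element itself works because str(element) renders the whole tag as markup, while element.txt never reaches the text at all.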

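For the inner HTML, use .decode_contents(). A Tag has no .html attribute either: element.html is shorthand for element.find('html') and returns None here.

for element in elements:
    print(element.decode_contents())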
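A Tag has no .html attribute, so element.html is treated as element.find('html') and returns None. A small sketch, assuming one headline from the question as static HTML, contrasting inner HTML (decode_contents()), outer HTML (str(element)) and that .html lookup:

```python
from bs4 import BeautifulSoup

# One headline from the question, parsed with Python's built-in parser.
soup = BeautifulSoup(
    '<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>',
    'html.parser',
)
element = soup.find('h4', class_='headline')

inner = element.decode_contents()  # markup of the children only (innerHTML)
outer = str(element)               # the tag itself plus its children (outerHTML)

print(inner)
print(outer)
print(element.html)  # same as element.find('html'): no such child, so None
```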

python

web-scraping

scrape