
python how to count the number of the opening and closing tags in html

How can I count the opening and closing tags in an HTML file such as this one?

ya.html

<div class="side-article txt-article">
<p>
    <strong>
    </strong> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<p>
    <br>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<br>

My code:

from bs4 import BeautifulSoup

with open('ya.html') as f:
    soup = BeautifulSoup(f, "html.parser")

# find_all() with no arguments returns every element (Tag object) in the tree
num_appearances_of_tag = len(soup.find_all())

print(num_appearances_of_tag)

Output:

13

But this is not what I want: my code counts <p> </p> as one, whereas I want to count the opening and closing tags separately.

How can I count the opening and closing tags in the HTML separately, so that the output would be:

23 
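For reference, the expected 23 is 13 opening tags plus 10 closing tags: the <div> is never closed in ya.html, and <br> is a void element with no closing tag, so only <p>, <strong> and <a> contribute closing tags. Below is a minimal sketch that tallies the tags per name with Python 3's standard-library html.parser, assuming the contents of ya.html are pasted into the string YA_HTML (the class name TagTally is just an illustrative choice):

```python
from collections import Counter
from html.parser import HTMLParser

# Copy of ya.html from the question
YA_HTML = """<div class="side-article txt-article">
<p>
    <strong>
    </strong>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<p>
    <br>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<br>
"""


class TagTally(HTMLParser):
    """Tally opening and closing tags separately, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.opening = Counter()
        self.closing = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, e.g. <p> or <br>
        self.opening[tag] += 1

    def handle_endtag(self, tag):
        # Called once for every closing tag, e.g. </p>
        self.closing[tag] += 1


tally = TagTally()
tally.feed(YA_HTML)
tally.close()

print(dict(tally.opening))  # {'div': 1, 'p': 4, 'strong': 1, 'a': 5, 'br': 2}
print(dict(tally.closing))  # {'strong': 1, 'a': 5, 'p': 4}
print(sum(tally.opening.values()) + sum(tally.closing.values()))  # 23
```

Counting per tag name also makes unbalanced markup easy to spot: comparing the two counters shows the unclosed <div> immediately.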

Thanks!

I suggest using an HTML parser for this:

# Python 3; in Python 2 the module was called HTMLParser
from html.parser import HTMLParser

# Create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.number_of_starttags = 0
        self.number_of_endtags = 0

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag such as <p>
        self.number_of_starttags += 1

    def handle_endtag(self, tag):
        # Called for every closing tag such as </p>
        self.number_of_endtags += 1

# Instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>')

print(parser.number_of_starttags, parser.number_of_endtags)  # prints: 5 5
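One caveat with this handler-based approach: for a self-closing tag such as <br/>, HTMLParser's default handle_startendtag() calls both handle_starttag() and handle_endtag(), so the tag is counted once as opening and once as closing. The sketch below keeps self-closing tags in their own counter instead. TagCounter and count_tags are hypothetical names chosen for this example, not part of the standard library.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening, closing and self-closing tags separately."""

    def __init__(self):
        super().__init__()
        self.starttags = 0
        self.endtags = 0
        self.startendtags = 0

    def handle_starttag(self, tag, attrs):
        # Opening tags written without a trailing slash, e.g. <p> or <br>
        self.starttags += 1

    def handle_endtag(self, tag):
        # Closing tags, e.g. </p>
        self.endtags += 1

    def handle_startendtag(self, tag, attrs):
        # The default implementation forwards <br/> to handle_starttag()
        # AND handle_endtag(); overriding it counts the tag only here.
        self.startendtags += 1


def count_tags(html_text):
    """Hypothetical helper: return (opening, closing, self_closing) counts."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.starttags, parser.endtags, parser.startendtags


print(count_tags('<p>one<br/>two<br>three</p>'))  # (2, 1, 1)
```

Note that <br> without the slash is still reported as an ordinary opening tag. To apply this to the question's file, read it first, e.g. count_tags(open('ya.html', encoding='utf-8').read()); since ya.html only uses <br>, this should give (13, 10, 0).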