My for loop keeps repeating the items being printed

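I'm scraping a web page and trying to print all the bullet points on it, but my for loop keeps repeating them in a strange way and I can't figure out why. This is the exact output I get for Tuesday.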

Tuesday
 In Class Today:

Read Chapter 4
Annotations 
Book Study




Tuesday
 Read Chapter 4


Tuesday
 Annotations 


Tuesday
 Book Study


Tuesday
 Due Today:


Tuesday
 Homework for Next Class:

Study Stems
Annotations and Book Study 1-4 due BOC Wed




Tuesday
 Study Stems


Tuesday
 Annotations and Book Study 1-4 due BOC Wed

Here is the relevant part of the HTML (I can't share the page itself because it is behind a login):

<p data-uw-styling-context="true"><img src="https://fisd.instructure.com/courses/56950/files/4791824/download" alt="tear drop line 3.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791824" data-api-returntype="File" style="max-width: 676px;" data-uw-styling-context="true"></p>
<h2 data-uw-styling-context="true"> Monday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">No School</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true"> Tuesday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Read Chapter 4</li>
<li data-uw-styling-context="true">Annotations&nbsp;</li>
<li data-uw-styling-context="true">Book Study</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems</li>
<li data-uw-styling-context="true">Annotations and Book Study 1-4 due BOC Wed</li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true"> Wednesday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Subject Complement Notes&nbsp;</li>
<li data-uw-styling-context="true">"There Will Come Soft Rains"&nbsp;</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Annotations and Book Study Ch. 1-4</li>
</ul>
</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems&nbsp;</li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true"> Thursday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Subject Complement Practice</li>
<li data-uw-styling-context="true">TWCSR</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems&nbsp;</li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true"> Friday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Stems Quiz 5 Major Grade</li>
<li data-uw-styling-context="true">TWCSR (Due Monday BOC)</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:</li>
</ul>
</li>
</ul>

Here is my for loop:

page = open("page.html", 'r')
soup = BeautifulSoup(page, "lxml")
for x in days:
    for day in soup.select('h2:-soup-contains('+x+')'):
        for item in day.find_next('li').find_all('li'):
            print(x+'\n', item.text+'\n\n')

If anyone could help me figure this out, that would be great.

Note: the question does not include the expected output, so this is only an example.

What happens?

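Tuesday is printed over and over because the print call that outputs the day sits inside the innermost loop, which runs once per list item.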

How to fix?

To print the day only once, move that print into the loop over the days:

for day in ['Monday','Tuesday','Wednesday','Thursday','Friday']:
    print(day)
    for item in soup.select(f'h2:-soup-contains("{day}") + ul li li'):
        print(item.text)
    print('\n')
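
The difference between the two loop layouts can be reproduced without any HTML. The following is a minimal sketch, assuming a hypothetical `schedule` dict that stands in for the scraped `<li>` items; the two helper functions are illustrative names and not part of the question's code.

```python
# Hypothetical stand-in for the scraped data: one day with three tasks.
schedule = {"Tuesday": ["Read Chapter 4", "Annotations", "Book Study"]}


def heading_in_inner_loop(schedule):
    """Mimics the question's layout: the day is emitted once per task."""
    lines = []
    for day, tasks in schedule.items():
        for task in tasks:
            lines.append(day)
            lines.append(task)
    return lines


def heading_in_outer_loop(schedule):
    """Emits the day once, then every task that belongs to it."""
    lines = []
    for day, tasks in schedule.items():
        lines.append(day)
        for task in tasks:
            lines.append(task)
    return lines


inner = heading_in_inner_loop(schedule)
outer = heading_in_outer_loop(schedule)
print(inner.count("Tuesday"))  # 3, once per task
print(outer.count("Tuesday"))  # 1, once per day
```

The only change between the two functions is where the day is emitted, which is the same move as lifting `print(day)` out of the item loop.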

Example

from bs4 import BeautifulSoup

html = '''<div class="show-content user_content clearfix enhanced" data-uw-styling-context="true"> <h1 class="page-title" data-uw-styling-context="true">Unit 3 I Week 3</h1>   <div style="background-color: #184366; color: white; padding: 15px;" data-uw-styling-context="true"> <h2 data-uw-styling-context="true"><span style="font-size: 30pt;" data-uw-styling-context="true">Unit 3 | Week 3: January 18th-21st</span></h2> </div> <h2 data-uw-styling-context="true">Essential Questions</h2> <ul data-uw-styling-context="true"> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">How does voice relate to the audience and purpose?</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">What techniques does the author use to get his/her point across and communicate?</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">How can technology be beneficial and/or detrimental to society?</span></li> </ul> <h2 data-uw-styling-context="true">Objectives</h2> <ul data-uw-styling-context="true"> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Analyze the concept of utopia/dystopia as presented in the novel</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Create a utopia to represent the ideas of the group and backed up with research</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Analyze expository/informational text&nbsp;</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Understand rhetorical devices and logical fallacies</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Interpret elements of media including television and digital graphics</span></li> <li aria-level="1" data-uw-styling-context="true"><span 
data-uw-styling-context="true">Create a TV newscast that organizes and presents research with certain purposes and audiences in mind</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Collaborate to create a professional product</span></li> <li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Explain author’s purpose and message within a text</span></li> </ul> <p data-uw-styling-context="true"><img src="https://fisd.instructure.com/courses/56950/files/4791824/download" alt="tear drop line 3.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791824" data-api-returntype="File" style="max-width: 676px;" data-uw-styling-context="true"></p> <h2 data-uw-styling-context="true"> Monday</h2> <ul data-uw-styling-context="true"> <li style="list-style-type: none;" data-uw-styling-context="true"> <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">No School</li> </ul> </li> </ul> <hr data-uw-styling-context="true"> <h2 data-uw-styling-context="true"> Tuesday</h2> <ul data-uw-styling-context="true"> <li style="list-style-type: none;" data-uw-styling-context="true"> <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">In Class Today: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Read Chapter 4</li> <li data-uw-styling-context="true">Annotations&nbsp;</li> <li data-uw-styling-context="true">Book Study</li> </ul> </li> <li data-uw-styling-context="true">Due Today:</li> <li data-uw-styling-context="true">Homework for Next Class: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Study Stems</li> <li data-uw-styling-context="true">Annotations and Book Study 1-4 due BOC Wed</li> </ul> </li> </ul> </li> </ul> <hr data-uw-styling-context="true"> <h2 data-uw-styling-context="true"> Wednesday</h2> <ul data-uw-styling-context="true"> <li style="list-style-type: none;" data-uw-styling-context="true"> 
<ul data-uw-styling-context="true"> <li data-uw-styling-context="true">In Class Today: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Subject Complement Notes&nbsp;</li> <li data-uw-styling-context="true">"There Will Come Soft Rains"&nbsp;</li> </ul> </li> <li data-uw-styling-context="true">Due Today: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Annotations and Book Study Ch. 1-4</li> </ul> </li> <li data-uw-styling-context="true">Homework for Next Class: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Study Stems&nbsp;</li> </ul> </li> </ul> </li> </ul> <hr data-uw-styling-context="true"> <h2 data-uw-styling-context="true"> Thursday</h2> <ul data-uw-styling-context="true"> <li style="list-style-type: none;" data-uw-styling-context="true"> <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">In Class Today: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Subject Complement Practice</li> <li data-uw-styling-context="true">TWCSR</li> </ul> </li> <li data-uw-styling-context="true">Due Today:</li> <li data-uw-styling-context="true">Homework for Next Class: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Study Stems&nbsp;</li> </ul> </li> </ul> </li> </ul> <hr data-uw-styling-context="true"> <h2 data-uw-styling-context="true"> Friday</h2> <ul data-uw-styling-context="true"> <li style="list-style-type: none;" data-uw-styling-context="true"> <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">In Class Today: <ul data-uw-styling-context="true"> <li data-uw-styling-context="true">Stems Quiz 5 Major Grade</li> <li data-uw-styling-context="true">TWCSR (Due Monday BOC)</li> </ul> </li> <li data-uw-styling-context="true">Due Today:</li> <li data-uw-styling-context="true">Homework for Next Class:</li> </ul> </li> </ul> <p data-uw-styling-context="true"><img 
src="https://fisd.instructure.com/courses/56950/files/4791824/download" alt="tear drop line 3.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791824" data-api-returntype="File" style="max-width: 676px;" data-uw-styling-context="true"></p> <p data-uw-styling-context="true"><img style="float: left; max-width: 72px;" src="https://fisd.instructure.com/courses/56950/files/4791827/download" alt="Left Arrow (1).png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791827" data-api-returntype="File" data-uw-styling-context="true"></p> <p data-uw-styling-context="true"><br data-uw-styling-context="true">&nbsp;<a title="Unit 3 Overview" href="https://fisd.instructure.com/courses/111538/pages/unit-3-overview" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/111538/pages/unit-3-overview" data-api-returntype="Page" data-uw-styling-context="true">Unit 3 Homepage</a></p> <p data-uw-styling-context="true">&nbsp;</p> <p data-uw-styling-context="true"><a title="Home" href="https://fisd.instructure.com/courses/111538/pages/home" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/111538/pages/home" data-api-returntype="Page" data-uw-styling-context="true"><img style="float: left; max-width: 72px;" src="https://fisd.instructure.com/courses/56950/files/4791834/download?wrap=1" alt="Home Black.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791834" data-api-returntype="File" data-uw-styling-context="true"> <br data-uw-styling-context="true">Course Homepage</a></p> <p data-uw-styling-context="true">&nbsp;</p>  </div> '''
soup=BeautifulSoup(html,'lxml')

for day in ['Monday','Tuesday','Wednesday','Thursday','Friday']:
    print(day)
    for item in soup.select(f'h2:-soup-contains("{day}") + ul li li'):
        print(item.text)
    print('\n')

Output

Monday
No School


Tuesday
In Class Today:  Read Chapter 4 Annotations  Book Study  
Read Chapter 4
Annotations 
Book Study
Due Today:
Homework for Next Class:  Study Stems Annotations and Book Study 1-4 due BOC Wed  
Study Stems
Annotations and Book Study 1-4 due BOC Wed


Wednesday
In Class Today:  Subject Complement Notes  "There Will Come Soft Rains"   
Subject Complement Notes 
"There Will Come Soft Rains" 
Due Today:  Annotations and Book Study Ch. 1-4  
Annotations and Book Study Ch. 1-4
Homework for Next Class:  Study Stems   
Study Stems 


Thursday
In Class Today:  Subject Complement Practice TWCSR  
Subject Complement Practice
TWCSR
Due Today:
Homework for Next Class:  Study Stems   
Study Stems 


Friday
In Class Today:  Stems Quiz 5 Major Grade TWCSR (Due Monday BOC)  
Stems Quiz 5 Major Grade
TWCSR (Due Monday BOC)
Due Today:
Homework for Next Class:

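I interpreted your question as meaning that you do not want duplicate entries printed. The duplicates appear because the text of an outer bullet point also includes the text of the tags nested inside it. To avoid that, print only the first string of each tag: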

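from bs4 import BeautifulSoup

# Assumption: `days` is not shown in the question; the output below is for Tuesday only.
days = ['Tuesday']

with open("page.html", 'r') as page:
    soup = BeautifulSoup(page, "lxml")

for x in days:
    for day in soup.select(f'h2:-soup-contains("{x}")'):
        for item in day.find_next('li').find_all('li'):
            # The first stripped string is the bullet's own text, without its nested items.
            bullet_point = next(item.stripped_strings)
            print(bullet_point)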

Output:

In Class Today:
Read Chapter 4
Annotations
Book Study
Due Today:
Homework for Next Class:
Study Stems
Annotations and Book Study 1-4 due BOC Wed
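
If the goal is to keep each item under its category instead of producing a flat list, one possible variation is to walk only the direct children of the category list with `find_all('li', recursive=False)`. This is a sketch under assumptions: the question does not state the desired result, `group_by_category` is a hypothetical helper, the HTML is a trimmed copy of the Tuesday section with its attributes removed, and `html.parser` is used in place of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Tuesday section from the question (attributes removed).
html = """
<h2> Tuesday</h2>
<ul>
<li style="list-style-type: none;">
<ul>
<li>In Class Today:
<ul>
<li>Read Chapter 4</li>
<li>Annotations&nbsp;</li>
<li>Book Study</li>
</ul>
</li>
<li>Due Today:</li>
<li>Homework for Next Class:
<ul>
<li>Study Stems</li>
<li>Annotations and Book Study 1-4 due BOC Wed</li>
</ul>
</li>
</ul>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def group_by_category(heading):
    """Map each category bullet under a day heading to its nested items."""
    grouped = {}
    wrapper = heading.find_next("li")   # the unstyled wrapper <li>
    category_list = wrapper.find("ul")  # the <ul> that holds the categories
    # recursive=False visits only the direct <li> children, so nested
    # items are not also returned as if they were categories.
    for category in category_list.find_all("li", recursive=False):
        title = next(category.stripped_strings)
        nested = category.find("ul")
        if nested is None:
            grouped[title] = []
        else:
            grouped[title] = [li.get_text(strip=True) for li in nested.find_all("li")]
    return grouped


tuesday = group_by_category(soup.find("h2"))
for title, entries in tuesday.items():
    print(title)
    for entry in entries:
        print("  " + entry)
```

Because every item is visited exactly once, no text is repeated, and empty categories such as "Due Today:" simply map to an empty list.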