Odd Behaviour of Loop in Python
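I am writing a script to report statistics from a Markdown text file. The file contains book titles and dates. Each date applies to the titles that follow it, until a new date appears. Here is a sample: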
#### 8/23/05
Defining the World (Hitchings)
#### 8/26/05
Lost Japan
#### 9/5/05
The Kite Runner
*The Dark Valley (Brendon)*
#### 9/9/05
Active Liberty
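I use a `for` loop to iterate over the lines of the file and check each line to see whether it is a date. If it is a date, I set a variable, `this_date`. If it is a title, I turn it into a dictionary that carries the current value of `this_date`.
There are two exceptions: the file starts with titles rather than a date, so I set an initial value for `this_date` before the `for` loop. And midway through the file there is a region where the dates were lost, so I set a specific date for those titles.
But in the resulting list of dictionaries, every title is given that date up to the point where the lost-data region starts. After that, the remaining titles are given the date that appears last in the file. The most confusing part: when I print the contents of `this_date` right before appending the new dictionary, it contains the correct value on every iteration.
I expected `this_date` to be visible at all levels of the loop. I know I need to break this up into functions, and passing results explicitly between functions would probably fix the problem, but I would like to know why this approach does not work. Many thanks.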
import re

result = []

# regex patterns
ddp = re.compile(r'\d+')        # extract digits
mp = re.compile(r'^#+\s*\d+')   # captures hashes and spaces
dp = re.compile(r'/\d+/')       # captures slashes
yp = re.compile(r'\d+$')
sp = re.compile(r'^\*')

# initialize
this_date = {
    'month': 4,
    'day': 30,
    'year': 2005
}
# print('this_date initialized')

for line in text:
    if line == '':
        pass
    else:
        if '#' in line:  # markdown header format - line is a new date
            if 'Reconstructing lost data' in line:  # handle exception
                # titles after this line are given 12/31/14 (the last date in the file) instead of 8/31/10
                # all prior dates are overwritten with 8/31/10
                # but the intent is that titles after this line appears have date 8/31/10, until the next date
                this_date = {
                    'month': 8,
                    'day': 31,
                    'year': 2010
                }
                # print('set this_date to handle exception')
            else:  # get the date from the header
                month = ddp.search( mp.search(line).group() )  # digits only
                day = ddp.search( dp.search(line).group() )  # digits only
                year = yp.search(line)
                if month and day and year:
                    # print('setting this_date within header parse')
                    this_date['month'] = int(month.group())
                    this_date['day'] = int(day.group())
                    this_date['year'] = ( int(year.group()) + 2000 )
                else:
                    pass
        else:  # line is a title
            x = {
                'date': this_date,
                'read': False
            }
            if sp.match(line):  # starts with asterisk - has been read
                x['read'] = True
                x['title'] = line[1:-3]  # trim trailing asterisk and spaces
            else:
                x['title'] = line
            # this_date is correct when printed here
            # print('this_date is ' + str(this_date['month']) + '/' + str(this_date['day']) + '/' + str(this_date['year']) )
            result.append(x)
            # x has correct date when printed here
            # print(x)
# print("Done; found %d titles.") % len(result)
# elements of result have wrong dates (either 8/31/10 or 12/31/14, no other values) when printed here
# print( result[0::20])
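You create the `this_date` dictionary only once. You then reuse that same dictionary on every loop iteration. You are only adding references to it to your `result` list; it is one dictionary, referenced over and over. (The lost-data branch rebinds `this_date` to a second dictionary, which is why exactly two dates show up in the output.)
Store a new copy of the dictionary on each loop iteration instead: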
x = {
    'date': this_date.copy(),
    'read': False
}
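
To see the aliasing in isolation, here is a minimal sketch. The names `shared`, `aliased`, and `copied` are made up for illustration and do not come from the question's script. It mutates one dictionary across two iterations and compares storing a reference with storing a `.copy()` snapshot.

```python
# Hypothetical names for illustration; not part of the original script.
shared = {'month': 4, 'day': 30, 'year': 2005}

aliased = []   # entries hold references to the one shared dictionary
copied = []    # entries hold an independent snapshot taken at append time

for month in (8, 9):
    shared['month'] = month                  # mutates the single dict in place
    aliased.append({'date': shared})
    copied.append({'date': shared.copy()})

# Every aliased entry points at the same object, so all show the last value.
print([entry['date']['month'] for entry in aliased])   # [9, 9]
print(aliased[0]['date'] is aliased[1]['date'])        # True

# The copies kept the value that was current when each one was appended.
print([entry['date']['month'] for entry in copied])    # [8, 9]
```

This is also why printing inside the loop looked right: at that moment the shared dictionary really does hold the current date, and only later mutations change what the earlier entries display. `dict.copy()` is a shallow copy, which is enough here because the values are plain integers.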
Your code could do with some simplification; I would use `datetime.date()` objects here, because they model dates properly. No regular expressions are needed:
from datetime import datetime

current_date = None
results = []

for line in text:
    line = line.strip()
    if not line:
        continue

    if line.startswith('#'):
        current_date = datetime.strptime(line.strip('# '), '%m/%d/%y').date()
        continue

    entry = {'date': current_date, 'read': False}

    if line.startswith('*') and line.endswith('*'):
        # previously read
        line = line.strip('*')
        entry['read'] = True

    entry['title'] = line
    results.append(entry)
Because `datetime.date()` objects are immutable, and we create a new `date` object each time we encounter a header line, you can safely reuse the last-read date.
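
A small sketch of why sharing a `date` is safe where sharing a dictionary was not; the variable names are illustrative only. A `date` cannot be modified in place, so changing the current date always means binding the name to a new object, and entries stored earlier keep the old one.

```python
from datetime import date

current_date = date(2005, 8, 23)
entries = [{'date': current_date}]       # stores a reference, just like before

# date objects reject in-place modification.
try:
    current_date.year = 2010
except AttributeError as exc:
    print('cannot mutate:', type(exc).__name__)

# replace() returns a new object; rebinding the name leaves the stored entry alone.
current_date = current_date.replace(day=26)
entries.append({'date': current_date})

print([entry['date'].isoformat() for entry in entries])   # ['2005-08-23', '2005-08-26']
```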
Demo:
>>> from datetime import datetime
>>> from pprint import pprint
>>> text = '''\
... #### 8/23/05
... Defining the World (Hitchings)
... #### 8/26/05
... Lost Japan
... #### 9/5/05
... The Kite Runner
... *The Dark Valley (Brendon)*
... #### 9/9/05
... Active Liberty
... '''.splitlines(True)
>>> current_date = None
>>> results = []
>>> for line in text:
...     line = line.strip()
...     if not line:
...         continue
...     if line.startswith('#'):
...         current_date = datetime.strptime(line.strip('# '), '%m/%d/%y').date()
...         continue
...     entry = {'date': current_date, 'read': False}
...     if line.startswith('*') and line.endswith('*'):
...         # previously read
...         line = line.strip('*')
...         entry['read'] = True
...     entry['title'] = line
...     results.append(entry)
...
>>> pprint(results)
[{'date': datetime.date(2005, 8, 23),
  'read': False,
  'title': 'Defining the World (Hitchings)'},
 {'date': datetime.date(2005, 8, 26), 'read': False, 'title': 'Lost Japan'},
 {'date': datetime.date(2005, 9, 5),
  'read': False,
  'title': 'The Kite Runner'},
 {'date': datetime.date(2005, 9, 5),
  'read': True,
  'title': 'The Dark Valley (Brendon)'},
 {'date': datetime.date(2005, 9, 9), 'read': False, 'title': 'Active Liberty'}]
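
The simplified loop above leaves out the two exceptions described in the question: titles that appear before the first date header, and the lost-data header. One way they might be folded back in is sketched below. The initial date (4/30/05) and the lost-data date (8/31/10) come from the question's script; the function name `parse_reading_list`, the exact form of the lost-data header line, and the sample title `Early Title` are assumptions made for this example.

```python
from datetime import date, datetime

# Dates taken from the question's script; the names and the exact
# header text are assumptions made for this sketch.
INITIAL_DATE = date(2005, 4, 30)
LOST_DATA_DATE = date(2010, 8, 31)
LOST_DATA_MARKER = 'Reconstructing lost data'


def parse_reading_list(lines):
    """Return one dict per title, each tagged with the most recent date."""
    current_date = INITIAL_DATE      # covers titles before the first header
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            if LOST_DATA_MARKER in line:
                current_date = LOST_DATA_DATE
            else:
                current_date = datetime.strptime(
                    line.strip('# '), '%m/%d/%y').date()
            continue
        # A title wrapped in asterisks has already been read.
        read = len(line) > 1 and line.startswith('*') and line.endswith('*')
        results.append({
            'date': current_date,
            'read': read,
            'title': line.strip('*') if read else line,
        })
    return results


# Made-up sample input, including a title before any header.
sample = [
    'Early Title',
    '#### 8/23/05',
    'Defining the World (Hitchings)',
    '#### Reconstructing lost data',
    '*The Dark Valley (Brendon)*',
    '#### 12/31/14',
    'Active Liberty',
]
for entry in parse_reading_list(sample):
    print(entry['date'], entry['read'], entry['title'])
```

Because every assignment to `current_date` binds the name to a distinct immutable `date`, no copying is needed when an entry is stored.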