在列表中查找连续的月份
Finding consecutive months in a list
我有兴趣在无序列表中找到最大的月集,它可以作为 不同的、连续的 月的有序列表返回。
例如:
consecutive_months(["December", "January", "February", "April"])
会输出:
"December", "January", "February"
并且:
consecutive_months(["February", "December", "January"])
会输出:
"December", "January", "February"
下面的方法有效,但我很好奇是否有人对更优雅的方法有想法:
MONTHS = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
def consecutive_months(lst_of_months):
# create two years of months to handle going from Dec to Jan
results = []
for m in set(lst_of_months):
results.append((m,MONTHS.index(m)))
results.append((m,MONTHS.index(m)+12))
results = sorted(results, key=lambda x: x[1])
# find the longest series of consecutive months
this_series = []
longest_series = []
for this_month in results:
if len(this_series) > 0:
last_month = this_series[-1]
if last_month[1] + 1 == this_month[1]:
this_series.append(this_month)
else:
this_series = [this_month]
else:
this_series = [this_month]
if len(this_series) > len(longest_series):
longest_series = [m for (m,i) in this_series]
return longest_series
Here 是一个带有示例输入和预期输出的 pastebin。
我注意到您的代码存在一个问题:当所有 12 个月都出现在输入中时,输出会列出所有月份两次。这很容易解决,只需执行以下操作:
return longest_series[:12]
我会寻求一种解决方案,将输入转换为一种 12 位的“位图”,其中 1 表示相应的月份在输入中,而 0 表示不是。
如果表示为12个字符的字符串,您可以使用正则表达式轻松识别“1”的序列。
我还会对月份列表进行一些预处理,这样你们就有了列表和它的字典版本,并且列表加倍了,这样你就可以从它切入 12 边界。
这是建议的代码:
import re
months = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
# Also create a dictionary to get a month's index
month_nums = { month: num for num, month in enumerate(months) }
# ... and double the months list, to ease slicing across the 12-boundary
months += months
def consecutive_months(given_months):
# Deal with boundary case
if not given_months:
return []
# Convert input to 12 bits in string format
lst = ["0"] * 12
for m in given_months:
lst[month_nums[m]] = "1"
bits = "".join(lst)
# Identify the longest chunk of consecutive "1" in that doubled string
_, start, end = max((j-i, i, j)
for i, j in (match.span(0)
for match in re.finditer("1+", bits + bits)
)
)
# Using the found span, extract the corresponding month names
return months[start:end][:12]
月份字符串只是一个符号,其实质还是后面对应的数字,从1到12,一个月又一个月。
两个月的字符串不能直接比较。换算成数字的话,下个月的数字可以加1(12月之后的1月除外),
而且数字之间的比较肯定大于字符串。
我优化后的代码如下:
MONTHS = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
month_num_dict = {month: num for num, month in enumerate(MONTHS, start=1)}
def consecutive_months(month_list: list) -> list:
# Deal with boundary case
if len(month_list) == 0:
return month_list
# A list of twice length is required only when the first and last months end to end
first_month_num = month_num_dict[month_list[0]]
last_month_num = month_num_dict[month_list[-1]]
last_month_next_num = last_month_num + 1 if last_month_num != 12 else 1
month_list = month_list * 2 if last_month_next_num == first_month_num else month_list
# Initialize list of candidates and longest series
candidate = [month_list[0], ]
longest_series = [month_list[0], ]
for i in range(len(month_list) - 1):
month = month_list[i]
month_num = month_num_dict[month]
next_month = month_list[i + 1]
next_month_num = month_num_dict[next_month]
expected_next_month_num = month_num + 1 if month_num != 12 else 1
if expected_next_month_num == next_month_num:
candidate.append(next_month)
# At the end of the traversal, decide whether to update the longest series
# according to the length of the candidate.
if i == len(month_list) - 2 and len(candidate) > len(longest_series):
longest_series = candidate
else:
# When the length of the new candidate is greater than the old, update the longest series
if len(candidate) > len(longest_series):
longest_series = candidate
# Generate next candidate month list
candidate = [next_month, ]
# Deal with all 12 months input list
if len(longest_series) > 12:
return MONTHS
return longest_series
如果担心手写的MONTHS
列表有误,也可以通过time.strftime
获取:
import time
import locale
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
month_num_dict = {
time.strftime("%B", time.strptime(str(num), "%m")): num
for num in range(1, 13)
}
MONTHS = list(month_num_dict.keys())
当然,为了设置回原来的locale,保证线程安全,可以加一个thread mutex,代码可以参考这个answer, my complete code contains all the test data, as you can see here.
以下是一位同样关注该问题的朋友的两种工作方法。第一个是高效的并使用模运算符,因此列表不需要复制到自身。
month_names = [
'January', 'February',
'March', 'April', 'May',
'June', 'July', 'August',
'September', 'October',
'November', 'December'
]
# Looks like: {'January': 0, 'February': 1...}
month_name_to_index = {
value: index
for index, value
in enumerate(month_names)
}
def consecutive_months(list_of_months_by_name):
if not list_of_months_by_name:
# If the list is empty, return None.
return None
month_was_seen = [False] * 12 # Looks like: [False, False, ...]
for month_name in list_of_months_by_name:
month_was_seen[month_name_to_index[month_name]] = True
# Seek to first missing month:
for start_index in range(12):
if not month_was_seen[start_index]:
break
# If there is no missing month, return the whole year.
if month_was_seen[start_index]:
return {"from": "January", "to": "December", "length": 12}
# Starting from the first missing month, loop around the year
# and keep track of the longest run using one boolean and four
# integers.
running = False
longest_run_index = None
longest_run_length = 0
current_run_index = None
current_run_length = None
for offset in range(1, 13):
index = (start_index + offset) % 12
if month_was_seen[index]:
# Continue a run or begin a run.
if running:
current_run_length += 1
continue
running = True
current_run_index = index
current_run_length = 1
continue
if running:
# End the run.
running = False
if current_run_length > longest_run_length:
longest_run_index = current_run_index
longest_run_length = current_run_length
return {
"from": month_names[longest_run_index],
"to": month_names[(longest_run_index + longest_run_length - 1) % 12],
"length": longest_run_length
}
第二个是巧妙的单行:
MONTH_NAMES = [
'January', 'February',
'March', 'April', 'May',
'June', 'July', 'August',
'September', 'October',
'November', 'December'
]
def consecutive_months(list_of_months_by_name):
return max(
(
len(segment)-segment.index(":")-1,
(MONTH_NAMES*2)[
int(segment[:segment.index(":")])+1
:
int(segment[:segment.index(":")]) + len(segment) - segment.index(":")
]
)
for segment in
"".join([
"x" if month_name in list_of_months_by_name else f",{index}:"
for index, month_name in enumerate(MONTH_NAMES*2)
]).split(",")
if ":" in segment
)[1] if set(MONTH_NAMES) - set(list_of_months_by_name) else MONTH_NAMES
两种算法return 测试数据的预期结果。感谢 AV!
我有兴趣在无序列表中找到最大的月集,它可以作为 不同的、连续的 月的有序列表返回。
例如:
consecutive_months(["December", "January", "February", "April"])
会输出:
"December", "January", "February"
并且:
consecutive_months(["February", "December", "January"])
会输出:
"December", "January", "February"
下面的方法有效,但我很好奇是否有人对更优雅的方法有想法:
MONTHS = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
def consecutive_months(lst_of_months):
# create two years of months to handle going from Dec to Jan
results = []
for m in set(lst_of_months):
results.append((m,MONTHS.index(m)))
results.append((m,MONTHS.index(m)+12))
results = sorted(results, key=lambda x: x[1])
# find the longest series of consecutive months
this_series = []
longest_series = []
for this_month in results:
if len(this_series) > 0:
last_month = this_series[-1]
if last_month[1] + 1 == this_month[1]:
this_series.append(this_month)
else:
this_series = [this_month]
else:
this_series = [this_month]
if len(this_series) > len(longest_series):
longest_series = [m for (m,i) in this_series]
return longest_series
Here 是一个带有示例输入和预期输出的 pastebin。
我注意到您的代码存在一个问题:当所有 12 个月都出现在输入中时,输出会列出所有月份两次。这很容易解决,只需执行以下操作:
return longest_series[:12]
我会寻求一种解决方案,将输入转换为一种 12 位的“位图”,其中 1 表示相应的月份在输入中,而 0 表示不是。
如果表示为12个字符的字符串,您可以使用正则表达式轻松识别“1”的序列。
我还会对月份列表进行一些预处理,这样你们就有了列表和它的字典版本,并且列表加倍了,这样你就可以从它切入 12 边界。
这是建议的代码:
import re
months = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
# Also create a dictionary to get a month's index
month_nums = { month: num for num, month in enumerate(months) }
# ... and double the months list, to ease slicing across the 12-boundary
months += months
def consecutive_months(given_months):
# Deal with boundary case
if not given_months:
return []
# Convert input to 12 bits in string format
lst = ["0"] * 12
for m in given_months:
lst[month_nums[m]] = "1"
bits = "".join(lst)
# Identify the longest chunk of consecutive "1" in that doubled string
_, start, end = max((j-i, i, j)
for i, j in (match.span(0)
for match in re.finditer("1+", bits + bits)
)
)
# Using the found span, extract the corresponding month names
return months[start:end][:12]
月份字符串只是一个符号,其实质还是后面对应的数字,从1到12,一个月又一个月。
两个月的字符串不能直接比较。换算成数字的话,下个月的数字可以加1(12月之后的1月除外), 而且数字之间的比较肯定大于字符串。
我优化后的代码如下:
MONTHS = ["January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"]
month_num_dict = {month: num for num, month in enumerate(MONTHS, start=1)}
def consecutive_months(month_list: list) -> list:
# Deal with boundary case
if len(month_list) == 0:
return month_list
# A list of twice length is required only when the first and last months end to end
first_month_num = month_num_dict[month_list[0]]
last_month_num = month_num_dict[month_list[-1]]
last_month_next_num = last_month_num + 1 if last_month_num != 12 else 1
month_list = month_list * 2 if last_month_next_num == first_month_num else month_list
# Initialize list of candidates and longest series
candidate = [month_list[0], ]
longest_series = [month_list[0], ]
for i in range(len(month_list) - 1):
month = month_list[i]
month_num = month_num_dict[month]
next_month = month_list[i + 1]
next_month_num = month_num_dict[next_month]
expected_next_month_num = month_num + 1 if month_num != 12 else 1
if expected_next_month_num == next_month_num:
candidate.append(next_month)
# At the end of the traversal, decide whether to update the longest series
# according to the length of the candidate.
if i == len(month_list) - 2 and len(candidate) > len(longest_series):
longest_series = candidate
else:
# When the length of the new candidate is greater than the old, update the longest series
if len(candidate) > len(longest_series):
longest_series = candidate
# Generate next candidate month list
candidate = [next_month, ]
# Deal with all 12 months input list
if len(longest_series) > 12:
return MONTHS
return longest_series
如果担心手写的MONTHS
列表有误,也可以通过time.strftime
获取:
import time
import locale
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
month_num_dict = {
time.strftime("%B", time.strptime(str(num), "%m")): num
for num in range(1, 13)
}
MONTHS = list(month_num_dict.keys())
当然,为了设置回原来的locale,保证线程安全,可以加一个thread mutex,代码可以参考这个answer, my complete code contains all the test data, as you can see here.
以下是一位同样关注该问题的朋友的两种工作方法。第一个是高效的并使用模运算符,因此列表不需要复制到自身。
month_names = [
'January', 'February',
'March', 'April', 'May',
'June', 'July', 'August',
'September', 'October',
'November', 'December'
]
# Looks like: {'January': 0, 'February': 1...}
month_name_to_index = {
value: index
for index, value
in enumerate(month_names)
}
def consecutive_months(list_of_months_by_name):
if not list_of_months_by_name:
# If the list is empty, return None.
return None
month_was_seen = [False] * 12 # Looks like: [False, False, ...]
for month_name in list_of_months_by_name:
month_was_seen[month_name_to_index[month_name]] = True
# Seek to first missing month:
for start_index in range(12):
if not month_was_seen[start_index]:
break
# If there is no missing month, return the whole year.
if month_was_seen[start_index]:
return {"from": "January", "to": "December", "length": 12}
# Starting from the first missing month, loop around the year
# and keep track of the longest run using one boolean and four
# integers.
running = False
longest_run_index = None
longest_run_length = 0
current_run_index = None
current_run_length = None
for offset in range(1, 13):
index = (start_index + offset) % 12
if month_was_seen[index]:
# Continue a run or begin a run.
if running:
current_run_length += 1
continue
running = True
current_run_index = index
current_run_length = 1
continue
if running:
# End the run.
running = False
if current_run_length > longest_run_length:
longest_run_index = current_run_index
longest_run_length = current_run_length
return {
"from": month_names[longest_run_index],
"to": month_names[(longest_run_index + longest_run_length - 1) % 12],
"length": longest_run_length
}
第二个是巧妙的单行:
MONTH_NAMES = [
'January', 'February',
'March', 'April', 'May',
'June', 'July', 'August',
'September', 'October',
'November', 'December'
]
def consecutive_months(list_of_months_by_name):
return max(
(
len(segment)-segment.index(":")-1,
(MONTH_NAMES*2)[
int(segment[:segment.index(":")])+1
:
int(segment[:segment.index(":")]) + len(segment) - segment.index(":")
]
)
for segment in
"".join([
"x" if month_name in list_of_months_by_name else f",{index}:"
for index, month_name in enumerate(MONTH_NAMES*2)
]).split(",")
if ":" in segment
)[1] if set(MONTH_NAMES) - set(list_of_months_by_name) else MONTH_NAMES
两种算法return 测试数据的预期结果。感谢 AV!