My code is confusing an input file name for a regular expression
My regex does not explicitly include a dash in a character range, but my code fails when the input file name looks like this:
Rage Against The Machine - 1996 - Bulls On Parade [Maxi-Single]
Here is my code:
def find_cue_files(path):
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))  # this is line 81 in my source file (mentioned in the traceback)
            # do a few other things...
    return found_files
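Clearly this part of the file name is the problem: [Maxi-Single]
How can I handle file names like this so that they are treated as fixed strings rather than as part of a regular expression?
(This is not my main question, but in case it is relevant, I am open to trying another approach to case-insensitive matching. I have looked at several Stack Overflow questions on the topic, but so far I have not found a solution that seems to fit this case.)
Here is my error:
Traceback (most recent call last):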
  File "/usr/bin/xonsh", line 33, in <module>
    sys.exit(load_entry_point('xonsh==0.10.0', 'console_scripts', 'xonsh')())
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21336, in main
    _failback_to_other_shells(args, err)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21283, in _failback_to_other_shells
    raise err
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21334, in main
    sys.exit(main_xonsh(args))
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21388, in main_xonsh
    run_script_with_cache(
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3285, in run_script_with_cache
    run_compiled_code(ccode, glb, loc, mode)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3190, in run_compiled_code
    func(code, glb, loc)
  File "process_audio_files.xsh", line 160, in <module>
    cue_files = find_cue_files(dest_path)
  File "process_audio_files.xsh", line 81, in find_cue_files
    fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))
  File "/usr/lib/python3.9/glob.py", line 22, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.9/glob.py", line 74, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.9/glob.py", line 75, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.9/glob.py", line 86, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.9/fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.9/fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range i-S at position 70
Edit: I tried using re.escape, as documented here: https://docs.python.org/3/library/re.html
def find_cue_files(path):
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            root2 = re.escape(root)
            fcue = glob(os.path.join(root2, '*.[Cc][Uu][Ee]'))
            # do a few other things...
    return found_files
That handled the earlier file name, but it now fails on the input file name Aerosmith - Aerosmith (2014) [24-96 HD], producing the same error at the same point in my modified code.
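glob treats the whole path, including root, as a pattern, so a directory name containing [...] is parsed as a character class. Rather than passing root through glob with a funky file pattern, just filter the names in files and then prepend root. One possible one-liner:
fcue = [os.path.join(root, f) for f in files if f.lower().endswith('.cue')]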
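For context, here is a minimal sketch of how that one-liner might sit inside the whole function. The `found_files.extend(...)` step is an assumption about what the elided "do a few other things" part of the loop does with the matches. The second function, `find_cue_files_glob`, is a hypothetical name for an alternative not mentioned above: it keeps glob but passes the directory through `glob.escape` (in the standard library since Python 3.4), which escapes glob metacharacters. `re.escape` targets regex syntax, not glob syntax, which is likely why it did not help.

```python
import glob
import os


def find_cue_files(path):
    """Collect .cue files under path, matching the extension case-insensitively."""
    found_files = []
    for root, dirs, files in os.walk(path):
        # Plain string comparison: brackets in directory or file names
        # are never interpreted as a pattern.
        found_files.extend(
            os.path.join(root, name)
            for name in files
            if name.lower().endswith('.cue')
        )
    return found_files


def find_cue_files_glob(path):
    """Variant that keeps glob but escapes the directory part of the pattern."""
    found_files = []
    for root, dirs, files in os.walk(path):
        # glob.escape neutralises *, ? and [ in root; only the final
        # component is left as a real glob pattern.
        pattern = os.path.join(glob.escape(root), '*.[Cc][Uu][Ee]')
        found_files.extend(glob.glob(pattern))
    return found_files
```

The two versions are not quite identical: glob's `*` skips names that begin with a dot, while the `endswith` check does not, so a hidden file such as `.backup.cue` would only be returned by the first function.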