Issue with re.sub in Python 3 but not in Python 2
I have an old script that runs a series of re.sub substitutions correctly under Python 2.7. When I try to run it under Python 3, however, I get TypeError: expected string or bytes-like object.
The relevant code is:
substitution_array=[
    [r"^Map From GroupLayer","Add Map GroupLayer"],[r"^Map From","Add Map Auto Layer"]
    ,[r"^\s+Papersize\s+.*",""],[r"^Set Window.*",""],[r"^Open Window.*",""]]
for row in substitution_array:
    print(row[0])
    for x in newfile:
        line = re.sub(row[0],row[1],x)
        line2=filter(line.strip, line)
        newfile2.append(line2)
    print ("Finished: "+row[0])
    newfile=newfile2
    newfile2=[]
This is the output I get:
G:\GIS_Tables\Vector_Data\Administrative\Cadastre\Road_Reserves>python3 Create_MB_from_WOR.py
--- Table Name: Road_Reserves
^Map From GroupLayer
Finished: ^Map From GroupLayer
^Map From
Traceback (most recent call last):
  File "Create_MB_from_WOR.py", line 43, in <module>
    line = re.sub(row[0],row[1],x)
  File "C:\OSGeo4W64\apps\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
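So it fails on the entry [r"^Map From","Add Map Auto Layer"], and when I remove that entry it fails on the next one instead.
I looked at https://docs.python.org/3/library/re.html and I believe I have escaped everything correctly, so what is wrong here?
The same code runs fine on the same data under Python 2.7.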
I'm not sure what this line in your script is meant to do:
line2=filter(line.strip, line)
but the difference comes from the behavior of filter:
Python 2
filter(function, iterable)
Construct a list from those elements of iterable for which function returns true.
If iterable is a string or a tuple, the result also has that type
Python 3
filter(function, iterable)
Construct an iterator from those elements of iterable for which function returns true.
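In your script, filter returns a string under Python 2. Under Python 3, filter returns a <filter object>, which is a lazy iterator. That object is appended to newfile2, becomes an element of newfile, and is passed to re.sub on the next pass of the outer loop. re.sub then raises the TypeError, because a <filter object> is neither a string nor a bytes-like object.
The Python 3 equivalent is: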
line2=''.join(filter(line.strip, line))
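To see the difference in isolation, here is a minimal Python 3 sketch. The sample line is made up for illustration and is not data from the question.

```python
import re

line = "Map From Roads"  # hypothetical sample line, not from the question

lazy = filter(line.strip, line)              # Python 3: a lazy filter object, not a str
joined = "".join(filter(line.strip, line))   # rebuild a str from the characters that were kept

print(type(lazy).__name__)    # filter
print(type(joined).__name__)  # str

# re.sub accepts the joined string...
result = re.sub(r"^Map From", "Add Map Auto Layer", joined)
print(result)

# ...but rejects the filter object with a TypeError.
try:
    re.sub(r"^Map From", "Add Map Auto Layer", lazy)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The exact wording of the TypeError message differs between Python 3 versions, so the sketch only checks that the exception is raised.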
You didn't provide a reproducible example, but I reproduced the error as follows:
import re
newfile = ['a'] # wasn't defined, assuming a list of strings
newfile2 = [] # wasn't defined, assuming a list
substitution_array=[
    [r"^Map From GroupLayer","Add Map GroupLayer"],[r"^Map From","Add Map Auto Layer"]
    ,[r"^\s+Papersize\s+.*",""],[r"^Set Window.*",""],[r"^Open Window.*",""]]
for row in substitution_array:
    print(row[0])
    for x in newfile:
        print(f'{x=}')
        line = re.sub(row[0],row[1],x)
        line2=filter(line.strip, line)
        print(f'{line2=}')
        newfile2.append(line2)
        print(f'{newfile2=}')
    print ("Finished: "+row[0])
    newfile=newfile2
    newfile2=[]
    print(f'{newfile=} {newfile2=}')
Output (comments added):
^Map From GroupLayer
x='a' # x is a string
line2=<filter object at 0x000001E3D5BAAE50> # filter() returns an iterator in Python 3
newfile2=[<filter object at 0x000001E3D5BAAE50>] # newfile2 gets this object
Finished: ^Map From GroupLayer
newfile=[<filter object at 0x000001E3D5BAAE50>] newfile2=[]
^Map From
x=<filter object at 0x000001E3D5BAAE50> # NEXT ITERATION, x is that filter object
Traceback (most recent call last):
  File "C:\Users\metolone\test.py", line 14, in <module>
    line = re.sub(row[0],row[1],x) # then re.sub complains about it
  File "D:\dev\Python39\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
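What do you think line2 = filter(line.strip,line) does? It says "keep each character c of line for which line.strip(c) is truthy". line.strip(' '), for example, returns a falsy value (the empty string) only if every character in the line is a space. So any line containing more than one distinct character is left unchanged, and any line made up of a single repeated character is blanked. The filter function is also called x times for a line of length x, which is inefficient. Examples from Python 2: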
>>> line = ' \n a ' # variation, no change
>>> filter(line.strip,line)
' \n a '
>>> line = ' ' # all spaces, blanks the line
>>> filter(line.strip,line)
''
>>> line = ' \n ' # different kinds of whitespace, no change
>>> filter(line.strip,line)
' \n '
>>> line = '\n\n\n\n\n' # all same newline, blanks line
>>> filter(line.strip,line)
''
>>> line = '\n\n \n\n' # different kinds of whitespace, no change
>>> filter(line.strip,line)
'\n\n \n\n'
>>> line = 'aaaaaaaaaaaaaaaa' # no variation, blanks the line
>>> filter(line.strip,line)
''
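The same behavior can be checked under Python 3 by joining the filter result back into a string. This is only a sketch: py2_style_filter is a hypothetical helper name introduced here, and the sample strings are illustrative.

```python
def py2_style_filter(line):
    """Mimic what Python 2's filter(line.strip, line) returned for a str (hypothetical helper)."""
    return "".join(filter(line.strip, line))

# Illustrative inputs: mixed characters survive, a single repeated character is blanked.
samples = [" \n a ", "   ", " \n ", "\n\n\n", "aaaa"]
for s in samples:
    print(repr(s), "->", repr(py2_style_filter(s)))
```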
So this looks like a bug. You may want to explain what you expected this line to do, and we can recommend a better way to do it.
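As one possibility, and purely as an assumption about the intent: if the goal was to drop the lines that the substitutions turn into empty or whitespace-only strings, a sketch like the following keeps every element a str, so re.sub never sees a filter object. The function name apply_substitutions and the sample input are made up for illustration.

```python
import re

# Patterns and replacements copied from the question.
substitution_array = [
    [r"^Map From GroupLayer", "Add Map GroupLayer"],
    [r"^Map From", "Add Map Auto Layer"],
    [r"^\s+Papersize\s+.*", ""],
    [r"^Set Window.*", ""],
    [r"^Open Window.*", ""],
]

def apply_substitutions(lines):
    """Apply each substitution to each line in order; every element stays a str."""
    for pattern, replacement in substitution_array:
        lines = [re.sub(pattern, replacement, x) for x in lines]
    # ASSUMPTION: lines left empty or whitespace-only should be dropped.
    return [x for x in lines if x.strip()]

# Made-up input, not data from the question.
sample = ["Map From GroupLayer 1", "Set Window FrontWindow()", "Map From Roads"]
print(apply_substitutions(sample))
# ['Add Map GroupLayer 1', 'Add Map Auto Layer Roads']
```

If blank lines should be kept instead, the final list comprehension can simply be removed.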