Issue with re.sub in Python 3 but not in Python 2

I have an old Python 2.7 script in which a series of re.sub calls runs correctly. However, when I try to run it under Python 3, I get TypeError: expected string or bytes-like object

The relevant code is

substitution_array=[
    [r"^Map From GroupLayer","Add Map GroupLayer"],[r"^Map From","Add Map Auto Layer"]
    ,[r"^\s+Papersize\s+.*",""],[r"^Set Window.*",""],[r"^Open Window.*",""]]

for row in substitution_array:
        print(row[0])
        for x in newfile:
          line = re.sub(row[0],row[1],x)
          line2=filter(line.strip, line)
          newfile2.append(line2)
        print ("Finished: "+row[0])
        newfile=newfile2
        newfile2=[]

I get the following output

G:\GIS_Tables\Vector_Data\Administrative\Cadastre\Road_Reserves>python3 Create_MB_from_WOR.py
--- Table Name: Road_Reserves
^Map From GroupLayer
Finished: ^Map From GroupLayer
^Map From
Traceback (most recent call last):
  File "Create_MB_from_WOR.py", line 43, in <module>
    line = re.sub(row[0],row[1],x)
  File "C:\OSGeo4W64\apps\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

So it fails on ,[r"^Map From","Add Map Auto Layer"], and when I remove that entry it fails on the next one as well.

I looked at https://docs.python.org/3/library/re.html and I believe I have escaped everything correctly, so what is wrong here?

The same code runs fine on the same data in Python 2.7.

I'm not sure what this line in your script is intended to do

line2=filter(line.strip, line)

but the difference lies in the behavior of filter:

Python 2
filter(function, iterable)
Construct a list from those elements of iterable for which function returns true.
If iterable is a string or a tuple, the result also has that type
Python 3
filter(function, iterable)
Construct an iterator from those elements of iterable for which function returns true.

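In your script, filter returns a string under Python 2. Under Python 3, filter returns a <filter object> iterator instead. That object is appended to the list and passed to re.sub on the next pass of the outer loop, and re.sub raises the TypeError because a <filter object> is neither a string nor a bytes-like object.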
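The difference can be checked directly in Python 3. This is a minimal sketch; the sample string is made up for illustration and does not come from the original data:

```python
import re

line = "Map From Roads"            # hypothetical sample line
result = filter(line.strip, line)  # Python 3: a lazy filter object, not a str

print(type(result).__name__)
print(isinstance(result, str))

try:
    # Passing the filter object where a string is expected
    re.sub(r"^Map From", "Add Map Auto Layer", result)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message varies slightly between Python 3 versions, but it is the same TypeError as in the traceback above.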

The Python 3 equivalent is

line2=''.join(filter(line.strip, line))
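Dropped into the loop from the question, the change might look like the sketch below. The contents of newfile are hypothetical sample lines (the question does not show the real data), and the substitution list is shortened:

```python
import re

# Hypothetical sample input; the original script builds this list elsewhere.
newfile = ["Map From GroupLayer Roads", "Map From Parcels", "Set Window Front"]

substitution_array = [
    [r"^Map From GroupLayer", "Add Map GroupLayer"],
    [r"^Map From", "Add Map Auto Layer"],
    [r"^Set Window.*", ""],
]

for pattern, replacement in substitution_array:
    newfile2 = []
    for x in newfile:
        line = re.sub(pattern, replacement, x)
        # ''.join turns the filter iterator back into a str, as Python 2 returned
        line2 = "".join(filter(line.strip, line))
        newfile2.append(line2)
    newfile = newfile2

print(newfile)
# ['Add Map GroupLayer Roads', 'Add Map Auto Layer Parcels', '']
```

Every element stays a str, so the next pass of the outer loop can hand it to re.sub without error.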

You didn't provide a reproducible example, but I reproduced the error as follows:

import re

newfile = ['a']  # wasn't defined, assuming a list of strings
newfile2 = []        # wasn't defined, assuming a list

substitution_array=[
    [r"^Map From GroupLayer","Add Map GroupLayer"],[r"^Map From","Add Map Auto Layer"]
    ,[r"^\s+Papersize\s+.*",""],[r"^Set Window.*",""],[r"^Open Window.*",""]]

for row in substitution_array:
        print(row[0])
        for x in newfile:
          print(f'{x=}')
          line = re.sub(row[0],row[1],x)
          line2=filter(line.strip, line)
          print(f'{line2=}')
          newfile2.append(line2)
          print(f'{newfile2=}')
        print ("Finished: "+row[0])
        newfile=newfile2
        newfile2=[]
        print(f'{newfile=} {newfile2=}')

Output (with comments added):

^Map From GroupLayer
x='a'     # x is a string
line2=<filter object at 0x000001E3D5BAAE50> # filter() returns an iterator in Python 3
newfile2=[<filter object at 0x000001E3D5BAAE50>] # newfile2 gets this object
Finished: ^Map From GroupLayer
newfile=[<filter object at 0x000001E3D5BAAE50>] newfile2=[]
^Map From
x=<filter object at 0x000001E3D5BAAE50>  # NEXT ITERATION, x is that filter object
Traceback (most recent call last):
  File "C:\Users\metolone\test.py", line 14, in <module>
    line = re.sub(row[0],row[1],x)    # then re.sub complains about it
  File "D:\dev\Python39\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

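What do you think line2 = filter(line.strip,line) does? It says "keep only those characters c of line for which line.strip(c) is truthy". For example, line.strip(' ') returns a falsy value (the empty string) only if every character in the line is a space. So a line that contains at least two different characters is left unchanged, and a line that consists of one repeated character is blanked. The filter function is also called once per character, so a line of length n costs n calls to strip, which is inefficient. Examples from Python 2: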

>>> line = '  \n  a '          # variation, no change
>>> filter(line.strip,line)
'  \n  a '                     
>>> line = '            '      # all spaces, blanks the line
>>> filter(line.strip,line)
''
>>> line = '   \n     '        # different kinds of whitespace, no change
>>> filter(line.strip,line)
'   \n     '
>>> line = '\n\n\n\n\n'        # all same newline, blanks line
>>> filter(line.strip,line)
''
>>> line = '\n\n \n\n'         # different kinds of whitespace, no change
>>> filter(line.strip,line)
'\n\n \n\n'
>>> line = 'aaaaaaaaaaaaaaaa'  # no variation, blanks the line
>>> filter(line.strip,line)
''
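The same behavior can be checked in Python 3 by joining the filter result back into a string. In this sketch, py2_style_filter is a hypothetical helper name introduced only for illustration:

```python
def py2_style_filter(line):
    """Mimic Python 2's filter(line.strip, line) for a str argument."""
    return "".join(filter(line.strip, line))

samples = ["  \n  a ", "            ", "   \n     ", "\n\n\n\n\n", "aaaaaaaaaaaaaaaa"]

for s in samples:
    # Prints each input next to what survives the filter
    print(repr(s), "->", repr(py2_style_filter(s)))

# The whole expression reduces to: keep the line only if it contains
# at least two distinct characters, otherwise return an empty string.
results = [py2_style_filter(s) for s in samples]
```

In other words, the expression never strips whitespace from a line; it only blanks lines made up of a single repeated character.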

So this looks like a bug. You may want to explain what you expect this line to do, and then we can recommend a better way to do it.
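If the goal was to drop lines that end up empty or whitespace-only after the substitutions, one possible approach is sketched below. This is only a guess at the intent: the function name apply_substitutions and the sample lines are assumptions, not part of the original script.

```python
import re

substitutions = [
    (r"^Map From GroupLayer", "Add Map GroupLayer"),
    (r"^Map From", "Add Map Auto Layer"),
    (r"^\s+Papersize\s+.*", ""),
    (r"^Set Window.*", ""),
    (r"^Open Window.*", ""),
]

def apply_substitutions(lines):
    """Apply every substitution to each line; skip lines left blank."""
    result = []
    for line in lines:
        for pattern, replacement in substitutions:
            line = re.sub(pattern, replacement, line)
        if line.strip():  # keep only lines with non-whitespace content
            result.append(line)
    return result

# Hypothetical sample lines for illustration
sample = [
    "Map From GroupLayer Roads",
    "Map From Parcels",
    "  Papersize A4",
    "Set Window Front",
    "Open Window Map",
]
cleaned = apply_substitutions(sample)
print(cleaned)
# ['Add Map GroupLayer Roads', 'Add Map Auto Layer Parcels']
```

Each line stays a plain str throughout, so the code behaves the same way under Python 2.7 and Python 3.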