Replace symbol before match using regex in Python
I have strings like this:
text1 = ('SOME STRING,99,1234 FIRST STREET,9998887777,ABC')
text2 = ('SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF')
text3 = ('ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI')
Desired output:
SOME STRING 99,1234 FIRST STREET,9998887777,ABC
SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF
ANOTHER STRING #88,4321 THIRD STREET,3332221111,GHI
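My idea: use a regex to find a run of 1-5 digits, possibly preceded by a symbol, that sits between two commas and is not followed by a space and letters, then replace the comma in front of that match with a space.
Something like: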
text.replace(r'(,\d{0,5},)','.........')
If you use the regex module instead of re, then perhaps:
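import regex  # third-party module; unlike re, it supports variable-length lookbehind
s = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
# The lookbehind rejects any comma that already has a comma somewhere before it
print(regex.sub(r'(?<!^.*,.*),(?=#?\d+,\d+)', ' ', s))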
If you are sure that no other substring later in the line matches the pattern in the lookahead, you can use re.
import re
s = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
# Replace a comma only when it is followed by an optional '#', digits, a comma, and more digits
print(re.sub(r',(?=#?\d+,\d+)', ' ', s))
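As a quick check, the lookahead pattern above can be run against all three sample strings from the question. This is only a sketch: the helper name merge_first_field and the SAMPLES list are made up for illustration.

```python
import re

# Same lookahead pattern as in the answer above: a comma followed by
# an optional '#', digits, another comma, and more digits.
LOOKAHEAD = re.compile(r",(?=#?\d+,\d+)")

def merge_first_field(line):
    """Replace every comma that satisfies the lookahead with a space."""
    return LOOKAHEAD.sub(" ", line)

# The three strings from the question.
SAMPLES = [
    "SOME STRING,99,1234 FIRST STREET,9998887777,ABC",
    "SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF",
    "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI",
]

for line in SAMPLES:
    print(merge_first_field(line))
```

The second string is left alone because its first comma is followed by digits and then a space rather than another comma.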
If SOME STRING, SOME OTHER STRING and ANOTHER STRING never contain commas, then this is easier to read:
text1.replace(",", " ", 1)
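It simply replaces the first comma with a space. Note that it does so unconditionally, so it would also change text2, where the desired output keeps the first comma.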
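If the plain-string approach is preferred, one possible way to make it conditional is to look at the second field before replacing anything. This is a sketch under the assumption that the first field never contains a comma; the function name merge_if_short_number is hypothetical, and the check \W?\d{1,5} is my reading of "1-5 digits, possibly preceded by a symbol".

```python
import re

def merge_if_short_number(line):
    """Join the first two fields with a space, but only when the second
    field is an optional symbol followed by 1-5 digits."""
    head, sep, rest = line.partition(",")
    if not sep:
        return line  # no comma at all
    second, sep2, _tail = rest.partition(",")
    # Require a following comma so the number sits between two commas.
    if sep2 and re.fullmatch(r"\W?\d{1,5}", second):
        return f"{head} {rest}"
    return line

print(merge_if_short_number("SOME STRING,99,1234 FIRST STREET,9998887777,ABC"))
print(merge_if_short_number("SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF"))
print(merge_if_short_number("ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"))
```

Unlike the bare replace call, this leaves the second sample string unchanged.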
Simple and effective:
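import re
# Group 1 is the leading comma; group 2 is the optional symbol, the digits, and the trailing comma
my_pattern = r"(,)(\W?\d{0,5},)"
p = re.compile(my_pattern)
# Replace the whole match with a space followed by group 2, which drops the leading comma
p.sub(r" \2", text1) # 'SOME STRING 99,1234 FIRST STREET,9998887777,ABC'
p.sub(r" \2", text2) # 'SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF'
p.sub(r" \2", text3) # 'ANOTHER STRING #88,4321 THIRD STREET,3332221111,GHI'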
An alternative pattern with a non-capturing group and verbose compilation:
my_pattern = r"""
(?:,)           # Non-capturing group for the single comma.
(\W?\d{0,5},)   # Capture an optional non-word character, zero to five digits, and a comma.
"""
# re.X (verbose mode) ignores whitespace and comments inside the pattern
p = re.compile(my_pattern, flags=re.X)
# This time we use \1 to refer to the first (and only) captured group
p.sub(r" \1", text1)
p.sub(r" \1", text2)
p.sub(r" \1", text3)
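One detail worth checking: \d{0,5} also accepts zero digits, so an empty field between two commas would match as well. The question asks for 1-5 digits, so a stricter quantifier may be closer to the intent. The sketch below compares the two; the names LOOSE, STRICT and join_unit, and the input with an empty field, are made up for illustration.

```python
import re

# Pattern as in the answer above (zero to five digits).
LOOSE = re.compile(r",(\W?\d{0,5},)")
# Hypothetical stricter variant: require at least one digit.
STRICT = re.compile(r",(\W?\d{1,5},)")

def join_unit(text, pattern=STRICT):
    """Replace the comma before a short number with a space, keeping group 1."""
    return pattern.sub(r" \1", text)

# Made-up input with an empty second field.
empty_field = "SOME STRING,,1234 FIRST STREET"

print(join_unit(empty_field, LOOSE))   # the empty field matches, so the first comma is replaced
print(join_unit(empty_field, STRICT))  # nothing matches, so the string is unchanged
print(join_unit("ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"))
```

On the three strings from the question both variants give the same result; they differ only on inputs with empty fields.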