Split string with multiple separators from an array (Python)
Given an array of separators:
columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
and a string in which some columns are left blank (and which contains random whitespace):
input = "Name: JohnID:123:45Date: 8/2/17Building:Room:Notes: i love notes"
how can I get this:
["John", "123:45", "8/2/17", "", "", "i love notes"]
I tried simply removing the substrings to see where I could go from there, but I'm still stuck:
import re
input = re.sub(r'|'.join(map(re.escape, columns)), "", input)
Use the list to build a regular expression by inserting (.*) after each separator, then use strip to remove the whitespace:
import re
columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name: JohnID:123:45Date: 8/2/17Building:Room:Notes: i love notes"
result = [x.strip() for x in re.match("".join(map("{}(.*)".format,columns)),s).groups()]
print(result)
Output:
['John', '123:45', '8/2/17', '', '', 'i love notes']
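To make the generated pattern visible, here is a minimal sketch that prints it and also guards against input that does not match. The helper name `extract_fields` is hypothetical and not part of the original answer; `re.match` returns `None` when a separator is missing or out of order, so calling `.groups()` directly would raise an `AttributeError` in that case.

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name: JohnID:123:45Date: 8/2/17Building:Room:Notes: i love notes"

# Each separator is followed by one capture group holding that column's value.
pattern = "".join(map("{}(.*)".format, columns))
print(pattern)
# Name:(.*)ID:(.*)Date:(.*)Building:(.*)Room:(.*)Notes:(.*)


def extract_fields(text):
    """Hypothetical helper: stripped column values, or None if the text does not match."""
    m = re.match(pattern, text)
    return None if m is None else [g.strip() for g in m.groups()]


print(extract_fields(s))             # ['John', '123:45', '8/2/17', '', '', 'i love notes']
print(extract_fields("Name: John"))  # None, because the remaining separators are missing
```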
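The strip part can be handled by the regular expression itself, at the cost of a more complex regex but a simpler overall expression (note the raw string, so that \s is not treated as a string escape):
result = re.match("".join(map(r"{}\s*(.*)\s*".format, columns)), s).groups()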
More complex still: if the separators contain regex special characters, they must be escaped (not the case here):
result = re.match("".join([r"{}\s*(.*)\s*".format(re.escape(x)) for x in columns]), s).groups()
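One caveat with the `\s*(.*)\s*` form: `(.*)` is greedy, so it also captures any whitespace that sits in front of the next separator, and the trailing `\s*` never gets to remove anything. The sample string has no such whitespace, so the result above is unaffected. Below is a sketch of a variant that trims both sides, assuming the whole string should be consumed: it uses lazy groups `(.*?)` together with `re.fullmatch`. The helper name `parse_columns` and the `padded` sample are made up for illustration.

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]


def parse_columns(text, separators):
    """Hypothetical helper: values between separators, whitespace trimmed by the regex."""
    # Lazy groups stop as early as possible, leaving trailing whitespace to \s*.
    pattern = "".join(r"{}\s*(.*?)\s*".format(re.escape(sep)) for sep in separators)
    # fullmatch forces the last lazy group to extend to the end of the string.
    m = re.fullmatch(pattern, text)
    return None if m is None else list(m.groups())


# Assumed sample with extra whitespace before some separators and at the end.
padded = "Name: John  ID:123:45 Date: 8/2/17 Building:Room:  Notes: i love notes  "
print(parse_columns(padded, columns))
# ['John', '123:45', '8/2/17', '', '', 'i love notes']
```

With plain `re.match` the last lazy group would match the empty string, which is why `re.fullmatch` (or a trailing `$`) is needed here.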
How about using re.split?
>>> import re
>>> columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
>>> i = "Name: JohnID:123:45Date: 8/2/17Building:Room:Notes: i love notes"
>>> re.split('|'.join(map(re.escape, columns)), i)
['', ' John', '123:45', ' 8/2/17', '', '', ' i love notes']
To get rid of the whitespace, split on the surrounding whitespace as well:
>>> re.split(r'\s*' + (r'\s*|\s*'.join(map(re.escape, columns))) + r'\s*', i.strip())
['', 'John', '123:45', '8/2/17', '', '', 'i love notes']
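The result still starts with an empty string, because nothing precedes the first separator (`Name:`). A minimal sketch that drops that element and strips each piece, rather than building whitespace into the pattern, might look like this. The name `split_on_separators` is hypothetical. Unlike the `re.match` approach, `re.split` does not check that every separator is present or that they appear in order.

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
i = "Name: JohnID:123:45Date: 8/2/17Building:Room:Notes: i love notes"


def split_on_separators(text, separators):
    """Hypothetical helper: split on any separator and trim each value."""
    pattern = "|".join(map(re.escape, separators))
    pieces = re.split(pattern, text)
    # pieces[0] is whatever precedes the first separator (empty here), so skip it.
    return [p.strip() for p in pieces[1:]]


print(split_on_separators(i, columns))
# ['John', '123:45', '8/2/17', '', '', 'i love notes']
```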