Problem using re.sub with a replacement that includes vertical bars (|)
A sample line of my data:
12808|08.12.2008|13:44:35|-0.05||||||||0.26|1.53|2.94|0.81|1.75|5.53|79.56||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||
Search patterns and function:
oldpatdat='[|][0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9][|]'
#Date like |30.12.2009|
oldpatdat='\|{1}[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\|{1}'
#same, works finding, but substituting... nope
oldpatdattim='[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\ {1}[0-5][0-9]\:{1}[0-5][0-9]\:{1}[0-5][0-9]'
#DateTime like 24.10.2010 12:34:56 works, as I don't need the use of Vertical lines
def setdef(line):
    print("line in setdef: "+line)
    if re.search(oldpatdattim,line):
        print("oldpatdattim found")
        klem=re.findall(oldpatdattim,line)
        print("Klem+time: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new=ele[6:10]+"-"+ele[3:5]+"-"+ele[0:2]+ele[10:]
            print("new: "+new)
            line=re.sub(ele,new,line)
            print("line after sub1(dattim): "+str(line))
    if re.search(oldpatdat,line):
        print("oldpatdat found")
        klem=re.findall(oldpatdat,line)
        print("Klem: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new="|"+ele[7:11]+"-"+ele[4:6]+"-"+ele[1:3]+"|"
            print("new: "+new)
            line=re.sub(ele,new,line)
            print("line after sub2(dat): "+str(line))
Output:
bash>line in setdef: 12806|08.12.2008|13:43:34|-0.06||||||||0|1.53|3.54|0.36|1.66|5.44|79.59||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||
bash>oldpatdattim found
bash>Klem+time: ['19.01.2009 09:34:57']
bash>ele: 19.01.2009 09:34:57
bash>new: 2009-01-19 09:34:57
bash>line after sub1(dattim): 12806|08.12.2008|13:43:34|-0.06||||||||0|1.53|3.54|0.36|1.66|5.44|79.59||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|2009-01-19 09:34:57|9|15||
bash>oldpatdat found
bash>Klem: ['|08.12.2008|']
bash>ele: |08.12.2008|
bash>new: |2008-12-08|
bash>line after sub2(dat): |2008-12-08|1|2008-12-08|2|2008-12-08|8|2008-12-08|0|2008-12-08|6|2008-12-08|||2008-12-08||2008-12-08||2008-12-08|||2008-12-08|1|2008-12-08|3|2008-12-08|:|2008-12-08|4|2008-12-08|3|2008-12-08|:|2008-12-08|3|2008-12-08|4|2008-12-08|||2008-12-08|-|2008-12-08|0|2008-12-08|.|2008-12-08|0|2008-12-08|6|2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|0|2008-12-08|||2008-12-08|1|2008-12-08|.|2008-12-08|5|2008-12-08|3|2008-12-08|||2008-12-08|3|2008-12-08|.|2008-12-08|5|2008-12-08|4|2008-12-08|||2008-12-08|0|2008...etc
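I am using Python 3.8.2 and have a problem with the regex function re.sub().

I have a CSV table whose columns, including dates and times, are separated by vertical bars. Because I need the dates and times in the correct format for my database, I am trying to find and convert them with regular expressions.

I don't want to touch the dates inside file names, so I search each line for dates like '|30.12.2009|' to make sure the match is a table column and not part of a file name. I then try to replace it with '|2009-12-30|'. The first if clause works as expected, because its search/replace pattern contains no vertical bars. The problem is the second re.sub(): the element I am looking for is found and the new string is built correctly, but re.sub() squeezes the new date in between every character of the line.

Escaping the vertical bar did not help, at least not the way I did it. More precisely, it works for finding, but not for substituting.

I am a bit puzzled by this and have not found a solution so far. How do I get re.sub() to do the right thing?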
Try this:
import re

# Raw strings keep the backslashes in the patterns intact.
oldpatdat=r'[|][0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9][|]'
#Date like |30.12.2009|
oldpatdat=r'\|{1}[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\|{1}'
#same pattern with escaped bars; this second assignment is the one in effect
oldpatdattim=r'[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\ {1}[0-5][0-9]\:{1}[0-5][0-9]\:{1}[0-5][0-9]'
#DateTime like 24.10.2010 12:34:56

def setdef(line):
    print("line in setdef: "+line)
    if re.search(oldpatdattim,line):
        print("oldpatdattim found")
        klem=re.findall(oldpatdattim,line)
        print("Klem+time: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new=ele[6:10]+"-"+ele[3:5]+"-"+ele[0:2]+ele[10:]
            print("new: "+new)
            # for robustness: ele is used as a pattern, so escape '.' and '|'
            line=re.sub(ele.replace('.', r'\.').replace('|', r'\|'),new,line)
            print("line after sub1(dattim): "+str(line))
    if re.search(oldpatdat,line):
        print("oldpatdat found")
        klem=re.findall(oldpatdat,line)
        print("Klem: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new="|"+ele[7:11]+"-"+ele[4:6]+"-"+ele[1:3]+"|"
            print("new: "+new)
            # ele is the matched text, not a pattern; escape it before reusing it in re.sub
            line=re.sub(ele.replace('.', r'\.').replace('|', r'\|'),new,line)
            # or you could use line=line.replace(ele, new)
            print("line after sub2(dat): "+str(line))

# Raw string so that \t in the Windows path is not read as a tab character.
setdef(r'12808|08.12.2008|13:44:35|-0.05||||||||0.26|1.53|2.94|0.81|1.75|5.53|79.56||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||')
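Some background on why this happens, plus a possible shorter variant. The string that re.findall returns is the matched text, not a pattern. When '|08.12.2008|' is passed back to re.sub as the pattern, the unescaped | is read as the alternation operator, so one alternative is the empty string, which matches at every position in the line. The sketch below reproduces that on a tiny input and then does the conversion in a single pass with capture groups. The names row, DATETIME, DATE_COLUMN and to_iso are made up for this example, and the sample row is a shortened version of the one in the question.

```python
import re

# Hypothetical, shortened sample row (raw string: backslashes stay literal).
row = r"12808|08.12.2008|13:44:35|-0.05|K:\Path\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|"

# Cause: an unescaped "|" means "or", so this pattern can match the empty
# string, and the replacement is inserted at every position.
broken = re.sub("|08.12.2008|", "X", "ab")
print(broken)  # XaXbX

# Fix 1: escape the matched text before reusing it as a pattern.
fixed = re.sub(re.escape("|08.12.2008|"), "|2008-12-08|", row)

# Fix 2 (assumed alternative): let the pattern capture day, month and year
# and reorder them in the replacement, with no findall/slicing step.
DATETIME = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})( \d{2}:\d{2}:\d{2})")
# Lookbehind/lookahead require the bars without consuming them, so two
# adjacent date columns that share a bar are both converted.
DATE_COLUMN = re.compile(r"(?<=\|)(\d{2})\.(\d{2})\.(\d{4})(?=\|)")

def to_iso(line):
    """Rewrite dd.mm.yyyy columns and dd.mm.yyyy hh:mm:ss values as ISO dates."""
    line = DATETIME.sub(r"\3-\2-\1\4", line)
    return DATE_COLUMN.sub(r"\3-\2-\1", line)

print(to_iso(row))
```

The date inside the file name is left alone because it is neither surrounded by bars nor followed by a time. Note that these patterns use \d, which is looser than the digit ranges in the question; tighten them if the input may contain invalid dates.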