Python: join specific lines into one line
Suppose I have this file:
1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
I want this output:
1
17:02,111
Problem report related to router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data
I have been trying in bash and found a solution that comes close, but I don't know how to do this in Python.
Thanks in advance
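If you want to remove the extra lines:
To do this, keep a line if it is not followed by an empty line, or if the line before it matches the regex ^\d{2}:\d{2},\d{3}\s$.
To look at the next line on each iteration, use itertools.tee to create a second iterator, named temp, from the main file object and call next on it, and use re.match to test the previous line against the regex.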
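from itertools import tee
import re

# A line that holds only a timestamp such as "17:02,111".
TIMESTAMP = re.compile(r'^\d{2}:\d{2},\d{3}\s$')

with open('ex.txt') as f, open('new.txt', 'w') as out:
    temp, f = tee(f)     # two independent iterators over the same lines
    next(temp, None)     # advance temp so it always runs one line ahead of f
    pre = ''             # previous line; empty before the first line
    for line in f:
        # Treat the end of the file like a blank line.
        following = next(temp, '\n')
        # Keep the line unless it is an extra line directly before a blank line.
        if following != '\n' or TIMESTAMP.match(pre):
            out.write(line)
        pre = line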
Result:
1
17:02,111
Problem report related to

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
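The look-ahead trick in this answer (a second iterator that runs one step ahead of the first) can be isolated into a small helper. This is only a sketch and not part of the original answer: with_next is a hypothetical name, and it uses itertools.zip_longest so that the last item is paired with None instead of raising StopIteration.

```python
from itertools import tee, zip_longest

def with_next(iterable):
    """Pair each item with the item that follows it (None for the last one)."""
    current, ahead = tee(iterable)   # two independent iterators over the same data
    next(ahead, None)                # move the second iterator one step ahead
    return zip_longest(current, ahead)

lines = ["1\n", "17:02,111\n", "Restarting the systems\n"]
pairs = list(with_next(lines))
for line, following in pairs:
    print(repr(line), "->", repr(following))
```

Because the helper accepts any iterable, it can be handed an open file object directly, in which case lines are still read lazily.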
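If you want to join the rest to the third line:
If you want to join every line after the third line onto the third line, you can use the following regex to find all blocks that are followed by \n\n or by the end of the file ($):
r"(.*?)(?=\n\n|$)"
Then split each block on the timestamp line and write the parts to the output file. Note that the newlines in the third part have to be replaced with spaces: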
ex.txt:
1
17:02,111
Problem report related to
router
another line

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
line 5
line 6
line 7
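Demo:
import re

def splitter(s):
    # Yield each block that is followed by a blank line or the end of the text.
    for x in re.finditer(r"(.*?)(?=\n\n|$)", s, re.DOTALL):
        g = x.group(0).strip()   # drop separator newlines matched along with the block
        if g:
            yield g

with open('ex.txt') as f, open('new.txt', 'w') as out:
    for block in splitter(f.read()):
        # Split once on the timestamp line: record number, timestamp, message text.
        first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block, maxsplit=1)
        # Join the message lines with spaces; end each record with a blank line.
        out.write(first + second + third.replace('\n', ' ') + '\n\n')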
Result:
1
17:02,111
Problem report related to router another line

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data line 5 line 6 line 7
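Note:
In this answer, splitter is a generator function, so blocks are yielded one at a time instead of being collected in a list first. This helps with large files, although f.read() in the demo still loads the entire file into memory.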
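If the file should not be read in one go, the blocks can also be built line by line. The following is a rough sketch under two assumptions: records are separated by blank lines, and the first two lines of each record are the number and the timestamp. The names iter_blocks and join_records are hypothetical, and itertools.groupby is swapped in for the regex used in the answer.

```python
from itertools import groupby

def iter_blocks(lines):
    """Yield one list of lines (newlines removed) per blank-line-separated block."""
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield [line.rstrip("\n") for line in group]

def join_records(lines):
    """Yield each record with its message lines joined into a single line."""
    for block in iter_blocks(lines):
        head, message = block[:2], block[2:]   # number + timestamp, then message text
        yield "\n".join(head + ([" ".join(message)] if message else []))

sample = [
    "1\n", "17:02,111\n", "Problem report related to\n", "router\n",
    "\n",
    "2\n", "17:05,223\n", "Restarting the systems\n",
]
records = list(join_records(sample))
print("\n\n".join(records))
```

Since join_records only iterates over its argument, an open file object can be passed in place of the sample list, and only one block is held in memory at a time.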
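import re

x = """1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
or something"""

def repl(matchobj):
    # Keep the number and timestamp lines; join all message lines with spaces.
    ll = matchobj.group().split("\n")
    return "\n".join(ll[:2] + [" ".join(ll[2:])])

# Each match spans one record, up to the next blank line or the end of the text.
print(re.sub(r"\b\d+\n\d+:\d+,\d+\b[\s\S]*?(?=\n{2}|$)", repl, x))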
You can use re.sub with your own custom replacement function.
This works if and only if the file follows the format of your example.
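For readers who have not passed a function to re.sub before, here is a tiny unrelated illustration of the mechanism, not taken from the answer: re.sub calls the function once per match with the match object and substitutes whatever string it returns. The name double is made up for this example.

```python
import re

def double(matchobj):
    # re.sub passes the match object and uses the returned string as the replacement.
    return str(int(matchobj.group()) * 2)

result = re.sub(r"\d+", double, "a1 b22 c333")
print(result)  # a2 b44 c666
```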
Note:
A regex-based solution may be faster and possibly simpler, but I wanted to do it with plain logic.
Code:
# Read all lines; a blank line separates two records.
with open("output.txt") as inp:
    lines = inp.read().split("\n")

temp_string = ""   # message text collected so far for the current record
output = []
for s in lines:
    if s and any(c.isalpha() for c in s):
        # Lines that contain letters are message text: collect them.
        temp_string += " " + s
    else:
        # A number, timestamp, or blank line ends the collected message.
        if temp_string:
            output.append(temp_string.strip())
            temp_string = ""
        output.append(s)   # blank lines are kept as record separators
# Flush the message of the last record.
if temp_string:
    output.append(temp_string.strip())

print("\n".join(output))
with open("newoutput.txt", "w") as out:
    out.write("\n".join(output))
Input:
1
17:02,111
Problem report related to
2 router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data

4
17:02,111
Problem report related to
router
Output:
1
17:02,111
Problem report related to 2 router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data

4
17:02,111
Problem report related to router