String manipulation by a specific delimiter and writing to a text file
I am writing a function that takes a file, updates.txt, as input. The file looks like this:
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
Type: 16(BGP4MP)
Subtype: 4(BGP4MP_MESSAGE_AS4)
Length: 39
BGP4MP_MESSAGE_AS4
Peer AS Number: 37989
Local AS Number: 12654
Interface Index: 0
Address Family: 1(IPv4)
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
BGP Message
Marker: -- ignored --
Length: 19
Type: 4(KEEPALIVE)
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
Type: 16(BGP4MP)
Subtype: 4(BGP4MP_MESSAGE_AS4)
Length: 118
BGP4MP_MESSAGE_AS4
Peer AS Number: 1836
Local AS Number: 12654
Interface Index: 0
Address Family: 1(IPv4)
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
BGP Message
Marker: -- ignored --
Length: 98
Type: 2(UPDATE)
Withdrawn Routes Length: 0
Total Path Attribute Length: 71
Path Attribute Flags/Type/Length: 0x40/1/1
ORIGIN: 0(IGP)
Path Attribute Flags/Type/Length: 0x40/2/42
AS_PATH
Path Segment Type: 2(AS_SEQUENCE)
Path Segment Length: 10
Path Segment Value: 1836 174 6453 37282 37088 37629 37629 37629 37629 37629
Path Attribute Flags/Type/Length: 0x40/3/4
NEXT_HOP: 146.228.1.3
Path Attribute Flags/Type/Length: 0xc0/8/12
COMMUNITY: 1836:110 1836:6000 1836:6031
NLRI: 154.65.7.0/24
---------------------------------------------------
The file is a sequence of 'blocks'. Each block is enclosed between dashed lines:
---------------------------------------------------
# Block (n)
---------------------------------------------------
# Block (n+1)
---------------------------------------------------
# Block (n+2) , etc
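I want to read the whole file block by block and return a text file that contains only these field lines: Timestamp, Peer AS Number, Local AS Number, Peer IP Address, Local IP Address.
The resulting .txt file should look like this: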
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
BGP4MP_MESSAGE_AS4
Peer AS Number: 37989
Local AS Number: 12654
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
BGP4MP_MESSAGE_AS4
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
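Ideally, I would like to overwrite updates.txt with the new text file so that no space is wasted, and save it in a new directory, "Parsed Updates".
I know it is minimal, because I am stuck on the dash delimiter, but my code looks like this: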
import sys
import os

def parser(filename):
    info = open(filename, 'r+')
    info.read()
    # Here comes the string manipulation code
    # info.split('---------------------------------------------------')
    info.close()
    print('The file has been parsed successfully !!')

def main():
    parser('updates.txt')

if __name__ == '__main__':
    main()
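In this particular case, you do not even need to split the file into separate blocks before parsing. You can check line by line whether each line matches one of the kinds of information you want.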
import re

out_lines = []
regexes = [
    r'^-+\s*$',
    r'^MRT Header\s*$',
    r'^\s*Timestamp:.*$',
    r'^BGP4MP_MESSAGE_AS4\s*$',
    r'^\s*Peer AS Number:.*$',
    r'^\s*Local AS Number:.*$',
    r'^\s*Peer IP Address:.*$',
    r'^\s*Local IP Address:.*$',
]
with open('file.txt', 'r') as f:
    for line in f:
        for regex in regexes:
            if re.match(regex, line):
                out_lines.append(line)
                break
# Each kept line still ends with its own newline, so write the lines unchanged.
with open('file.txt', 'w') as f:
    f.writelines(out_lines)
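The question also asks for the result to be saved in a directory named "Parsed Updates". Here is a minimal sketch of the same line-by-line filter wrapped in a function, assuming Python 3. The names filter_updates, KEEP and dest_dir are hypothetical, chosen only for illustration. It leaves the original file untouched instead of overwriting it.

```python
import os
import re

# Lines to keep: the dashed separator, the two section titles, and five fields.
KEEP = re.compile(
    r'^\s*(-+|MRT Header|BGP4MP_MESSAGE_AS4|'
    r'(Timestamp|Peer AS Number|Local AS Number|'
    r'Peer IP Address|Local IP Address):.*)\s*$'
)

def filter_updates(src, dest_dir='Parsed Updates'):
    """Copy only the wanted lines of src into dest_dir; return the new path."""
    os.makedirs(dest_dir, exist_ok=True)  # create the directory if it is missing
    dest = os.path.join(dest_dir, os.path.basename(src))
    with open(src) as fin, open(dest, 'w') as fout:
        # Stream line by line so a large dump is never held in memory at once.
        fout.writelines(line for line in fin if KEEP.match(line))
    return dest
```

Calling filter_updates('updates.txt') would write "Parsed Updates/updates.txt". If the original really has to go, it can be deleted with os.remove once the new file has been written.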
>>> fields = ['MRT Header', 'Timestamp', 'Peer AS Number', 'Local AS Number', 'Peer IP Address', 'Local IP Address']
>>> with open('results.txt', 'w') as r:
...     with open('updates.txt', 'r') as u:
...         for line in u:
...             if '-' * 51 in line:
...                 # dashed separator between blocks
...                 r.write(line)
...             elif any(field in line for field in fields):
...                 r.write(line)
...
Your results file will look like this:
$ cat results.txt
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 37989
Local AS Number: 12654
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
---------------------------------------------------
MRT Header
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
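If you do want to work block by block, as the question originally set out to, splitting on the dashed lines is also possible. This is only a sketch, assuming Python 3 and that a separator is any line made up of nothing but dashes. The names parse_blocks and WANTED are hypothetical. It returns one dictionary per block rather than writing a file.

```python
import re

WANTED = ('Timestamp', 'Peer AS Number', 'Local AS Number',
          'Peer IP Address', 'Local IP Address')

def parse_blocks(text):
    """Split text on dashed lines and return one dict of wanted fields per block."""
    records = []
    # A separator is a line consisting only of dashes (three or more here),
    # so 'Marker: -- ignored --' is not mistaken for one.
    for block in re.split(r'^-{3,}\s*$', text, flags=re.MULTILINE):
        record = {}
        for line in block.splitlines():
            # Split at the first colon only; timestamps contain further colons.
            key, sep, value = line.partition(':')
            if sep and key.strip() in WANTED:
                record.setdefault(key.strip(), value.strip())
        if record:  # skip the empty pieces around the outer separators
            records.append(record)
    return records
```

With the contents of updates.txt read into a string, parse_blocks(text) gives a list you can loop over to write the fields out in whatever layout you need.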