String manipulation by a specific delimiter and writing to a text file

I am writing a function that takes the file updates.txt as input. The file looks like this:

---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
    Type: 16(BGP4MP)
    Subtype: 4(BGP4MP_MESSAGE_AS4)
    Length: 39
BGP4MP_MESSAGE_AS4
    Peer AS Number: 37989
    Local AS Number: 12654
    Interface Index: 0
    Address Family: 1(IPv4)
    Peer IP Address: 203.123.48.6
    Local IP Address: 193.0.4.28
BGP Message
    Marker: -- ignored --
    Length: 19
    Type: 4(KEEPALIVE)
---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
    Type: 16(BGP4MP)
    Subtype: 4(BGP4MP_MESSAGE_AS4)
    Length: 118
BGP4MP_MESSAGE_AS4
    Peer AS Number: 1836
    Local AS Number: 12654
    Interface Index: 0
    Address Family: 1(IPv4)
    Peer IP Address: 146.228.1.3
    Local IP Address: 193.0.4.28
BGP Message
    Marker: -- ignored --
    Length: 98
    Type: 2(UPDATE)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 71
    Path Attribute Flags/Type/Length: 0x40/1/1
        ORIGIN: 0(IGP)
    Path Attribute Flags/Type/Length: 0x40/2/42
        AS_PATH
            Path Segment Type: 2(AS_SEQUENCE)
            Path Segment Length: 10
            Path Segment Value: 1836 174 6453 37282 37088 37629 37629 37629 37629 37629
    Path Attribute Flags/Type/Length: 0x40/3/4
        NEXT_HOP: 146.228.1.3
    Path Attribute Flags/Type/Length: 0xc0/8/12
        COMMUNITY: 1836:110 1836:6000 1836:6031
    NLRI: 154.65.7.0/24
---------------------------------------------------

The file is a sequence of 'blocks'. Each block is enclosed between dashed lines:
---------------------------------------------------
# Block (n)
---------------------------------------------------
# Block (n+1)
---------------------------------------------------
# Block (n+2) , etc
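
If the blocks are needed as separate strings, the layout above can be split on the dashed lines. This is only a minimal sketch: the name `split_blocks` and the rule "a delimiter is a line of at least ten dashes and nothing else" are assumptions made for illustration, chosen so that a line such as `Marker: -- ignored --` is not mistaken for a delimiter.

```python
import re

# Assumption: a delimiter is a line consisting only of dashes (ten or more),
# optionally followed by trailing spaces or tabs.
DELIMITER = re.compile(r'^-{10,}[ \t]*$', re.MULTILINE)


def split_blocks(text):
    """Return the non-empty chunks of text found between dashed lines."""
    chunks = DELIMITER.split(text)
    # Drop the empty pieces before the first and after the last delimiter,
    # and trim the newlines left over around each block.
    return [chunk.strip('\n') for chunk in chunks if chunk.strip()]
```

Each returned block keeps its internal indentation, so it can still be examined line by line afterwards.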

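I want to read the whole file block by block and return a text file that contains only the lines for these fields: Timestamp, Peer AS Number, Local AS Number, Peer IP Address, and Local IP Address.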

The resulting .txt file should look like this:

---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
BGP4MP_MESSAGE_AS4
    Peer AS Number: 37989
    Local AS Number: 12654
    Peer IP Address: 203.123.48.6
    Local IP Address: 193.0.4.28
---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
BGP4MP_MESSAGE_AS4
    Peer AS Number: 1836
    Local AS Number: 12654
    Peer IP Address: 146.228.1.3
    Local IP Address: 193.0.4.28
---------------------------------------------------

Ideally, I would like to overwrite updates.txt with the new text file so that no space is wasted, and save it in a new directory, "Parsed Updates".
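
One way to read this requirement is: write the filtered lines to a file of the same name inside "Parsed Updates", then optionally delete the original to free its space. The sketch below follows that reading. The function name `save_parsed` and its parameters are hypothetical, and it assumes Python 3 (`os.makedirs(..., exist_ok=True)` is not available in Python 2).

```python
import os


def save_parsed(lines, source_path, out_dir='Parsed Updates', remove_source=False):
    """Write lines to out_dir/<name of source file> and return the new path.

    lines is expected to hold strings that already end with a newline.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the directory if it is missing
    target = os.path.join(out_dir, os.path.basename(source_path))
    with open(target, 'w') as out:
        out.writelines(lines)
    if remove_source:
        os.remove(source_path)  # frees the space used by the original file
    return target
```

Deleting the source is left off by default, since removing the only copy of the raw data is hard to undo.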

I know it is minimal, because I am stuck on the dash delimiter, but my code so far looks like this:

import sys
import os

def parser(filename):
    # Read the whole file into a string; the file is closed automatically
    with open(filename, 'r') as info:
        text = info.read()

    #Here comes the string manipulation code
    #text.split( '---------------------------------------------------')

    print('The file has been parsed successfully !!')

def main():
    parser('updates.txt')


if __name__=='__main__':
    main()

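In this particular case, you don't even need to split the blocks into separate parts before parsing. You can simply check, line by line, whether each line matches the kind of information you want.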

import re

out_lines = []
# Patterns for the lines to keep: delimiters, section headers, selected fields
regexes = [
    r'^-+$',
    r'^MRT Header\s*$',
    r'^\s*Timestamp:.*$',
    r'^BGP4MP_MESSAGE_AS4\s*$',
    r'^\s*Peer AS Number:.*$',
    r'^\s*Local AS Number:.*$',
    r'^\s*Peer IP Address:.*$',
    r'^\s*Local IP Address:.*$',
]
with open('file.txt', 'r') as f:
    for line in f:
        for regex in regexes:
            if re.match(regex, line):
                out_lines.append(line)
                break

# Each kept line still ends with its own newline, so write the lines as they are
with open('file.txt', 'w') as f:
    f.writelines(out_lines)
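
Collecting every kept line in a list and then reopening the same file for writing works for small inputs, but it holds all the output in memory, and a crash during the write can leave the file truncated. A possible alternative, sketched here for Python 3 only, streams the matching lines to a temporary file and then swaps it in with `os.replace`. The names `KEEP` and `filter_in_place` are made up for this example, and the combined pattern is an assumed equivalent of the regex list above.

```python
import os
import re
import tempfile

# One combined pattern: dashed delimiters, the two section headers,
# and the five wanted fields (with any leading indentation).
KEEP = re.compile(
    r'^(-+|MRT Header|BGP4MP_MESSAGE_AS4|'
    r'\s*(Timestamp|Peer AS Number|Local AS Number|'
    r'Peer IP Address|Local IP Address):.*)\s*$'
)


def filter_in_place(path):
    """Rewrite the file at path so that only lines matching KEEP remain."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the original,
    # so the final os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as tmp:
        with open(path, 'r') as src:
            for line in src:
                if KEEP.match(line):
                    tmp.write(line)
    os.replace(tmp.name, path)  # swap the filtered copy in for the original
```

Calling `filter_in_place('updates.txt')` would then overwrite updates.txt with the reduced content, without keeping a second permanent copy on disk.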

>>> fields = ['Timestamp', 'Peer AS Number', 'Local AS Number', 'Peer IP Address', 'Local IP Address', 'MRT Header']
>>> with open('results.txt', 'w') as r:
...     with open('updates.txt', 'r') as u:
...         for line in u:
...             # keep delimiter lines and any line that names a wanted field
...             if '-' * 51 in line or any(field in line for field in fields):
...                 r.write(line)

Your results file will look like this:

$ cat results.txt
---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
    Peer AS Number: 37989
    Local AS Number: 12654
    Peer IP Address: 203.123.48.6
    Local IP Address: 193.0.4.28
---------------------------------------------------
MRT Header
    Timestamp: 1453939200(2016-01-28 01:00:00)
    Peer AS Number: 1836
    Local AS Number: 12654
    Peer IP Address: 146.228.1.3
    Local IP Address: 193.0.4.28
---------------------------------------------------