Parse a file organised in a certain pattern

f is a file that looks like this:

+++++192.168.1.1+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.1

+++++192.168.1.2+++++
Port Number: 80
......
product: Apache http
IP Address: 192.168.1.2

+++++192.168.1.3+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.3

+++++192.168.1.4+++++
Port Number: 3306
......
product: MySQL
IP Address: 192.168.1.4

+++++192.168.1.5+++++
Port Number: 22
......
product: Open SSH
IP Address: 192.168.1.5

+++++192.168.1.6+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.6

The expected output is:

These hosts have Apache services:

192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.6

The code I tried:

for service in f:
    if "product: Apache httpd" in service:
        for host in f:
            if "IP Address: " in host:
                print(host[5:], service)

It just gives me all of the IP addresses, not the specific hosts that have Apache installed.

How can I get the expected output?
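A likely reason the attempt prints every IP address is that both loops pull lines from the same file iterator: once the first Apache line is found, the inner loop consumes the rest of the file. The sketch below reproduces this with a shortened, made-up sample held in an `io.StringIO` (an assumption standing in for the real file `f`).

```python
import io

# Shortened sample in the same layout as the file (assumed for illustration).
sample = """+++++192.168.1.1+++++
product: Apache httpd
IP Address: 192.168.1.1

+++++192.168.1.4+++++
product: MySQL
IP Address: 192.168.1.4
"""

f = io.StringIO(sample)  # stands in for the open file object
printed = []
for service in f:
    if "product: Apache httpd" in service:
        # The inner loop reads from the SAME iterator, so it consumes
        # every remaining line, including those of non-Apache hosts.
        for host in f:
            if "IP Address: " in host:
                printed.append(host.strip())

print(printed)
```

The MySQL host's address ends up in `printed` as well, which matches the behaviour described in the question.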

Maybe something like this. I've inlined the data for illustration, but it could just as well come from a file.

In addition, we first collect all of the data for each host, in case you need some other information as well, and then print out what was asked for. This means info_by_ip ends up looking roughly like this:

{'192.168.1.1': {'Port Number': '80', 'product': 'Apache httpd'},
 '192.168.1.2': {'Port Number': '80', 'product': 'Apache http'},
 '192.168.1.3': {'Port Number': '80', 'product': 'Apache httpd'},
 '192.168.1.4': {'Port Number': '3306', 'product': 'MySQL'},
 '192.168.1.5': {'Port Number': '22', 'product': 'Open SSH'},
 '192.168.1.6': {'Port Number': '80', 'product': 'Apache httpd'}}

Code:

import collections

data = """
+++++192.168.1.1+++++
Port Number: 80
......
product: Apache httpd

+++++192.168.1.2+++++
Port Number: 80
......
product: Apache http

+++++192.168.1.3+++++
Port Number: 80
......
product: Apache httpd

+++++192.168.1.4+++++
Port Number: 3306
......
product: MySQL

+++++192.168.1.5+++++
Port Number: 22
......
product: Open SSH

+++++192.168.1.6+++++
Port Number: 80
......
product: Apache httpd
"""

ip = None  # Current IP address

# A defaultdict lets us conveniently add per-IP data without having to
# create the inner dicts explicitly:
info_by_ip = collections.defaultdict(dict)

for line in data.splitlines():  # replace with `for line in file:` for file purposes
    if line.startswith('+++++'):  # Seems like an IP address separator
        ip = line.strip('+')  # Remove + signs from both ends
        continue  # Skip to next line
    if ':' in line:  # If the line contains a colon,
        key, value = line.split(':', 1)  # ... split by it, 
        info_by_ip[ip][key.strip()] = value.strip()  # ... and add to this IP's dict.


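for ip, info in info_by_ip.items():
    # Exact match: this skips 192.168.1.2, whose product is 'Apache http'.
    # Use info.get('product', '').startswith('Apache') to include it too.
    if info.get('product') == 'Apache httpd':
        print(ip)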
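Since every field is kept per host, other questions can be answered from the same structure. As a rough sketch, the snippet below groups hosts by product; the dict literal is an abbreviated copy of the info_by_ip shape shown earlier, and `hosts_by_product` is a name introduced here purely for illustration.

```python
import collections

# Abbreviated stand-in for info_by_ip (three of the six hosts).
info_by_ip = {
    '192.168.1.1': {'Port Number': '80', 'product': 'Apache httpd'},
    '192.168.1.4': {'Port Number': '3306', 'product': 'MySQL'},
    '192.168.1.6': {'Port Number': '80', 'product': 'Apache httpd'},
}

# Map each product name to the list of hosts that report it.
hosts_by_product = collections.defaultdict(list)
for ip, info in info_by_ip.items():
    hosts_by_product[info.get('product', 'unknown')].append(ip)

for product, hosts in sorted(hosts_by_product.items()):
    print(product, hosts)
```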

You can use +++++ as a separator and get the required IPs with the following code.

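with open('ip.txt', 'r') as fileReadObj:
    rows = fileReadObj.read()  # Read the whole file into one string

# Splitting on the separator yields IPs and section bodies alternately
text_lines = rows.split('+++++')
for i, row in enumerate(text_lines):
    if 'Apache' in row:
        # The element just before a section body is that section's IP
        print(text_lines[i - 1])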
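To make the separator idea concrete, here is a small sketch on a shortened, made-up sample. Splitting on `+++++` leaves an empty first element, after which IPs and section bodies alternate, so they can be paired up with slicing and `zip`. The names `ips`, `bodies` and `apache_hosts` are illustrative only.

```python
# Shortened sample in the same layout as the file (assumed for illustration).
sample = (
    "+++++192.168.1.1+++++\nproduct: Apache httpd\n\n"
    "+++++192.168.1.4+++++\nproduct: MySQL\n"
)

parts = sample.split('+++++')
# parts[0] is the text before the first separator (empty here);
# from index 1 on, IPs and section bodies alternate.
ips = parts[1::2]
bodies = parts[2::2]

apache_hosts = [ip for ip, body in zip(ips, bodies) if 'Apache' in body]
print(apache_hosts)
```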

You can also try this:

apaches = []
with open('ips.txt') as f:
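    # strip() drops a trailing newline that would otherwise add an extra
    # empty line to the last section and break the five-way unpacking below
    sections = f.read().strip().split('\n\n')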

    for section in sections:
        _, _, _, product, ip = section.split('\n')
        _, product_type = product.split(':')
        _, address = ip.split(':')

        if product_type.strip().startswith('Apache'):
            apaches.append(address.strip())

print('These hosts have Apache services:\n%s' % '\n'.join(apaches))

Which outputs:

These hosts have Apache services:
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.6
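The unpacking above assumes every section has exactly five lines. If the `......` in the sample stands for a varying number of lines, a more tolerant variant could look fields up by name instead. This is only a sketch on made-up data; the `Extra Line` field is hypothetical and exists just to show that additional lines are ignored.

```python
# Made-up sample; "Extra Line" is a hypothetical field, not from the real file.
sample = """+++++192.168.1.1+++++
Port Number: 80
Extra Line: anything
product: Apache httpd
IP Address: 192.168.1.1

+++++192.168.1.4+++++
Port Number: 3306
product: MySQL
IP Address: 192.168.1.4
"""

apaches = []
for section in sample.strip().split('\n\n'):
    fields = {}
    for line in section.split('\n'):
        key, sep, value = line.partition(':')
        if sep:  # keep only lines that actually contain a colon
            fields[key.strip()] = value.strip()
    if fields.get('product', '').startswith('Apache'):
        apaches.append(fields.get('IP Address'))

print(apaches)
```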

Explained:

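filename = 'ip.txt'  # assumed path to the input file

with open(filename, 'r') as fobj:  # Open the file read-only
    search_string = fobj.read()  # Read the whole file into one string
    print('These hosts have Apache services:\n\n')
    # Split the string on the search term. The last piece is the text after
    # the final match and has no IP header, so skip it.
    for string_piece in search_string.split('Apache')[:-1]:
        # Split on the separator; the IP is the second-to-last element
        ip_addr = string_piece.split('+++++')[-2]
        print(ip_addr)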

Condensed:

with open(filename, 'r') as fobj:
    print('These hosts have Apache services:\n\n')
    # Skip the last piece: it is the tail after the final match
    for string_piece in fobj.read().split('Apache')[:-1]:
        print('{}\n'.format(string_piece.split('+++++')[-2]))
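One detail worth seeing in isolation: splitting on `'Apache'` yields one more piece than there are matches, and the trailing piece carries no `+++++` header, so indexing it with `[-2]` would raise an `IndexError`. The sketch below, on a shortened made-up sample, counts the separators per piece and only extracts an IP where a full header is present.

```python
# Shortened sample in the same layout as the file (assumed for illustration).
sample = (
    "+++++192.168.1.1+++++\nproduct: Apache httpd\n\n"
    "+++++192.168.1.6+++++\nproduct: Apache httpd\n"
)

pieces = sample.split('Apache')
# Two matches produce three pieces; the last one is just the tail
# after the final match and contains no '+++++' separator at all.
header_counts = [piece.count('+++++') for piece in pieces]
print(header_counts)

hosts = [
    piece.split('+++++')[-2]
    for piece in pieces
    if piece.count('+++++') >= 2  # a full header needs two separators
]
print(hosts)
```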