Given two files (IPs and subnet info), create a file that associates each IP with a subnet

Question

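I have been struggling for a few days to find the right approach to this problem, and I am looking for some help.

I have two files and need to create a third file that shows the relationship between them.

IP address file - ip.csv
Subnet file - subnet.csv

I need to determine which subnet each IP belongs to and write the result to a third file.

The ip.csv file will contain about 1.5 million IPs, and the subnet.csv file will contain about 140,000 subnets.

Sample ip.csv file: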

IP,Type
10.78.175.167,IPv4
10.20.3.56,IPv4

Sample subnet.csv file:

Subnet,Netmask
10.176.122.136/30,255.255.255.252
10.20.3.0/24,255.255.254.0

Format of the file I need to create:

Subnet,IP
10.20.3.0/24,10.20.3.56

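I tried using material from these pages:

Here is the code I tried. It works for small sets, but I run into problems when running it against the full set of files.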

#!/usr/local/bin/python2.7
import csv
import ipaddress
import iptools
import re
import SubnetTree
import sys
from socket import inet_aton

testdir = '/home/test/testdir/'
iprelfile = testdir + 'relationship.csv'
testipsub = testdir + 'subnet.csv'
testipaddr = testdir + 'ip.csv'

o1 = open (iprelfile, "a")

# Subnet file
IPR = set()
o1.write('Subnet,IP\n')
with open(testipsub, 'rb') as master:
    reader = csv.reader(master)
    for row in reader:
        if 'Subnet' not in row[0]:
            # Convert string to unicode to be parsed with ipaddress module
            b = unicode(row[1])
            # Using ipaddress module to create list containing every IP in subnet
            n2 = ipaddress.ip_network(b)
            b1 = (list(n2.hosts()))
            # IP address file
            with open(testipaddr, 'rb') as ipaddy:
                readera = csv.reader(ipaddy)
                for rowa in readera:
                    if 'IP' not in rowa[0]:
                        bb = rowa[0]
                        for ij in b1:
                            # Convert to string for comparison
                            f = str(ij)
                            # If the IP address is in subnet range
                            if f == bb:
                                IPR.update([row[0] + ',' + bb + '\n'])


for ip in IPR:
    o1.write(ip + '\n')

# Closing the file
o1.close()

Answer 1

You can read all the subnets into memory and sort them by network address. This allows you to use bisect to do a binary search to find the subnet for every IP. This only works if the subnets don't overlap each other; if they do, you'll probably need to use a segment tree.

import bisect
import csv
import ipaddress

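def sanitize(ip):
    # Normalise each octet by dropping leading zeros;
    # the prefix length after '/', if present, is kept unchanged
    parts = ip.split('/', 1)
    parts[0] = '.'.join(str(int(x)) for x in parts[0].split('.'))

    return '/'.join(parts)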

with open('subnet.csv') as subnet_f:
    reader = csv.reader(subnet_f)
    next(reader)    # Skip column names

    # Create list of subnets sorted by network address and
    # list of network addresses in the same order
    subnets = sorted((ipaddress.IPv4Network(sanitize(row[0])) for row in reader),
                     key=lambda x: x.network_address)
    network_addrs = [subnet.network_address for subnet in subnets]

with open('ip.csv') as ip_f, open('output.csv', 'w', newline='') as out_f:
    reader = csv.reader(ip_f)
    next(reader)

    writer = csv.writer(out_f)
    writer.writerow(['Subnet', 'IP'])

    for row in reader:
        ip = ipaddress.IPv4Address(sanitize(row[0]))
        index = bisect.bisect(network_addrs, ip) - 1

        if index < 0 or subnets[index].broadcast_address < ip:
            continue    # IP not in range of any networks
        writer.writerow([subnets[index], ip])

Output:

Subnet,IP
10.20.3.0/24,10.20.3.56

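The lookups above take O(n log m) time, where n is the number of IPs and m is the number of networks (sorting the subnets adds O(m log m) up front). Note that it only runs under Python 3, since ipaddress is not included in Python 2.7. If you need to use Python 2.7, there are backports available.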
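For the overlapping case mentioned earlier, a segment tree is one option. Another possibility, which is a swapped-in technique and not part of the code above, is a longest-prefix match using one dictionary per prefix length. The sketch below is only an illustration under assumptions: the helper names build_prefix_tables and most_specific_subnet are hypothetical, and the 10.20.0.0/16 network is made-up sample data, not taken from the question.

```python
import ipaddress
from collections import defaultdict

def build_prefix_tables(cidrs):
    """Map prefix length -> {integer network address: IPv4Network}."""
    tables = defaultdict(dict)
    for cidr in cidrs:
        net = ipaddress.IPv4Network(cidr)
        tables[net.prefixlen][int(net.network_address)] = net
    return tables

def most_specific_subnet(tables, ip_text):
    """Return the longest-prefix subnet containing ip_text, or None."""
    ip = int(ipaddress.IPv4Address(ip_text))
    # Try the longest prefixes first so the most specific subnet wins
    for prefixlen in sorted(tables, reverse=True):
        # Netmask for this prefix length as a 32-bit integer
        mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
        net = tables[prefixlen].get(ip & mask)
        if net is not None:
            return net
    return None

# Hypothetical overlapping subnets: the /24 lies inside the /16
tables = build_prefix_tables(['10.20.0.0/16', '10.20.3.0/24'])
print(most_specific_subnet(tables, '10.20.3.56'))
print(most_specific_subnet(tables, '10.20.4.1'))
```

Each IP needs at most one dictionary lookup per distinct prefix length (33 at most for IPv4), so this should stay manageable for file sizes like those in the question, though it has not been measured here.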

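Update: The primary goal of an efficient solution is to find a way to process each individual IP efficiently. Iterating over all the subnets for every IP is very expensive, so that won't do. It is better to create a sorted list of the first IP in each subnet. For the given data it looks like this: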

[IPv4Address('10.20.3.0'), IPv4Address('10.176.122.136')]

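This allows us to perform a binary search to find the index of the address that is equal to or lower than an individual IP. For example, when we search for IP 10.20.3.56, we use bisect.bisect, which gives us the first index greater than the IP, and decrement it by one: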

>>> import bisect
>>> from ipaddress import IPv4Address
>>> l = [IPv4Address('10.20.3.0'), IPv4Address('10.176.122.136')]
>>> index = bisect.bisect(l, IPv4Address('10.20.3.56'))
>>> index
1
>>> l[index - 1]
IPv4Address('10.20.3.0')

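Since we have stored the networks in another list in the same order, we can use the index to retrieve the corresponding subnet. Once we have the subnet, we still need to check that the individual IP is equal to or lower than the last IP in the subnet. If the IP is within the subnet, a row is written to the result; if not, we move on to the next IP.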
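The lookup described above can be pulled into a small function to make the two checks explicit. This is a minimal sketch assuming non-overlapping subnets. The names build_index and find_subnet are hypothetical helpers introduced here for illustration; the sample values come from the question's data.

```python
import bisect
import ipaddress

def build_index(cidrs):
    """Return (subnets, network_addrs), both sorted by network address."""
    subnets = sorted((ipaddress.IPv4Network(c) for c in cidrs),
                     key=lambda net: net.network_address)
    network_addrs = [net.network_address for net in subnets]
    return subnets, network_addrs

def find_subnet(subnets, network_addrs, ip_text):
    """Return the subnet containing ip_text, or None if there is none."""
    ip = ipaddress.IPv4Address(ip_text)
    # Index of the last network address that is <= ip
    index = bisect.bisect(network_addrs, ip) - 1
    if index < 0:
        return None  # ip is below every network address
    candidate = subnets[index]
    # The candidate only matches if ip is not past its last address
    if ip > candidate.broadcast_address:
        return None
    return candidate

subnets, network_addrs = build_index(['10.176.122.136/30', '10.20.3.0/24'])
print(find_subnet(subnets, network_addrs, '10.20.3.56'))     # 10.20.3.0/24
print(find_subnet(subnets, network_addrs, '10.78.175.167'))  # None
```

The second call returns None because the closest lower network address is 10.20.3.0, but 10.78.175.167 lies beyond that subnet's last address, 10.20.3.255.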
