Python: Sorting IP ranges which are dictionary keys

Question

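I have a dictionary whose keys are IP address ranges (the keys were used to remove duplicates in an earlier step) and whose values are certain objects. Here is an example.

Part of the dictionary sresult: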

10.102.152.64-10.102.152.95 object1:object3
10.102.158.0-10.102.158.255 object2:object5:object4
10.102.158.0-10.102.158.31  object3:object4
10.102.159.0-10.102.255.255 object6

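There are tens of thousands of lines, and I want to sort them (correctly) by the IP address in the keys.

I tried splitting each key on the range separator - to get a single IP address that can be sorted, as follows: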

ips={}
for key in sresult:
    if '-' in key:
        l = key.split('-')[0]
        ips[l] = key
    else:
        ips[1] = key

Then, using code found in another post, I sort by IP address and look up the values in the original dictionary:

sips = sorted(ipaddress.ip_address(line.strip()) for line in ips)
for x in sips:
    print("SRC: "+ips[str(x)], "OBJECT: "+" :".join(list(set(sresult[ips[str(x)]]))), sep=",")

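The problem is that when I split the original ranges and add the first IP as a new key in another dictionary, I deduplicate again and lose rows of data: lines 2 and 3 in the example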

line 1 10.102.152.64 -10.102.152.95
line 2 10.102.158.0  -10.102.158.255
line 3 10.102.158.0  -10.102.158.31
line 4 10.102.159.0  -10.102.255.255

become

line 1 10.102.152.64 -10.102.152.95
line 3 10.102.158.0  -10.102.158.31
line 4 10.102.159.0  -10.102.255.255

So when I rebuild the original dictionary using the keys sorted by IP address, I lose data.

Can anyone help?

Answer 1

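Edit: this post now consists of three parts:

1) Some information about dictionaries that you need in order to understand the rest.

2) An analysis of your code, and how to fix it without using any other Python features.

3) What I consider the best way to solve the problem, in detail.

1) Dictionaries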

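Python dictionaries are not sorted by key. If I have a dictionary like this:

dictionary = {"one": 1, "two": 2}

and I iterate over dictionary.items(), then in Python versions before 3.7 I might get "one": 1 first or "two": 2 first; there is no way to know. Since Python 3.7, iteration follows insertion order, but insertion order is still not sorted order.

Every Python dictionary gives you access to two sequences: its keys and its values. You can turn them into lists like this: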

print(list(dictionary.keys()))
print(list(dictionary.values()))

These lists do have an order, so they can be sorted. Of course, sorting them does not change the original dictionary.
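A minimal sketch of this point, using a made-up dictionary rather than the data from the question: sorting the list of keys produces a new list and leaves the dictionary itself untouched.

```python
# Hypothetical example data, not taken from the question.
dictionary = {"two": 2, "one": 1}

keys = list(dictionary.keys())
sorted_keys = sorted(keys)  # sorted() returns a new list

print(sorted_keys)              # ['one', 'two']
print(list(dictionary.keys()))  # ['two', 'one'] (insertion order, unchanged)
```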

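2) Your code

You realized that, in your case, you only want to sort by the first IP address in each dictionary key. So the strategy you adopted is roughly:

1) Build a new dictionary whose keys are just the first part.

2) Get the list of keys from that dictionary.

3) Sort that list of keys.

4) Look up the values in the original dictionary.

As you noticed, this approach fails at step 1. As soon as you create a new dictionary with truncated keys, you lose the ability to distinguish keys that differ only at the end, because every dictionary key must be unique.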
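To make the failure concrete, here is a small sketch that uses the sample keys from the question (the values are placeholder strings). The two ranges that start at 10.102.158.0 collapse onto one truncated key, and the later one overwrites the earlier one.

```python
# Sample keys from the question; the values are placeholder strings.
sresult = {
    "10.102.152.64-10.102.152.95": "object1:object3",
    "10.102.158.0-10.102.158.255": "object2:object5:object4",
    "10.102.158.0-10.102.158.31": "object3:object4",
    "10.102.159.0-10.102.255.255": "object6",
}

ips = {}
for key in sresult.keys():
    first_part = key.split("-")[0]
    ips[first_part] = key  # a repeated first_part silently overwrites

print(len(sresult))         # 4
print(len(ips))             # 3
print(ips["10.102.158.0"])  # 10.102.158.0-10.102.158.31
```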

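A better strategy is:

1) Build a function that can represent your "full" IP address (the whole range string) as an ip_address object.

2) Sort the list of dictionary keys (of the original dictionary; do not make a new one).

3) Query the dictionary in that order.

Let's see how to change your code to implement step 1.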

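import ipaddress

def represent(full_ip):
    # Stylistic note: never use o or l as variable names.
    # They look just like 0 and 1.
    # split('-') returns the whole string when there is no '-',
    # so keys that are a single address are handled as well.
    first_part = full_ip.split('-')[0]
    return ipaddress.ip_address(first_part.strip())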

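Now that we have a way to represent a full IP address, we can sort the keys by this shortened version without actually changing them. All we have to do is use the key parameter to tell Python's sorted function how we want to represent the keys (note that this key parameter has nothing to do with the keys in a dictionary; they just happen to share a name):

# Another stylistic note: always use .keys() when looping over dictionary keys. Explicit is better than implicit.

sips = sorted(sresult.keys(), key=represent)

As long as the ipaddress library is available, there should be no problems up to this point. sips now holds the original keys in sorted order, so the rest of your code can stay the same, except that you look up sresult[x] directly instead of going through ips.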
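Putting the steps together, the following is a minimal end-to-end sketch on the sample data from the question. The helper range_sort_key is a hypothetical variant of represent and is not part of the original answer: it returns a (start, end) tuple so that ranges sharing the same first address are also ordered by their last address. It assumes every key is either a single IPv4 address or two IPv4 addresses joined by "-".

```python
import ipaddress

# Sample data from the question; the values are placeholder strings.
sresult = {
    "10.102.159.0-10.102.255.255": "object6",
    "10.102.158.0-10.102.158.255": "object2:object5:object4",
    "10.102.152.64-10.102.152.95": "object1:object3",
    "10.102.158.0-10.102.158.31": "object3:object4",
}

def range_sort_key(full_ip):
    """Hypothetical helper: (start, end) of a range as ip_address objects."""
    # partition() leaves 'last' empty when the key contains no '-'.
    first, _, last = full_ip.partition("-")
    start = ipaddress.ip_address(first.strip())
    end = ipaddress.ip_address(last.strip()) if last.strip() else start
    return (start, end)

# Sort the original keys; no second dictionary is needed.
sorted_keys = sorted(sresult.keys(), key=range_sort_key)

for key in sorted_keys:
    print("SRC: " + key, "OBJECT: " + sresult[key], sep=",")
```

Because the dictionary keys themselves are never modified, both ranges that start at 10.102.158.0 survive, and the tuple key places the shorter range first.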

3) The best solution

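Whenever you deal with a sorting problem, it is easiest to start from a simpler question: given two items, how would I compare them? Python gives us a way to express this. What we have to do is implement two data model methods called __lt__ and __eq__. Sorting only ever uses "<", so __lt__ is the method that matters for sorted; functools.total_ordering then fills in "<=", ">" and ">=" for us.

Let's try it:

import functools
import ipaddress

@functools.total_ordering  # derives <=, > and >= from __lt__ and __eq__
class IPAddress:
    def __init__(self, ip_address):
        self.ip_address = ip_address  # This will be the full IP address (the range string)

    def __lt__(self, other):
        """Is this object less than the other one?"""
        # First, let's find the first parts of the IP addresses
        this_first_ip = self.ip_address.split("-")[0].strip()
        other_first_ip = other.ip_address.split("-")[0].strip()
        # Now let's put them into the ipaddress library
        this_object = ipaddress.ip_address(this_first_ip)
        other_object = ipaddress.ip_address(other_first_ip)
        return this_object < other_object

    def __eq__(self, other):
        """Are the two objects equal?"""
        return self.ip_address == other.ip_address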

Cool, we have a class. Now, whenever I use "<", "<=" or "==", the data model methods are called automatically. Let's check that it works:

test_ip_1 = IPAddress("10.102.152.64-10.102.152.95")
test_ip_2 = IPAddress("10.102.158.0-10.102.158.255")

print(test_ip_1 <= test_ip_2)

Now, the beauty of these data model methods is that Python's sort and sorted use them too:

dictionary_keys = sresult.keys()
dictionary_key_objects = [IPAddress(key) for key in dictionary_keys]
sorted_dictionary_key_objects = sorted(dictionary_key_objects)
# According to your latest comment, the line below is what you are missing
sorted_dictionary_keys = [key_object.ip_address for key_object in sorted_dictionary_key_objects]

Now you can do:

for key in sorted_dictionary_keys:
    print(key)
    print(sresult[key])

The Python data model is very nearly the defining feature of Python. I recommend reading up on it.
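The "compare two items" idea can also be tried without writing a class. As a sketch only, and not part of the original answer, the standard library's functools.cmp_to_key turns a two-argument comparison function into a sort key. The names first_address and compare_ranges and the sample list below are illustrative.

```python
import functools
import ipaddress

def first_address(full_ip):
    """Return the first address of a 'start-end' key as an ip_address object."""
    return ipaddress.ip_address(full_ip.split("-")[0].strip())

def compare_ranges(left, right):
    """Return a negative, zero or positive number, as cmp_to_key expects."""
    left_ip, right_ip = first_address(left), first_address(right)
    # (a > b) - (a < b) yields 1, 0 or -1 using boolean arithmetic.
    return (left_ip > right_ip) - (left_ip < right_ip)

keys = [
    "10.102.159.0-10.102.255.255",
    "10.102.152.64-10.102.152.95",
    "10.102.158.0-10.102.158.255",
]
sorted_keys = sorted(keys, key=functools.cmp_to_key(compare_ranges))
print(sorted_keys)
```

For this particular problem a plain key function is probably the simpler choice; a comparison function is mostly worth considering when the ordering rule is hard to express as a single key.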
