Convert string to nested dictionary with nested list and custom key-value pairs

Question

I'm pulling data from a network switch, and it comes back as a string like this:

Gi1/0/1   COMPUTER1  Full   1000    Auto Down   off   A  (1),5-7777
Gi1/0/2   COMPUTER2  Full   1000    Auto Down   On    T  (1),5-7777
Gi1/0/3   COMPUTER3  Full   1000    Auto Up     Off   A  (1),5-7777
Gi1/0/4   COMPUTER4  Full   1000    Auto Down   Off   A  (1),5-7777
Gi1/0/5   COMPUTER5  Full   1000    Auto Down   Off   A  1
Gi1/0/6   COMPUTER6  Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/7   COMPUTER7  N/A    Unknown Auto Down   Off   A  1
Gi1/0/8   COMPUTER8  Full   1000    Auto Up     Off   A  1
Gi1/0/9   COMPUTER9  Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/10  COMPUTER10 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/11  COMPUTER11 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/12  COMPUTER12 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/13  COMPUTER13 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/14  COMPUTER14 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/15  Server1    N/A    Unknown Auto Down   Off   A  55
Gi1/0/16  Server2    N/A    Unknown Auto Down   Off
Gi1/0/17  Server3    N/A    Unknown Auto Down   Off
Gi1/0/18  Server4    N/A    Unknown Auto Down   Off
Gi1/0/19  Server5    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/20  Server6    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/21  Server7    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/22  Server8    Full   1000    Auto Up     On    A  3311
Gi1/0/23  COMPUTER15 Full   1000    Auto Up     Off   A  25
Gi1/0/24  COMPUTER16 Full   1000    Auto Up     On    A  99
Gi1/0/25  COMPUTER17 Full   1000    Auto Up     On    A  99
Gi1/0/26  Server9    Full   10      Auto Up     On    A  99
Gi1/0/27  COMPUTER18 Full   10      Auto Up     On    A  99
Gi1/0/28             N/A    Unknown Auto Down   Off   A  1
Gi1/0/29             N/A    Unknown Auto Down   Off   A  1
Gi1/0/30             N/A    Unknown Auto Down   Off   A  1
Gi1/0/31             N/A    Unknown Auto Down   Off   A  1
Gi1/0/32             N/A    Unknown Auto Down   Off   A  1
Gi1/0/33             N/A    Unknown Auto Down   Off   A  1
Gi1/0/34             N/A    Unknown Auto Down   Off   A  1
Gi1/0/35             N/A    Unknown Auto Down   Off   A  1
Gi1/0/36             N/A    Unknown Auto Down   Off   A  1
Gi1/0/37             N/A    Unknown Auto Down   Off   A  1
Gi1/0/38             N/A    Unknown Auto Down   Off   A  1
Gi1/0/39             N/A    Unknown Auto Down   Off   A  1
Gi1/0/40             N/A    Unknown Auto Down   Off   A  1
Gi1/0/41             N/A    Unknown Auto Down   Off   A  1
Gi1/0/42             N/A    Unknown Auto Down   Off   A  1
Gi1/0/43             N/A    Unknown Auto Down   Off   A  1
Gi1/0/44             N/A    Unknown Auto Down   Off   A  1
Gi1/0/45             N/A    Unknown Auto Down   Off   A  1
Gi1/0/46             N/A    Unknown Auto Down   Off   A  1
Gi1/0/47             N/A    Unknown Auto D-Down Off   A  1
Gi1/0/48             N/A    Unknown Auto Down   Off   A  1

I'm currently using TextFSM with Netmiko, but I'd like to know how to format the data without TextFSM.

I want to convert the data into something I can index like this:

print(port[14]['Description'])

and get back COMPUTER14.

I think the structure should look something like this:

{port: {14: {
               'Interface': 'Gi1/0/14',
               'Description': 'COMPUTER14',
               'Duplex': 'Full',
               'Speed': '1000',
               'Neg': 'Auto',
               'Linkstate': 'Up',
               'Flowctrl': 'On',
               'M': 'T',
               'VLAN': ['(1)', '5-7777']
               },
          15: {
               'Interface': 'Gi1/0/15',
               'Description': 'Server1',
               'Duplex': 'N/A',
               'Speed': 'Unknown',
               'Neg': 'Auto',
               'Linkstate': 'Down',
               'Flowctrl': 'Off',
               'M': 'A',
               'VLAN': [55]
               }
         }
}
# VLAN would be a list and anything that doesn't have data would return 'None'

But I'm not sure how to do this in Python. The most I've managed is using splitlines() to turn the output into a list.

Edit: here is what I tried earlier:

data_list = output.splitlines()

for data in data_list:
    print(data.split(' '))

But the list that comes out of that looks like this:

['Gi1/0/1', '', '', 'COMPUTER1', 'Full', '', '', '1000', '', '', '', 'Auto', 'Down', '', '', '', '', 'off', '', '', '', 'A', '', '(1),5-7777']

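From here I can see that I need to turn the list into a dictionary, but I don't know how to account for the padding spaces, or for the blank fields that have no data, which I would still like to show up as None.

A row with a blank field comes out like this: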

['Gi1/0/28', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'N/A', '', '', '', 'Unknown', 'Auto', 'Down', '', '', 'Off', '', '', 'A', '', '1']

Next I tried:

keys = ["Interface", "Description", "Duplex", "Speed", "Neg", "Linkstate", "Flowctrl", "M", "VLAN"]

for data in data_list:
    z = zip(keys, data.split(' '))
    dictionary = dict(z)
    print(dictionary)

Although this builds a dictionary, the keys and values don't line up correctly because of the spaces:

{'Interface': 'Gi1/0/1', 'Description': '', 'Duplex': '', 'Speed': 'COMPUTER1', 'Neg': 'Full', 'Linkstate': '', 'Flowctrl': '', 'M': '1000', 'VLAN': ''}

{'Interface': 'Gi1/0/28', 'Description': '', 'Duplex': '', 'Speed': '', 'Neg': '', 'Linkstate': '', 'Flowctrl': '', 'M': '', 'VLAN': ''}

How do I account for the spaces, or am I going in the wrong direction?

Answer 1

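From the conversation in chat, text parsing seems to be the only way out. I copied the whole text and saved it to a file, since I assume you have stored the output of the command in a file. To keep the answer short, though, I only ran it on a few ports rather than all 48. Also note that this only works if every column has data in at least one row. If there is a column in which no port has any data, this approach will not work.

Instead of readlines(), I used read() so that I could split the text at \n. This basically removes the \n that readlines() leaves at the end of every entry.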

with open('port_data.txt', 'r') as file:
    contents = file.read()
lines = contents.split('\n')
noOfLine = len(lines)
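
To make the difference concrete, here is a minimal sketch that uses an in-memory string in place of the real file (the two sample rows are taken from the output above; io.StringIO is only there so the example runs without port_data.txt). It also shows one thing to watch for: if the file ends with a newline, split('\n') leaves an empty string at the end of the list, whereas splitlines() does not.

```python
import io

# Two rows from the switch output, held in memory instead of port_data.txt.
sample = "Gi1/0/1   COMPUTER1  Full\nGi1/0/2   COMPUTER2  Full\n"

# readlines() keeps the trailing newline on every entry.
with_newlines = io.StringIO(sample).readlines()

# read() followed by split('\n') drops the newlines, but a final newline
# in the file leaves one empty string at the end of the list.
split_lines = io.StringIO(sample).read().split('\n')

# splitlines() drops the newlines and produces no trailing empty entry.
clean_lines = sample.splitlines()

print(with_newlines)
print(split_lines)
print(clean_lines)
```

If the saved output ends with a blank line, filtering out empty entries before the parsing loops avoids an extra empty row in the result.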

We also keep a list of column names, which I believe will not change in the future for the command in question.

columns = ['Interfaces', 'Description', 'Duplex', 'Speed', 'Neg', 'LinkState', 'Flow Control', 'M', 'VLAN']
colSize = len(columns)

What I did was keep one list per column, so that all the data can be iterated over easily.

# This is a list of lists. 
# finalList[0] is a list of all ports
# finalList[1] is a list of all descriptions, etc.
# finalList will have 9 lists because we have 9 columns here.
finalList = [] 

# This is the output dictionary
output = {}

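To build the dictionary, we split each line in lines[] on ' '. This ensures that the first index of the list produced by split() holds the data we need. Once we have that data, we need to find out where the next column starts.

We know that the next column starts some number of spaces, s, after the longest entry in the current column. For example, if the name in row1 is apple and the name in row2 is photosynthesis, then we know there are extra spaces after the first word of row1 to make room for the long word in the same column of row2. To find this size, we track the maximum length of the column in a variable. We do this for every column.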

for i in range(len(columns)): # Perform the loop for every column
    resultList = [] # This is used to keep the list of every entry in the current column
    maxSize = 0 # This is the max length of an entry in the current column. We compute this at run time.
    for line in lines:
        line = line.rstrip() # Remove any trailing spaces in the right end.
        curString = line.split(' ')[0] # Split and get the first word
        maxSize = max(maxSize, len(curString)) # Update the maxSize
        resultList.append(curString) # Add this word to the current column list.
    finalList.append(resultList) # Add the current column to the list of columns we already created.
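
As a small illustration of the maxSize step on its own, here is a sketch run on three rows copied from the switch output (the names rows, first_words and max_size are only for this example and are not part of the code above):

```python
# Three rows in the same fixed-width layout as the switch output.
rows = [
    "Gi1/0/1   COMPUTER1  Full",
    "Gi1/0/10  COMPUTER10 Full",
    "Gi1/0/28             N/A",
]

# First word of each row, extracted the same way as in the loop above.
first_words = [row.rstrip().split(' ')[0] for row in rows]

# The widest entry gives the earliest point where the next column can start.
max_size = max(len(word) for word in first_words)

print(first_words)  # ['Gi1/0/1', 'Gi1/0/10', 'Gi1/0/28']
print(max_size)     # 8
```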

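Now that we have the length of the longest word in the current column, we can be sure that, for every row, the next column begins at some point after maxSize to allow for the spacing. For rows where the next column is non-empty, the next word starts s spaces later. For rows where the next column is empty, however, there will be more spaces than that.

To account for this, we find the row with the fewest spaces after the current maxSize and trim every string in the list of lines from that position. In the same main loop: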

newLineList = [] # This is going to be the list of lines after we have removed the first entry
minLength = 10000 # Arbitrarily large number
if i != colSize-1: # We don't have to do this for the last column because we won't be processing it anymore.
    for line in lines: # Each line
        rest = line[maxSize:] # Everything after the widest entry of the current column
        if rest.strip(): # Skip rows with no text left (e.g. Gi1/0/16, which has no M or VLAN); they would force minLength to 0
            minLength = min(minLength, len(rest)-len(rest.lstrip())) # len('    Apple') - len('Apple') = 4 meaning there are 4 spaces from the current position of line[maxSize]
    # Now that we have the minimum length, we can say for sure that the next column starts at maxSize+minLength for every row.
    for line in lines:
        newLineList.append(line[maxSize+minLength:]) # temporary holder for recomputed lines.
    lines = newLineList # Use the trimmed lines for the next iteration.
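
The trimming step can be tried in isolation. The following is a minimal sketch on three rows from the switch output, with max_size hard-coded to the value found for the first column (the variable names are illustrative only):

```python
rows = [
    "Gi1/0/1   COMPUTER1  Full",
    "Gi1/0/10  COMPUTER10 Full",
    "Gi1/0/28             N/A",
]
max_size = 8  # length of 'Gi1/0/10', the widest entry in the first column

# Number of spaces between position max_size and the next non-space character.
gaps = [len(row[max_size:]) - len(row[max_size:].lstrip()) for row in rows]
min_gap = min(gaps)

# Every row is cut at the same offset, so a blank Description survives as
# leading spaces and split(' ')[0] returns '' for it on the next pass.
trimmed = [row[max_size + min_gap:] for row in rows]

print(gaps)
print(trimmed)
```

The row with the blank Description has the largest gap, but because all rows are cut at the smallest gap, its padding is kept and the columns stay aligned.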

Now we should have a list like this:

>>> for x in finalList:
...     print(x)
['Gi1/0/1', 'Gi1/0/2', 'Gi1/0/3', 'Gi1/0/4', 'Gi1/0/47', 'Gi1/0/48']
['COMPUTER1', 'COMPUTER2', 'COMPUTER3', 'COMPUTER4', '', '']
['Full', 'Full', 'Full', 'Full', 'N/A', 'N/A']
['1000', '1000', '1000', '1000', 'Unknown', 'Unknown']
['Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto']
['Down', 'Down', 'Up', 'Down', 'D-Down', 'Down']
['off', 'On', 'Off', 'Off', 'Off', 'Off']
['A', 'T', 'A', 'A', 'A', 'A']
['(1),5-7777', '(1),5-7777', '(1),5-7777', '(1),5-7777', '1', '1']

Now that each column sits in its own list, building the dictionary is not that hard.

for i in range(noOfLine): # For each row
    result = {} # This is a new entry in the final JSON of ports.
    for j in range(colSize): # For each column
        if(columns[j] == 'VLAN'): # For VLAN, we need to perform a split to get list output
            result[columns[j]] = finalList[j][i].split(',') # If the VLAN is a single entry, we would just get that entry in a list.
        else:
            result[columns[j]] = finalList[j][i].strip() # Removes any excessive spaces.
    name = finalList[0][i] # Get the port name like Gi1/0/1 or Gi1/0/2

    output[name.split('/')[-1]] = result # Compute the actual port number. Split at '/' gives ['Gi1','0','1'] from which we take a last entry as the port number.
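
The loop above stores port numbers as strings and leaves blank fields as empty strings. The question asked for port[14]['Description'] with an integer key and None for missing data, so here is a hedged sketch of that variation. The helper build_entry and the two hand-written rows are hypothetical and only stand in for one row taken across finalList:

```python
columns = ['Interfaces', 'Description', 'Duplex', 'Speed', 'Neg',
           'LinkState', 'Flow Control', 'M', 'VLAN']

def build_entry(values):
    """Hypothetical helper: map one row of column values onto the column names."""
    entry = {}
    for name, value in zip(columns, values):
        value = value.strip()
        if not value:
            entry[name] = None              # blank field
        elif name == 'VLAN':
            entry[name] = value.split(',')  # VLAN is always a list
        else:
            entry[name] = value
    return entry

# Two rows as they would look after the column split (written by hand here).
row14 = ['Gi1/0/14', 'COMPUTER14', 'Full', '1000', 'Auto', 'Up', 'On', 'T', '(1),5-7777']
row28 = ['Gi1/0/28', '', 'N/A', 'Unknown', 'Auto', 'Down', 'Off', 'A', '1']

port = {}
for row in (row14, row28):
    port[int(row[0].split('/')[-1])] = build_entry(row)  # integer keys: 14, 28

print(port[14]['Description'])  # COMPUTER14
print(port[28]['Description'])  # None
print(port[14]['VLAN'])         # ['(1)', '5-7777']
```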

The complete code put together:

import pprint

with open('port_data.txt', 'r') as file:
    contents = file.read()

lines = contents.split('\n')
noOfLine = len(lines)
columns = ['Interfaces', 'Description', 'Duplex', 'Speed', 'Neg', 'LinkState', 'Flow Control', 'M', 'VLAN']
colSize = len(columns)
finalList = []
output = {}

for i in range(len(columns)):
    resultList = []
    newLineList = []
    maxSize = 0
    for line in lines:
        line = line.rstrip()
        curString = line.split(' ')[0]
        maxSize = max(maxSize, len(curString))
        resultList.append(curString)
    finalList.append(resultList)
    minLength = 10000
    if i != colSize-1:
        for line in lines:
            rest = line[maxSize:]
            if rest.strip(): # Skip rows with no text left; they would force minLength to 0
                minLength = min(minLength, len(rest)-len(rest.lstrip()))
        for line in lines:
            newLineList.append(line[maxSize+minLength:])
        lines = newLineList

for x in finalList:
    print(x)

for i in range(noOfLine):
    result = {}
    for j in range(colSize):
        # print("J = {}".format(j))
        if(columns[j] == 'VLAN'):
            result[columns[j]] = finalList[j][i].split(',')
        else:
            result[columns[j]] = finalList[j][i].strip()
        # print("Columns = {}".format(columns[j]))
        # print("result = {}".format(result[columns[j]]))
    name = finalList[0][i]

    output[name.split('/')[-1]] = result
pprint.pprint(output)

Output of the above code:

{'1': {'Description': 'COMPUTER1',
       'Duplex': 'Full',
       'Flow Control': 'off',
       'Interfaces': 'Gi1/0/1',
       'LinkState': 'Down',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '2': {'Description': 'COMPUTER2',
       'Duplex': 'Full',
       'Flow Control': 'On',
       'Interfaces': 'Gi1/0/2',
       'LinkState': 'Down',
       'M': 'T',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '3': {'Description': 'COMPUTER3',
       'Duplex': 'Full',
       'Flow Control': 'Off',
       'Interfaces': 'Gi1/0/3',
       'LinkState': 'Up',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '4': {'Description': 'COMPUTER4',
       'Duplex': 'Full',
       'Flow Control': 'Off',
       'Interfaces': 'Gi1/0/4',
       'LinkState': 'Down',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '47': {'Description': '',
        'Duplex': 'N/A',
        'Flow Control': 'Off',
        'Interfaces': 'Gi1/0/47',
        'LinkState': 'D-Down',
        'M': 'A',
        'Neg': 'Auto',
        'Speed': 'Unknown',
        'VLAN': ['1']},
 '48': {'Description': '',
        'Duplex': 'N/A',
        'Flow Control': 'Off',
        'Interfaces': 'Gi1/0/48',
        'LinkState': 'Down',
        'M': 'A',
        'Neg': 'Auto',
        'Speed': 'Unknown',
        'VLAN': ['1']}}
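
For experimenting without a file, the same column-by-column idea can be wrapped in a function and run on an in-memory sample. This is only a sketch under a few assumptions: the function name split_columns and the constant SAMPLE are made up for the example, blank lines are dropped up front, and rows that have no text left (such as Gi1/0/16, which has no M or VLAN value) are ignored when working out where the next column starts. Like the code above, it assumes no value contains a space.

```python
SAMPLE = """\
Gi1/0/14  COMPUTER14 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/15  Server1    N/A    Unknown Auto Down   Off   A  55
Gi1/0/16  Server2    N/A    Unknown Auto Down   Off
Gi1/0/47             N/A    Unknown Auto D-Down Off   A  1
"""

def split_columns(text, n_columns):
    """Peel off one column at a time; returns one list of values per column."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    per_column = []
    for i in range(n_columns):
        words = [line.split(' ')[0] for line in lines]
        per_column.append(words)
        if i == n_columns - 1:
            break  # nothing left to trim after the last column
        max_size = max(len(word) for word in words)
        # Only rows that still have text decide where the next column starts.
        gaps = [len(line[max_size:]) - len(line[max_size:].lstrip())
                for line in lines if line[max_size:].strip()]
        min_gap = min(gaps) if gaps else 0
        lines = [line[max_size + min_gap:] for line in lines]
    return per_column

# Transpose the per-column lists back into one tuple per port.
rows = list(zip(*split_columns(SAMPLE, 9)))
for row in rows:
    print(row)
```

Each printed tuple has nine entries, with '' wherever the switch left a field blank, so it can be zipped directly against the column names.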
