How to append desired different length sublist elements?

Question

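I have a text like the one below and I want to store it in a dictionary, collecting the values that belong to the same parameter (key) in a sublist.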

file = """./path/to/Inventory2020_1.txt
fileType                           = Inventory
StoreCode
    number:1145C
numId                              = 905895
ValuesOfProducts
    prodsTypeA:150
    prodsTypeB:189
    UpdateTime:2020-03-05 14:45:38
InventoryTime                         = 2020-03-05 14:45:29
userName
    number:123

./path/to/Inventory2020_2.txt   
fileType                           = Inventory
StoreCode
    number:7201B
numId                              = 54272
ValuesOfProducts
    prodsTypeA:75
    prodsTypeB:231
    UpdateTime:2020-03-06 09:12:22
InventoryTime                         = 2020-03-06 09:11:47
userName
    number:3901 
"""

My current code successfully stores the text in a nested list with this line:

import re

a = [ re.sub(r' += +', ':', line).replace(":", "=", 1).strip().split("=") for line in file.splitlines() ]
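
To make the one-liner easier to follow, here is a small sketch that applies the same three steps to a few sample lines, one at a time. The helper name `split_line` and the `samples` list are illustrative only and are not part of the original code.

```python
import re

def split_line(line):
    """Apply the same steps as the one-liner above to a single line."""
    # 1. Collapse ' = ' (with any amount of space padding) into ':'
    step1 = re.sub(r' += +', ':', line)
    # 2. Turn only the first ':' into '=', so colons inside timestamps survive
    step2 = step1.replace(":", "=", 1)
    # 3. Trim whitespace and split into [key, value], or [text] if no separator
    return step2.strip().split("=")

samples = [
    "fileType                           = Inventory",
    "    UpdateTime:2020-03-05 14:45:38",
    "StoreCode",
]
for s in samples:
    print(split_line(s))
# ['fileType', 'Inventory']
# ['UpdateTime', '2020-03-05 14:45:38']
# ['StoreCode']
```

Note that a header line such as `StoreCode` ends up as a one-element sublist, so nothing ties it to the indented line that follows it.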

Now, to store it in a dictionary with the parameters as keys, I use a few conditions, as shown below:

d = dict()

for lst in a:
    if len(lst) > 1:
        d.setdefault(lst[0], []).append(lst[1])
    else:
        if "path" in lst[0]:
            d.setdefault("File", []).append(re.sub(r'.+/', '', lst[0]))

>>> d
{
'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'], 
'fileType': ['Inventory', 'Inventory'], 
'number': ['1145C', '123', '7201B', '3901'], 
'numId': ['905895', '54272'], 
'prodsTypeA': ['150', '75'], 
'prodsTypeB': ['189', '231'], 
'UpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'],
'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47']
}
>>>

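As you can see, for some parameters the value sits on the same line, separated by an = sign, so I can get the key, value pair into the same sublist directly with split("="). But some of the key, value pairs I'm interested in are spread over different lines, for example: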

StoreCode
    number:1145C

In this case, the key, value pair I'm interested in is key=StoreCode and value=1145C

And for this one:

ValuesOfProducts
    prodsTypeA:75
    prodsTypeB:231
    UpdateTime:2020-03-06 09:12:22

The key, value pairs I'm interested in are:

key=prodsTypeA and value=75
key=prodsTypeB and value=231
key=UpdateTime and value=2020-03-06 09:12:22

So the final dictionary would have the following structure:

{
'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'], 
'fileType': ['Inventory', 'Inventory'], 
'StoreCode': ['1145C', '7201B'], 
'numId': ['905895', '54272'], 
'prodsTypeA': ['150', '75'], 
'prodsTypeB': ['189', '231'], 
'UpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'], 
'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47'],
'userName': ['123', '3901']
}

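The main problem is that, in my current output, the values I'm interested in for both StoreCode and userName are attached to the word number. These values therefore get appended together and mixed, when in fact some of the number values belong to the key StoreCode and the others belong to the key userName.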
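
The collision can be reproduced in isolation. The sketch below feeds the four `number` sublists from the sample text through the same `setdefault` call; the list `pairs` is written out by hand for illustration.

```python
# Both StoreCode and userName contribute sublists whose first element is 'number'
pairs = [
    ["number", "1145C"],  # under StoreCode, file 1
    ["number", "123"],    # under userName, file 1
    ["number", "7201B"],  # under StoreCode, file 2
    ["number", "3901"],   # under userName, file 2
]

d = {}
for key, value in pairs:
    # The header line is never consulted, so every value lands under 'number'
    d.setdefault(key, []).append(value)

print(d)  # {'number': ['1145C', '123', '7201B', '3901']}
```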

Could someone please help me get the expected output? Thanks in advance.

Answer 1

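This isn't exactly the way you specified it, but assuming the structure stays the same throughout, something like the following, which avoids regular expressions, may work for you: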

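# Each chunk after the first starts with a file name
subfiles = file.split('./path/to/')
# Positions (counted after the file-name line) of the lines that carry a value
locs = [0, 2, 3, 5, 6, 7, 8, 10]
vals = []
for s in subfiles[1:]:
    target = s.strip().splitlines()[1:]
    # The file name is whatever precedes the first 'fileType'
    row = [s.split('fileType')[0].strip()]
    for loc in locs:
        if "=" in target[loc]:
            entry = target[loc].split('=', 1)[1].strip()
        else:
            # Indented lines use ':'; split only once so timestamps stay whole
            entry = target[loc].split(':', 1)[1].strip()
        row.append(entry)
    vals.append(row)

key_names = ['File', 'fileType', 'StoreCodenumber', 'numId', 'ValueOfProdsTypeA', 'ValueOfProdsTypeB', 'ProdsUpdateTime', 'InventoryTime', 'userName']
d = {}
# zip(*vals) groups the n-th value of every file, so any number of files works
for k, values in zip(key_names, zip(*vals)):
    d[k] = list(values)
d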

Output:

{'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'],
 'fileType': ['Inventory', 'Inventory'],
 'StoreCodenumber': ['1145C', '7201B'],
 'numId': ['905895', '54272'],
 'ValueOfProdsTypeA': ['150', '75'],
 'ValueOfProdsTypeB': ['189', '231'],
 'ProdsUpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'],
 'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47'],
 'userName': ['123', '3901']}

Obviously, you can modify it to fit your actual needs.
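
One such modification, sketched here as an alternative that does not depend on fixed line positions: remember the most recent header line (a line with no `=` or `:` separator) and, when an indented `number` entry appears, store it under that header instead. The function name `parse_inventory`, the trimmed `sample` text, and the rule that file paths start with `./` are assumptions made for this illustration.

```python
import re

def parse_inventory(text):
    """Collect values per key, filing 'number' under its enclosing header."""
    result = {}
    header = None  # most recent line that had no separator
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("./"):  # assumption: file paths start with './'
            result.setdefault("File", []).append(line.rsplit("/", 1)[-1])
            header = None
            continue
        m = re.match(r'(\w+)\s*=\s*(.*)', line)
        if m:  # 'key = value' on one line
            key, value = m.groups()
            header = None
        elif ":" in line:  # indented 'key:value' under a header
            key, value = line.split(":", 1)
            if key == "number" and header:
                key = header  # e.g. StoreCode or userName
        else:  # a bare header such as 'StoreCode'
            header = line
            continue
        result.setdefault(key, []).append(value.strip())
    return result

sample = """./path/to/Inventory2020_1.txt
fileType                           = Inventory
StoreCode
    number:1145C
ValuesOfProducts
    prodsTypeA:150
    UpdateTime:2020-03-05 14:45:38
userName
    number:123

./path/to/Inventory2020_2.txt
fileType                           = Inventory
StoreCode
    number:7201B
ValuesOfProducts
    prodsTypeA:75
    UpdateTime:2020-03-06 09:12:22
userName
    number:3901
"""

result = parse_inventory(sample)
print(result["StoreCode"])  # ['1145C', '7201B']
print(result["userName"])   # ['123', '3901']
```

Only the `number` key is renamed, so entries such as `prodsTypeA` under `ValuesOfProducts` keep their own names, which matches the structure requested in the question.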

