Merging values together to one key in an Ordered Dict

I'm wondering whether there is a more elegant way to merge values of an ordered dict than the solution I have implemented right now.

I have an ordered dict that looks like this:

'fields': OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
    ("Ethnicity/ Race", "Race"),
]),

If I pass in a parameter as a list of indices like this

[2,3] or [2,4,5]

is there an elegant way to merge the values at those indices under a new key, so that

[2,3], "Random_Key"

would return

'fields': OrderedDict([
        ("Sample Code", "Vendor Sample ID"),
        ("Donor ID", "Vendor Subject ID"),
        ("Random_Key", "Material Format Sample Type"),  # merged entry
        ("Age", "Age"),
        ("Gender", "Gender"),
        ("Ethnicity/ Race", "Race"),
    ]),

while also deleting the original keys from the dict?

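You could optimize this by sorting the indices in descending order, and you could use dict.pop(key, None) to retrieve and remove each key/value in one step. I decided against both, so that the values are appended in the order in which they occur in indices.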

from collections import OrderedDict
from pprint import pprint

def mergeEm(d,indices,key):
    """Merges the values at index given by 'indices' on OrderedDict d into a list.        
    Appends this list with key as key to the dict. Deletes keys used to build list."""

    if not all(x < len(d) for x in indices):
        raise IndexError ("Index out of bounds")

    vals = []                      # stores the values to be removed in order
    allkeys = list(d.keys())
    for i in indices:
        vals.append(d[allkeys[i]])   # append to temporary list
    d[key] = vals                  # add to dict, use ''.join(vals) to combine str
    for i in indices:              # remove all indices keys
        d.pop(allkeys[i],None)
    pprint(d)


fields= OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
    ("Ethnicity/ Race", "Race"),
    ("Sample Type", "Sample Type"),
    ("Organ", "Organ"),
    ("Pathological Diagnosis", "Diagnosis"),
    ("Detailed Pathological Diagnosis", "Detailed Diagnosis"),
    ("Clinical Diagnosis/Cause of Death", "Detailed Diagnosis option 2"),
    ("Dissection", "Dissection"),
    ("Quantity (g, ml, or ug)", "Quantity"),
    ("HIV", "HIV"),
    ("HEP B", "HEP B")
])
pprint(fields)
mergeEm(fields, [5,4,2], "tata")

Output:

OrderedDict([('Sample Code', 'Vendor Sample ID'),
             ('Donor ID', 'Vendor Subject ID'),
             ('Format', 'Material Format'),
             ('Sample Type', 'Sample Type'),
             ('Age', 'Age'),
             ('Gender', 'Gender'),
             ('Ethnicity/ Race', 'Race'),
             ('Organ', 'Organ'),
             ('Pathological Diagnosis', 'Diagnosis'),
             ('Detailed Pathological Diagnosis', 'Detailed Diagnosis'),
             ('Clinical Diagnosis/Cause of Death',
              'Detailed Diagnosis option 2'),
             ('Dissection', 'Dissection'),
             ('Quantity (g, ml, or ug)', 'Quantity'),
             ('HIV', 'HIV'),
             ('HEP B', 'HEP B')])


OrderedDict([('Sample Code', 'Vendor Sample ID'),
             ('Donor ID', 'Vendor Subject ID'),
             ('Sample Type', 'Sample Type'),
             ('Ethnicity/ Race', 'Race'),
             ('Organ', 'Organ'),
             ('Pathological Diagnosis', 'Diagnosis'),
             ('Detailed Pathological Diagnosis', 'Detailed Diagnosis'),
             ('Clinical Diagnosis/Cause of Death',
              'Detailed Diagnosis option 2'),
             ('Dissection', 'Dissection'),
             ('Quantity (g, ml, or ug)', 'Quantity'),
             ('HIV', 'HIV'),
             ('HEP B', 'HEP B'),
             ('tata', ['Gender', 'Age', 'Material Format'])])
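The pop-based variant mentioned above could look roughly like the following. This is only a sketch: `merge_with_pop` is a hypothetical name, and it assumes the indices are distinct and non-negative. Because the keys are looked up from a snapshot taken before anything is removed, no sorting is needed here and the values still end up in the order given by `indices`.

```python
from collections import OrderedDict

def merge_with_pop(d, indices, key):
    """Sketch: collect and delete the values at 'indices' in one pass."""
    allkeys = list(d.keys())  # snapshot, so positions stay valid while popping
    if not all(0 <= i < len(allkeys) for i in indices):
        raise IndexError("Index out of bounds")
    # pop() returns the value and removes the key in a single step
    vals = [d.pop(allkeys[i]) for i in indices]
    d[key] = vals             # the new key is appended at the end
    return d

fields = OrderedDict([
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
])
merge_with_pop(fields, [3, 2, 0], "tata")
```

As in mergeEm, the merged entry lands at the end of the dict rather than at the position of the removed keys.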

Not sure there's an elegant way. OrderedDict has a move_to_end method that moves a key to the start or to the end, but not to an arbitrary position.
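For reference, a small illustration of what move_to_end can and cannot do (the keys here are made up for the example):

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

d.move_to_end("a")               # default last=True: "a" goes to the end
order_after_end = list(d)        # ['b', 'c', 'a']

d.move_to_end("c", last=False)   # last=False: "c" goes to the front
order_after_front = list(d)      # ['c', 'b', 'a']
```

There is no argument for a target index, which is why the approach below rebuilds the dict instead.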

I'd try to be as efficient as possible and keep the number of loops to a minimum:

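  • get the list of keys
  • find the index of the key you want to merge with the following one
  • remove the next key from the dictionary
  • create a list from the items of d
  • replace the entry at the stored index of this list with the new key and value
  • rebuild an OrderedDict from it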

Like this (I removed some keys to keep the example short):

from collections import OrderedDict

d = OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
])

lk = list(d.keys())
index = lk.index("Sample Type")
v = d.pop(lk[index+1])

t = list(d.items())
t[index] = ("new key",t[index][1]+" "+v)

d = OrderedDict(t)

print(d)

Result:

OrderedDict([('Sample Code', 'Vendor Sample ID'), ('Donor ID', 'Vendor Subject ID'), ('Format', 'Material Format'), ('new key', 'Sample Type Age'), ('Gender', 'Gender')])
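The same rebuild-from-a-list idea can be extended to the index list from the question. The following is a sketch under assumptions the question does not settle: `merge_at` and its `sep` parameter are hypothetical, the merged entry is placed at the position of the smallest index, and the values are joined in ascending index order.

```python
from collections import OrderedDict

def merge_at(d, indices, new_key, sep=" "):
    """Sketch: return a new OrderedDict with the entries at 'indices' merged."""
    items = list(d.items())
    if not indices:
        return OrderedDict(items)          # nothing to merge
    if not all(0 <= i < len(items) for i in indices):
        raise IndexError("Index out of bounds")
    chosen = sorted(set(indices))
    first = chosen[0]                      # assumed position of the merged entry
    merged = sep.join(items[i][1] for i in chosen)
    result = []
    for n, (k, v) in enumerate(items):
        if n == first:
            result.append((new_key, merged))
        elif n not in chosen:
            result.append((k, v))          # keep untouched entries in place
    return OrderedDict(result)

fields = OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
])
merged_fields = merge_at(fields, [2, 3], "Random_Key")
```

Unlike the in-place versions, this leaves the original dict unchanged and returns a new one.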

This can also be done nicely with a generator.

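The generator yields each key/item pair unchanged if it does not have to be squashed. If it does, the item is saved until the last of the given indices is reached, and then a single pair is yielded, made of the new key and the saved items joined together.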

A new OrderedDict can then be constructed from the generator.

from collections import OrderedDict    

def squashDict(d, ind, new_key):
    """ Takes an OrderedDictionary d, and yields its key item pairs, 
    except the ones at an index in indices (ind), these items are merged 
    and yielded at the last position of indices (ind) with a new key (new_key)
    """
    if not all(x < len(d) for x in ind):
        raise IndexError ("Index out of bounds")
    vals = []
    for n, (k, i) in enumerate(d.items()):
        if n in ind:
            vals += [i]
            if n == ind[-1]:
                yield (new_key, " ".join(vals))
        else:
            yield (k, i)  # pass untouched pairs through as (key, item)

d = OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
])

t = OrderedDict(squashDict(d, [2, 3], "Random"))
print(t)
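Note that the generator compares `n` against `ind[-1]`, so it expects `ind` to be sorted in ascending order. If that cannot be guaranteed, one possible variation is to compare against the largest index instead. The following is a sketch; `squash_items` is a hypothetical name, and the items are joined in dict order regardless of the order of `ind`.

```python
from collections import OrderedDict

def squash_items(d, ind, new_key):
    """Sketch: like the generator above, but tolerant of unsorted indices."""
    wanted = set(ind)
    if not all(0 <= x < len(d) for x in wanted):
        raise IndexError("Index out of bounds")
    last = max(wanted) if wanted else None   # position where the merge is emitted
    vals = []
    for n, (k, v) in enumerate(d.items()):
        if n in wanted:
            vals.append(v)                   # hold back until the last index
            if n == last:
                yield (new_key, " ".join(vals))
        else:
            yield (k, v)

d = OrderedDict([
    ("Sample Code", "Vendor Sample ID"),
    ("Donor ID", "Vendor Subject ID"),
    ("Format", "Material Format"),
    ("Sample Type", "Sample Type"),
    ("Age", "Age"),
    ("Gender", "Gender"),
])

t = OrderedDict(squash_items(d, [3, 2], "Random"))
```

Because this is a generator, the bounds check only runs once iteration starts, that is, when OrderedDict(...) consumes it.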