Processing dictionary with zip function keeping the same structure in Python

Question

I have a dictionary in which every key maps to a list of arrays, except the key "reference", which is just an array of integers.

cls["input_ids"], cls["attention_masks"], cls["labels"], cls["reference"]

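Each row under one key is linked to the row at the same position under the other keys (this is a modified output of a BERT tokenizer).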

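I want to filter out some rows by their reference value while keeping the same dictionary structure in the output. The only way I have managed to do this so far is shown here, with some random data to give an idea: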


cls= {"input_ids":[[22,22,22],[33,33,33]], "attention_masks":[[22,22,22],[33,33,33]], "reference":[1,0], "labels":[[[22,22,22],[33,33,33]]]}
mcp = {"input_ids":[], "attention_masks":[], "reference":[], "labels":[]}

for el in zip(cls["input_ids"], cls["attention_masks"], cls["reference"], cls["labels"]):
    if el[2] == 1:
        mcp["input_ids"].append(el[0])
        mcp["attention_masks"].append(el[1])
        mcp["reference"].append(el[2])
        mcp["labels"].append(el[3])

But I really don't like this code, and I would like to know whether there is a prettier way.

Answer 1

Assuming your values all have the same length, I would suggest using pandas for this kind of manipulation.

import pandas as pd

cls= {"input_ids":[[22,22,22],[33,33,33]], 
      "attention_masks":[[22,22,22],[33,33,33]], 
      "reference":[1,0], 
      "labels":[[22,22,22],[33,33,33]] # I assume this is what you meant
      }

# Turn the data into a DataFrame, which is something like a table
df = pd.DataFrame(cls)
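
The same-length assumption matters because pd.DataFrame raises a ValueError when the lists in the dictionary have different lengths. As a rough sketch, this is one way to check the data from the question before building the DataFrame; the names lengths and same_length are only illustrative and do not come from the original post:

```python
# Data exactly as posted in the question: "labels" has one extra
# level of nesting, so it holds a single row instead of two.
cls = {
    "input_ids": [[22, 22, 22], [33, 33, 33]],
    "attention_masks": [[22, 22, 22], [33, 33, 33]],
    "reference": [1, 0],
    "labels": [[[22, 22, 22], [33, 33, 33]]],
}

# Number of rows stored under each key (illustrative helper name).
lengths = {key: len(rows) for key, rows in cls.items()}

# True only when every key holds the same number of rows.
same_length = len(set(lengths.values())) == 1

print(lengths)
print(same_length)
```

With the question's data the check fails for "labels", which is why the extra nesting was removed in the dictionary above.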

This is what df looks like:

>>> df
      input_ids attention_masks  reference        labels
0  [22, 22, 22]    [22, 22, 22]          1  [22, 22, 22]
1  [33, 33, 33]    [33, 33, 33]          0  [33, 33, 33]

You can access values such as df['reference'] and filter the data, for example:

>>> df[df['reference']==1]
      input_ids attention_masks  reference        labels
0  [22, 22, 22]    [22, 22, 22]          1  [22, 22, 22]

If you need it as a dictionary:

>>> df[df['reference']==1].to_dict()
{'input_ids': {0: [22, 22, 22]},
 'attention_masks': {0: [22, 22, 22]},
 'reference': {0: 1},
 'labels': {0: [22, 22, 22]}}
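
Note that the default to_dict() output maps each key to a dictionary indexed by row label, not to a plain list as in the original structure. If the goal is to get back the list-per-key layout from the question, one option is the orient="list" argument of DataFrame.to_dict. A minimal sketch, assuming the corrected labels from above (the variable name filtered is only illustrative):

```python
import pandas as pd

cls = {
    "input_ids": [[22, 22, 22], [33, 33, 33]],
    "attention_masks": [[22, 22, 22], [33, 33, 33]],
    "reference": [1, 0],
    "labels": [[22, 22, 22], [33, 33, 33]],  # assumed nesting, as above
}

df = pd.DataFrame(cls)

# Keep only the rows whose reference is 1, then convert back to a
# dictionary that maps each key to a list of the remaining rows.
filtered = df[df["reference"] == 1].to_dict(orient="list")

print(filtered)
```

Here filtered has the same keys as cls, each mapping to a list that contains only the rows whose reference value is 1.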

python

dictionary

bert-language-model