Get an item value from a nested dictionary inside the rows of a pandas df and get rid of the rest

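I implemented allennlp's OIE, which extracts subject, predicate and object information (in the form of ARG0, V, ARG1, etc.) embedded in nested strings. However, I need to make sure that each output is linked to the given ID of the original sentence.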

I produced the following pandas dataframe, where the OIE output column contains the raw output of the allennlp algorithm.


sentence                             ID      OIE output
'The girl went to the cinema'        'abcd'  {'verbs': [{'verb': 'went', 'description': '[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]}
'He is right and he is an engineer'  'efgh'  {'verbs': [{'verb': 'is', 'description': '[ARG0: He] [V: is] [ARG1:right]'}, {'verb': 'is', 'description': '[ARG0: He] [V: is] [ARG1:an engineer]'}]}


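oie_l = []

# predictor_oie is the already-loaded allennlp OIE predictor
for sent in df['sentence']:  # iterate in row order so the results line up with the IDs
    oie_pred = predictor_oie.predict(sentence=sent)
    for d in oie_pred['verbs']:  # get to the nested info
        d.pop('tags', None)  # remove unnecessary info
    oie_l.append(oie_pred)  # keep exactly one prediction per sentence

df['OIE output'] = oie_l  # add new column to df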
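Since the loop above needs a loaded allennlp model, here is a minimal sketch that exercises the same collection logic offline. StubPredictor is a hypothetical stand-in for predictor_oie: it only imitates the {'verbs': [...]} shape with made-up 'tags' and performs no real extraction, so it is an assumption for illustration, not allennlp's API.

```python
import pandas as pd


class StubPredictor:
    """Hypothetical stand-in for an allennlp OIE predictor (no real model)."""

    def __init__(self, canned):
        self.canned = canned  # sentence -> list of (verb, description) pairs

    def predict(self, sentence):
        # Mimic the nested shape: {'verbs': [{'verb', 'description', 'tags'}]}
        return {
            "verbs": [
                {"verb": v, "description": desc, "tags": ["O"]}  # dummy tags
                for v, desc in self.canned[sentence]
            ]
        }


predictor_oie = StubPredictor({
    "The girl went to the cinema": [
        ("went", "[ARG0: The girl] [V: went] [ARG1:to the cinema]"),
    ],
    "He is right and he is an engineer": [
        ("is", "[ARG0: He] [V: is] [ARG1:right]"),
        ("is", "[ARG0: He] [V: is] [ARG1:an engineer]"),
    ],
})

df = pd.DataFrame({
    "sentence": ["The girl went to the cinema", "He is right and he is an engineer"],
    "ID": ["abcd", "efgh"],
})

oie_l = []
for sent in df["sentence"]:
    oie_pred = predictor_oie.predict(sentence=sent)
    for d in oie_pred["verbs"]:
        d.pop("tags", None)  # drop the per-token tags, keep verb + description
    oie_l.append(oie_pred)  # one entry per row, in row order

df["OIE output"] = oie_l  # object column holding one dict per sentence
```

Appending once per sentence (outside the inner loop) is what keeps len(oie_l) equal to len(df), so the new column aligns with the existing ID column.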


Desired output:

sentence                             ID      OIE Triples
'The girl went to the cinema'        'abcd'  '[ARG0: The girl] [V: went] [ARG1:to the cinema]'
'He is right and he is an engineer'  'efgh'  '[ARG0: He] [V: is] [ARG1:right]'
'He is right and he is an engineer'  'efgh'  '[ARG0: He] [V: is] [ARG1:an engineer]'


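To get the desired 'OIE Triples' output, I am considering converting the initial 'OIE output' to a string and then extracting the ARGs with regular expressions. However, I am not sure this is the best solution, because the 'ARGs' can vary. Another approach would be to iterate down to the nested values of description:, put them as a list in place of what is currently in the OIE output, and then use the df.explode() method to expand it, so that the correct sentence and ID columns stay linked to the exploded rows.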
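For the regex idea, here is a minimal sketch, assuming every description is made of [LABEL: text] chunks like the ones shown above. It applies the pattern to each description string (a variant of stringifying the whole output), and the label group accepts any letters, digits or hyphens, so varying ARGs do not have to be hard-coded. CHUNK_RE and parse_description are hypothetical names.

```python
import re

# One "[LABEL: text]" chunk; LABEL may be ARG0, V, ARGM-TMP, ... (not hard-coded).
# Words outside brackets are simply not matched.
CHUNK_RE = re.compile(r"\[([A-Za-z0-9-]+):\s*([^\]]*)\]")


def parse_description(description):
    """Return a {label: text} dict for one OIE description string."""
    return {label: text.strip() for label, text in CHUNK_RE.findall(description)}


print(parse_description("[ARG0: The girl] [V: went] [ARG1:to the cinema]"))
# {'ARG0': 'The girl', 'V': 'went', 'ARG1': 'to the cinema'}
```

If the same label occurs twice in one description, the dict keeps only the last occurrence; return CHUNK_RE.findall(description) as a list of (label, text) pairs instead if that matters.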




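import ast

# Only needed when the "OIE output" cells are strings rather than real dicts
df["OIE Triples"] = df["OIE output"].apply(ast.literal_eval)

# For each row, collect the "description" of every dict under the "verbs" key
df["OIE Triples"] = df["OIE Triples"].apply(
    lambda val: [a_dict["description"] for a_dict in val["verbs"]]
)
# One row per description; sentence and ID are repeated on each new row
df = df.explode("OIE Triples").drop(columns="OIE output")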

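If the "OIE output" values are not real dicts but strings, we turn them into dicts with ast.literal_eval. (So if they are already dicts, you can skip the import ast and ast.literal_eval lines and apply the next step to df["OIE output"] directly.)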
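If the column might hold a mix of strings and dicts (for example, some rows read back from a file and some produced in memory), a small guard avoids calling ast.literal_eval on values that are already dicts. A minimal sketch; to_dict is a hypothetical helper name and the two sample cells are typed in by hand.

```python
import ast

import pandas as pd


def to_dict(value):
    """Parse a cell with ast.literal_eval only if it is still a string."""
    return ast.literal_eval(value) if isinstance(value, str) else value


df = pd.DataFrame({
    "ID": ["abcd", "efgh"],
    "OIE output": [
        # stored as a string
        "{'verbs': [{'verb': 'went', 'description': '[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]}",
        # already a dict
        {"verbs": [{"verb": "is", "description": "[ARG0: He] [V: is] [ARG1:right]"}]},
    ],
})

df["OIE output"] = df["OIE output"].apply(to_dict)
```

ast.literal_eval only evaluates Python literals, so unlike eval it will not run arbitrary code found in the cells.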

Then, for each value of the series, we build a list made of the "description" values of the dicts stored under the "verbs" key of the outermost dict.

Finally, we explode this list of descriptions and drop the "OIE output" column, which is no longer needed.
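Putting the steps together on the sample rows from the question gives a self-contained sketch, assuming the cells are already dicts (so the ast.literal_eval step is skipped). The trailing reset_index(drop=True) is an optional extra that renumbers the rows 0..n-1 instead of repeating the original index (0, 1, 1).

```python
import pandas as pd

df = pd.DataFrame({
    "sentence": ["The girl went to the cinema", "He is right and he is an engineer"],
    "ID": ["abcd", "efgh"],
    "OIE output": [
        {"verbs": [{"verb": "went", "description": "[ARG0: The girl] [V: went] [ARG1:to the cinema]"}]},
        {"verbs": [
            {"verb": "is", "description": "[ARG0: He] [V: is] [ARG1:right]"},
            {"verb": "is", "description": "[ARG0: He] [V: is] [ARG1:an engineer]"},
        ]},
    ],
})

# One list of descriptions per row
df["OIE Triples"] = df["OIE output"].apply(
    lambda val: [d["description"] for d in val["verbs"]]
)

# One row per description; sentence and ID are copied onto every new row
out = (
    df.explode("OIE Triples")
      .drop(columns="OIE output")
      .reset_index(drop=True)  # optional: 0, 1, 2 instead of 0, 1, 1
)
print(out)
```

explode turns an empty list into a single NaN row, so a sentence for which no verbs were found would still appear once; add .dropna(subset=["OIE Triples"]) if such rows are unwanted.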


                              sentence      ID                                      OIE Triples
0        'The girl went to the cinema'  'abcd'  [ARG0: The girl] [V: went] [ARG1:to the cinema]
1  'He is right and he is an engineer'  'efgh'                  [ARG0: He] [V: is] [ARG1:right]
1  'He is right and he is an engineer'  'efgh'            [ARG0: He] [V: is] [ARG1:an engineer]