Error in creating dynamic columns from an existing column with a nested list of lists
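I want to create two columns from an existing column whose values are nested lists of lists.
Here are three rows of records, each holding a company's participants and their roles: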
**row 1** [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}]
**row 2** [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}]
**row 3** [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}]
So far I have tried:
```python
responses['Role of Participant(s)'] = [element[0]['roles'] for element in responses['participants']]
responses['Role of Participant(s)'] = responses['Role of Participant(s)'].apply(lambda x: ', '.join(t['type'] for t in x))
responses['Name of Participant(s)'] = [element[0]['life']['name'] for element in responses['participants']]
```
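This gives me the following output:
It only returns the roles and name of the first participant.
However, I need all participants and their respective roles for each row/record, as shown below:
So how can I achieve this, using `***` as the separator between the values in each row, as in the screenshot above?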
Update
Here is a CSV version of the dataframe:
```
participants
"[{'roles': [{'type': 'founder'}], 'life': {'name': 'Poul Erik Andersen'}}, {'roles': [{'type': 'director'}, {'type': 'board'}], 'life': {'name': 'Martin Ravn-Nielsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Søren Haugaard'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Mads Dehlsen Winther'}}, {'roles': [{'type': 'founder'}], 'life': {'name': 'M+ Ejendomme A/S'}}, {'roles': [{'type': 'founder'}], 'life': {'name': 'MILTON HOLDING HORSENS A/S'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'EY Godkendt Revisionspartnerselskab'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'HUSCOMPAGNIET HOLDING A/S'}}]"
"[{'roles': [{'type': 'founder'}, {'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Rasmus Gert Hansen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'John Nyrup Larsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Ole Nidolf Larsen'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'RASMUS HANSEN HOLDING ApS'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'DANSK REVISION SLAGELSE GODKENDT REVISIONSAKTIESELSKAB'}}]"
"[{'roles': [{'type': 'board'}], 'life': {'name': 'Berit Pedersen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Sanne Kristine Späth'}}, {'roles': [{'type': 'real_owner'}], 'life': {'name': 'Kjeld Kirk Kristiansen'}}, {'roles': [{'type': 'director'}], 'life': {'name': 'Jesper Andersen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Poul Hartvig Nielsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Nanna Birgitta Gudum'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Henrik Baagøe Fredeløkke'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Carsten Rasmussen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Jesper Laursen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'John Hansen'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'LEGO A/S'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'PRICEWATERHOUSECOOPERS STATSAUTORISERET REVISIONSPARTNERSELSKAB'}}]"
```
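When this CSV is read back, each cell is a string such as `"[{'roles': ...}]"`, not a list, so code that loops over a cell would iterate over its characters. A minimal sketch, assuming the file is loaded with `pandas.read_csv`: parse each cell with `ast.literal_eval` through the `converters` argument (`json.loads` would fail because the strings use single quotes). `csv_text` is a hypothetical in-memory stand-in for the file, built from the three example rows at the top of the question:

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the three example rows,
# each stored as a double-quoted string as in the CSV above.
csv_text = '''participants
"[{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}]"
"[{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}]"
"[{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}]"
'''

# ast.literal_eval safely turns each string back into a list of dicts.
df = pd.read_csv(io.StringIO(csv_text), converters={'participants': ast.literal_eval})

print(type(df.loc[0, 'participants']))               # <class 'list'>
print(df.loc[1, 'participants'][2]['life']['name'])  # Dorte Bøcker Linde
```

After this step the `participants` column holds real lists, so per-cell functions that iterate over participants can be applied to it directly.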
You need a second `for` loop instead of `[0]`.
I use a normal function instead of a `lambda` to make it more readable.
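To see why `element[0]` loses data, compare indexing with iterating on the second example row. A minimal sketch in plain Python (no pandas needed); `row` is simply that row copied from the question:

```python
# Row 2 from the question: a list with one dict per participant.
row = [
    {'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}},
    {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}},
    {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}},
]

# [0] picks only the first participant's dict.
first_only = row[0]['life']['name']

# Iterating (here with a generator expression) visits every participant.
all_names = "***".join(item['life']['name'] for item in row)

print(first_only)  # Erik Mølgaard
print(all_names)   # Erik Mølgaard***Mikael Bodholdt Linde***Dorte Bøcker Linde
```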
First, the roles:
```python
import pandas as pd

data = {'participants':
    [
        [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
        [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
        [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
    ]
}

df = pd.DataFrame(data)

def get_roles(cell):
    results = []
    for item in cell:                  # every participant, not just [0]
        roles = []
        for role in item['roles']:     # every role of this participant
            roles.append(role['type'])
        results.append(",".join(roles))
    results = "***".join(results)      # separate participants with ***
    return results

df['Role of Participant(s)'] = df['participants'].apply(get_roles)
print(df[['Role of Participant(s)']].to_string())
```
Result:
```
Role of Participant(s)
0 director,founder,owner,real_owner
1 board***director,board,real_owner***board,real_owner
2 director,real_owner***owner
```
Now you can try writing it as a `lambda`:
```python
df['Role of Participant(s)'] = df['participants'].apply(lambda cell: "***".join([",".join(role['type'] for role in item['roles']) for item in cell]))
```
Similarly for the names:
```python
def get_names(cell):
    results = []
    for item in cell:
        results.append(item['life']['name'])
    results = "***".join(results)
    return results

df['Name of Participant(s)'] = df['participants'].apply(get_names)
```
And as a `lambda`:
```python
df['Name of Participant(s)'] = df['participants'].apply(lambda cell: "***".join(item['life']['name'] for item in cell))
```
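As an alternative to a per-cell function, the lists can be flattened with `Series.explode`, which gives each participant its own row while repeating the original row index, and the strings can then be rebuilt per original row with `groupby(level=0)`. This is a swapped-in technique, not the one used above, and a sketch only: it assumes every row has at least one participant, because an empty list explodes to `NaN` and the lambdas would fail on it.

```python
import pandas as pd

df = pd.DataFrame({'participants': [
    [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]})

# One row per participant; the index still points at the original row.
people = df['participants'].explode()

names = people.map(lambda p: p['life']['name'])
roles = people.map(lambda p: ",".join(r['type'] for r in p['roles']))

# Join back per original row (index level 0) using "***" as the separator.
df['Name of Participant(s)'] = names.groupby(level=0).agg(lambda s: "***".join(s))
df['Role of Participant(s)'] = roles.groupby(level=0).agg(lambda s: "***".join(s))

print(df[['Name of Participant(s)', 'Role of Participant(s)']].to_string())
```

The intermediate `people` Series can also be handy on its own, for example to count roles across all companies before joining.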
Edit:
A version that creates both columns in a single `apply` and skips participants who have the role `director`:
```python
import pandas as pd

data = {'participants':
    [
        [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
        [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
        [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
    ]
}

df = pd.DataFrame(data)

def get_names_and_roles(cell):
    all_names = []
    all_roles = []
    for item in cell:
        name = item['life']['name']
        roles = [role['type'] for role in item['roles']]
        if 'director' not in roles:    # skip participants who are directors
            all_names.append(name)
            all_roles.append(",".join(roles))
    all_names = "***".join(all_names)
    all_roles = "***".join(all_roles)
    # Returning a Series makes apply() produce two columns
    return pd.Series([all_names, all_roles])

df[['Name of Participant(s)', 'Role of Participant(s)']] = df['participants'].apply(get_names_and_roles)
print(df[['Name of Participant(s)', 'Role of Participant(s)']].to_string())
```
Result (row 0 is empty because its only participant, Lichun Du, has the `director` role):
```
Name of Participant(s) Role of Participant(s)
0
1 Erik Mølgaard***Dorte Bøcker Linde board***board,real_owner
2 WORLD JET HOLDING ApS owner
```
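The `'director'` check in `get_names_and_roles` is hard-coded. A minimal sketch, assuming you may want to skip a different set of roles, passes them in through a hypothetical `exclude` parameter; `Series.apply` forwards extra keyword arguments to the function. This variant also returns a plain tuple and fills the two columns from it, rather than relying on `apply` expanding a returned `Series`. A row whose participants are all excluded ends up with empty strings.

```python
import pandas as pd

df = pd.DataFrame({'participants': [
    [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]})

def get_names_and_roles(cell, exclude=frozenset({'director'})):
    """Return ('name***name', 'roles***roles'), skipping participants with an excluded role.

    `exclude` is a hypothetical parameter added for this sketch.
    """
    all_names = []
    all_roles = []
    for item in cell:
        roles = [role['type'] for role in item['roles']]
        # Skip this participant if it holds any of the excluded roles.
        if exclude.intersection(roles):
            continue
        all_names.append(item['life']['name'])
        all_roles.append(",".join(roles))
    return "***".join(all_names), "***".join(all_roles)

# Series.apply passes the extra keyword argument `exclude` on to the function.
pairs = df['participants'].apply(get_names_and_roles, exclude={'real_owner'})
df['Name of Participant(s)'] = [names for names, _ in pairs]
df['Role of Participant(s)'] = [roles for _, roles in pairs]

print(df[['Name of Participant(s)', 'Role of Participant(s)']].to_string())
```

Calling `get_names_and_roles` without `exclude` keeps the original behaviour of skipping directors only.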