Turning raw DataFrame data into a dictionary of lists
I am trying to convert this data (in a DataFrame):
0 1
0 HT01 CC363292
29 RL01 CC363292
50 TN01 CC363292
4 BN02 CC363293
7 MR20 CC363294
9 TN01 CC363295
10 RL01 CC363296
13 HT01 CC363297
17 HT01 CC363298
21 SU01 CC363299
22 BN02 CC363300
25 MR20 CC363301
27 MR20 CC363302
54 BN02 CC363313
57 BN02 CC363314
60 BN02 CC363315
52 SU01 EA363303
32 RL01 EA363303
35 MR20 EA363304
37 HU01 EA363305
38 HU01 EA363306
39 BN02 EA363307
63 RL01 EA363311
66 MR20 EA363312
42 HT01 SC363308
46 RL01 SC363309
51 SP01 SC363309
53 FU01 SC363309
49 SP01 SC363310
into a dictionary that uses the column 1 values as keys, each mapped to a list of the matching column 0 values (see below), i.e.
temp_dict = {
    'CC363292': ['HT01', 'RL01', 'TN01'],
    'CC363293': ['BN02'],
}
I tried using a for loop to append lists to the keys, but without success.
Can anyone help?
I'm somewhat familiar with pandas, and this is the solution I came up with:
import pandas

# Create the DataFrame from the initial data
data = [
('HT01', 'CC363292'),
('RL01', 'CC363292'),
('TN01', 'CC363292'),
('BN02', 'CC363293'),
    # ...
]
df = pandas.DataFrame(data=data, columns=['col1', 'col2'])
# This creates a DataFrame like this:
# col1 | col2 |
# HT01 | CC363292 |
# RL01 | CC363292 |
# TN01 | CC363292 |
# BN02 | CC363293 |
# .... | ........ |
# The next step is to one-hot encode 'col2' (one indicator column per distinct value)
_df = pandas.get_dummies(data=df, columns=['col2'], prefix='', prefix_sep='')
# This gives a result like the following (recent pandas versions show True/False instead of 1/0):
# col1 | CC363292 | CC363293 | ...
# HT01 | 1 | 0 | ...
# RL01 | 1 | 0 | ...
# TN01 | 1 | 0 | ...
# BN02 | 0 | 1 | ...
# .... | ........ | ........ | ...
# Then define a small helper that collects the col1 values flagged in one indicator column:
f = lambda col: _df.loc[_df[col].astype(bool), 'col1'].tolist()
# And build the requested result with a dict comprehension:
my_dict = {col: f(col) for col in _df.columns[1:]}
# Important: _df.columns[1:] assumes 'col1' is the first column. That is not
# very general, but it is fine for the problem you described
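One possible way around the `_df.columns[1:]` caveat is to encode only the key column and iterate over the resulting indicator columns by name. The following is a minimal sketch, not the answer author's code: it uses only the first four rows from the question, and the names `dummies` and `my_dict` are chosen here for illustration.

```python
import pandas as pd

# First four rows from the question; the column names are assumptions.
data = [
    ("HT01", "CC363292"),
    ("RL01", "CC363292"),
    ("TN01", "CC363292"),
    ("BN02", "CC363293"),
]
df = pd.DataFrame(data, columns=["col1", "col2"])

# Encode only 'col2': the result has one indicator column per distinct key.
dummies = pd.get_dummies(df["col2"])

# For each indicator column, keep the col1 values of the rows where it is set.
my_dict = {
    key: df.loc[dummies[key].astype(bool), "col1"].tolist()
    for key in dummies.columns
}
print(my_dict)
```

Because the keys come from `dummies.columns`, no positional slicing of the column index is needed.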
The question asks to group the rows by one column and list the values of another column for each group. A quick solution could be:
import pandas as pd
data = [
[0, "HT01", "CC363292"],
[29, "RL01", "CC363292"],
[50, "TN01", "CC363292"],
[4, "BN02", "CC363293"],
[7, "MR20", "CC363294"],
[9, "TN01", "CC363295"],
[10, "RL01", "CC363296"],
[13, "HT01", "CC363297"],
[17, "HT01", "CC363298"],
[21, "SU01", "CC363299"],
[22, "BN02", "CC363300"],
[25, "MR20", "CC363301"],
[27, "MR20", "CC363302"],
[54, "BN02", "CC363313"],
[57, "BN02", "CC363314"],
[60, "BN02", "CC363315"],
[52, "SU01", "EA363303"],
[32, "RL01", "EA363303"],
[35, "MR20", "EA363304"],
[37, "HU01", "EA363305"],
[38, "HU01", "EA363306"],
[39, "BN02", "EA363307"],
[63, "RL01", "EA363311"],
[66, "MR20", "EA363312"],
[42, "HT01", "SC363308"],
[46, "RL01", "SC363309"],
[51, "SP01", "SC363309"],
[53, "FU01", "SC363309"],
[49, "SP01", "SC363310"],
]
df = pd.DataFrame(data)
# Group by the third column.
# List the second column.
groups = df.groupby(df.columns[2])[df.columns[1]].apply(list)
print(groups)
The output should look similar to:
CC363292 [HT01, RL01, TN01]
CC363293 [BN02]
CC363294 [MR20]
CC363295 [TN01]
CC363296 [RL01]
CC363297 [HT01]
CC363298 [HT01]
CC363299 [SU01]
CC363300 [BN02]
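CC363301 [MR20]
CC363302 [MR20]
CC363313 [BN02]
CC363314 [BN02]
CC363315 [BN02]
EA363303 [SU01, RL01]
EA363304 [MR20]
EA363305 [HU01]
EA363306 [HU01]
EA363307 [BN02]
EA363311 [RL01]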
EA363312 [MR20]
SC363308 [HT01]
SC363309 [RL01, SP01, FU01]
SC363310 [SP01]
To convert this to a dictionary, use dict(groups) instead. The output should be:
{
'CC363292': ['HT01', 'RL01', 'TN01'],
'CC363293': ['BN02'],
'CC363294': ['MR20'],
'CC363295': ['TN01'],
'CC363296': ['RL01'],
'CC363297': ['HT01'],
'CC363298': ['HT01'],
'CC363299': ['SU01'],
'CC363300': ['BN02'],
'CC363301': ['MR20'],
'CC363302': ['MR20'],
'CC363313': ['BN02'],
'CC363314': ['BN02'],
'CC363315': ['BN02'],
'EA363303': ['SU01', 'RL01'],
'EA363304': ['MR20'],
'EA363305': ['HU01'],
'EA363306': ['HU01'],
'EA363307': ['BN02'],
'EA363311': ['RL01'],
'EA363312': ['MR20'],
'SC363308': ['HT01'],
'SC363309': ['RL01', 'SP01', 'FU01'],
'SC363310': ['SP01']
}
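If the columns have names, the same grouping can probably be written in one chained expression. This is only a sketch under assumptions: the column names `col1` and `col2` are not part of the asker's DataFrame (its columns are labelled 0 and 1), and only the first four rows are used.

```python
import pandas as pd

# First four rows from the question; the column names are assumptions.
df = pd.DataFrame(
    [
        ("HT01", "CC363292"),
        ("RL01", "CC363292"),
        ("TN01", "CC363292"),
        ("BN02", "CC363293"),
    ],
    columns=["col1", "col2"],
)

# Group by the key column, collect the other column into lists,
# then convert the resulting Series to a plain dict.
# sort=False keeps the keys in order of first appearance.
result = df.groupby("col2", sort=False)["col1"].agg(list).to_dict()
print(result)
```

Within each group the values keep their original row order, so the lists match the order in which the rows appear in the DataFrame.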
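# Pair each col1 value with its col2 key (assumes columns named 'col1' and 'col2')
zip_data = zip(df['col1'], df['col2'])
result = {}
for value, key in zip_data:
    # Create the list on first sight of a key, then append to it
    result.setdefault(key, []).append(value)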
This might work.
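For reference, here is a self-contained sketch of the same loop idea that runs without pandas. It swaps `setdefault` for `collections.defaultdict`, which is a substitution made here rather than part of the answer, and the `rows` list is just the first four rows from the question written out as (value, key) pairs.

```python
from collections import defaultdict

# (value, key) pairs taken from the first four rows of the question.
rows = [
    ("HT01", "CC363292"),
    ("RL01", "CC363292"),
    ("TN01", "CC363292"),
    ("BN02", "CC363293"),
]

# defaultdict(list) creates an empty list the first time a key is seen.
grouped = defaultdict(list)
for value, key in rows:
    grouped[key].append(value)

# Convert back to a plain dict so missing keys raise KeyError again.
result = dict(grouped)
print(result)
```

With a DataFrame, the pairs could come from `zip(df['col1'], df['col2'])` as in the loop above, assuming those column names exist.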