Pivot dataframe with 1 to many key value pairs along with 1:1 key value pairs?
Here is an example of a toy dataframe I'm trying to pivot:
import pandas as pd
df = pd.DataFrame({'id': [0, 0, 0, 0, 0, 1, 1,1], 'key':['role', 'role', 'role', 'dep', 'country', 'role', 'dep', 'country'], 'val': ['admin', 'local_usr', 'fin_dep_ds', 'fin', 'US', 'kuku', 'security', 'DE']})
df.pivot_table(index="id", columns="val", aggfunc="size", ).reset_index()
But the output I get is:
val id DE US admin fin fin_dep_ds kuku local_usr security
0 0 NaN 1.0 1.0 1.0 1.0 NaN 1.0 NaN
1 1 1.0 NaN NaN NaN NaN 1.0 NaN 1.0
I want to transform it into:
id admin local_usr fin_dep_ds kuku country dep
0 1 1 1 0 US fin
1 0 0 0 1 DE security
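Please advise how to adjust my df to get this result. It seems to me that I need to split the df into 2 parts and join them: the part where a key has multiple values per id, and the part with 1:1 key-value pairs.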
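A minimal sketch of the split-and-join idea, assuming `role` is the only key that can carry several values per `id` (hardcoded here for illustration): `pd.crosstab` builds the 0/1 indicator columns, `DataFrame.pivot` turns the 1:1 keys into ordinary columns, and `join` aligns both parts on `id`.

```python
import pandas as pd

# Toy frame from the question.
df = pd.DataFrame({
    "id": [0, 0, 0, 0, 0, 1, 1, 1],
    "key": ["role", "role", "role", "dep", "country", "role", "dep", "country"],
    "val": ["admin", "local_usr", "fin_dep_ds", "fin", "US", "kuku", "security", "DE"],
})

# Assumption: "role" is the only key with several values per id.
multi = df["key"] == "role"

# Part 1: one indicator column per role value (count of that value per id).
flags = pd.crosstab(df.loc[multi, "id"], df.loc[multi, "val"])

# Part 2: each 1:1 key becomes a column holding its value.
attrs = df.loc[~multi].pivot(index="id", columns="key", values="val")

# Join the two parts on the shared "id" index.
result = flags.join(attrs).reset_index()
result.columns.name = None
print(result)
```

`crosstab` orders the indicator columns alphabetically (admin, fin_dep_ds, kuku, local_usr), so reorder them if the exact column order matters.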
Try:
data = {}
for _, row in df.iterrows():
    if row["key"] in {"dep", "country"}:
        data.setdefault(row["id"], {})[row["key"]] = row["val"]
    else:
        data.setdefault(row["id"], {})[row["val"]] = 1
    data[row["id"]]["id"] = row["id"]

df = pd.DataFrame(data.values()).fillna(0).set_index("id")
df = df[sorted(df.columns, key=lambda k: k in {"dep", "country"})]
print(df)
Prints:
admin local_usr fin_dep_ds kuku dep country
id
0 1.0 1.0 1.0 0.0 fin US
1 0.0 0.0 0.0 1.0 security DE
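The loop above yields float flags (1.0/0.0) because an id that lacks a given value gets NaN in that column before `fillna(0)`. A sketch of the same loop that casts the flag columns back to integers, assuming, as the answer does, that `dep` and `country` are the only 1:1 keys:

```python
import pandas as pd

# Toy frame from the question.
df = pd.DataFrame({
    "id": [0, 0, 0, 0, 0, 1, 1, 1],
    "key": ["role", "role", "role", "dep", "country", "role", "dep", "country"],
    "val": ["admin", "local_usr", "fin_dep_ds", "fin", "US", "kuku", "security", "DE"],
})

# Assumption: these keys have exactly one value per id.
single_keys = {"dep", "country"}

data = {}
for _, row in df.iterrows():
    # One record per id; "id" is stored once, when the record is created.
    record = data.setdefault(row["id"], {"id": row["id"]})
    if row["key"] in single_keys:
        record[row["key"]] = row["val"]  # 1:1 key -> its value
    else:
        record[row["val"]] = 1  # multi-valued key -> indicator flag

out = pd.DataFrame(list(data.values())).set_index("id")

# Missing flags are NaN (float): fill them with 0 and cast to int.
flag_cols = [c for c in out.columns if c not in single_keys]
out = out.fillna({c: 0 for c in flag_cols}).astype({c: int for c in flag_cols})

# Indicator columns first, then the 1:1 columns.
out = out[flag_cols + [c for c in out.columns if c in single_keys]]
print(out)
```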
Pivot twice, depending on whether a key is repeated:
# Find keys that are repeated more than once for any `id`
idx = df.groupby(["key", "id"]).size().groupby(level=0).max().loc[lambda x: x > 1].index
# We will pivot those keys differently
cond = df["key"].isin(idx)
result = pd.concat([
df[cond].pivot_table(index="id", columns="val", aggfunc="size", fill_value=0),
df[~cond].pivot_table(index="id", columns="key", aggfunc="first").droplevel(0, axis=1)
], axis=1).reset_index()
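`pivot_table` sorts the indicator columns alphabetically, so the result above lists `fin_dep_ds` before `local_usr`, unlike the desired output. A sketch that keeps the answer's repeated-key test but restores first-appearance order for the indicator columns; it swaps in `groupby(...).size().unstack(fill_value=0)` and `DataFrame.pivot` for the two `pivot_table` calls, which should be equivalent for this data:

```python
import pandas as pd

# Toy frame from the question.
df = pd.DataFrame({
    "id": [0, 0, 0, 0, 0, 1, 1, 1],
    "key": ["role", "role", "role", "dep", "country", "role", "dep", "country"],
    "val": ["admin", "local_usr", "fin_dep_ds", "fin", "US", "kuku", "security", "DE"],
})

# Same test as the answer: keys that occur more than once for some id.
idx = df.groupby(["key", "id"]).size().groupby(level=0).max().loc[lambda x: x > 1].index
cond = df["key"].isin(idx)

# Indicator columns for the repeated keys (count per id and value).
flags = df[cond].groupby(["id", "val"]).size().unstack(fill_value=0)
# groupby sorts "val" alphabetically; restore order of first appearance.
flags = flags[df.loc[cond, "val"].unique().tolist()]

# One column per 1:1 key, holding its value.
attrs = df[~cond].pivot(index="id", columns="key", values="val")

result = pd.concat([flags, attrs], axis=1).reset_index()
result.columns.name = None
print(result)
```

For the toy frame this should give the columns `id admin local_usr fin_dep_ds kuku country dep`, the same order as the desired output in the question.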