Extract Frozenset items from Pandas Dataframe
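I have the following dataframe:
I want to convert the "antecedents" and "consequents" columns to strings, removing the "frozenset({ ... })" wrapper, so that every row reads:
"VENTOLIN S.INAL200D 100MCG" instead of frozenset({"VENTOLIN S.INAL200D 100MCG"}).
I managed to get the result with: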
prod = []
for i in df["antecedents"]:
    prod.append(str(i))
new_set = {
    x.replace('frozenset', '')
     .replace('})', '')
     .replace('({', '')
     .replace("'", "")
    for x in prod
}
Is there a more Pythonic solution?
First convert the values to tuples or lists, then use DataFrame.explode:
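import pandas as pd

df = pd.DataFrame({
    'antecedents': [frozenset({'aaa', 'bbb'})] * 3 + [frozenset({'nbb'})] * 3,
    'consequents': [frozenset({'ccc'})] * 3 + [frozenset({'nbb', 'ddd'})] * 3,
    'C': [1, 3, 5, 7, 1, 0],
})
# print(df)

cols = ['antecedents', 'consequents']
# Convert each frozenset to a tuple before exploding.
# DataFrame.applymap is deprecated since pandas 2.1, so map each column instead.
df[cols] = df[cols].apply(lambda s: s.map(tuple))
print(df)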
  antecedents consequents  C
0  (bbb, aaa)      (ccc,)  1
1  (bbb, aaa)      (ccc,)  3
2  (bbb, aaa)      (ccc,)  5
3      (nbb,)  (nbb, ddd)  7
4      (nbb,)  (nbb, ddd)  1
5      (nbb,)  (nbb, ddd)  0
df1 = (df.explode('antecedents')
         .reset_index(drop=True)
         .explode('consequents')
         .reset_index(drop=True))
print(df1)
   antecedents consequents  C
0          bbb         ccc  1
1          aaa         ccc  1
2          bbb         ccc  3
3          aaa         ccc  3
4          bbb         ccc  5
5          aaa         ccc  5
6          nbb         nbb  7
7          nbb         ddd  7
8          nbb         nbb  1
9          nbb         ddd  1
10         nbb         nbb  0
11         nbb         ddd  0
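
If one row per original rule is preferred over one row per item, a possible alternative is to join each frozenset's items into a single string. This is only a sketch, not part of the answer above: it assumes every item is already a string, and the ', ' separator and the sample data are illustrative choices.

```python
import pandas as pd

# Illustrative sample data mirroring the frame used in the answer.
df = pd.DataFrame({
    'antecedents': [frozenset({'aaa', 'bbb'})] * 3 + [frozenset({'nbb'})] * 3,
    'consequents': [frozenset({'ccc'})] * 3 + [frozenset({'nbb', 'ddd'})] * 3,
    'C': [1, 3, 5, 7, 1, 0],
})

cols = ['antecedents', 'consequents']

# Join the items of each frozenset into one string. sorted() gives a
# deterministic order, because frozenset iteration order is arbitrary.
# Assumption: all items are strings (str.join fails otherwise).
for col in cols:
    df[col] = df[col].map(lambda items: ', '.join(sorted(items)))

print(df)
```

A single-item frozenset such as frozenset({'VENTOLIN S.INAL200D 100MCG'}) then becomes the bare string, and the number of rows stays the same as in the original frame.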