Coalesce (SQL) functionality for Python Pandas
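All,
I found a function called "combine_first()" in the pandas documentation. It only covers a few simple cases. I was able to get the code below to work by chaining "combine_first()" several times (6 in this case). Can someone help me find a more elegant solution?
The resulting variable "category_id" should contain the first non-missing value, starting from the last variable (category_id7) and working back to the first. For each row in the data frame, as soon as a category_id(x) is populated, category_id should take that value and stop processing.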
import numpy as np
import pandas as pd

d = {
    'category_id1': [32991, 32991, 32991, 32991, 32991],
    'category_id2': [22, 22, 22, 22, 22],
    'category_id3': [33058, 51, 121, 120, 32438],
    'category_id4': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'category_id5': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'category_id6': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'category_id7': [np.nan, np.nan, np.nan, np.nan, np.nan],
}
df = pd.DataFrame(data=d)
# Coalesce from category_id7 back to category_id1: keep the first non-missing value.
df['category_id'] = (
    df.category_id7.combine_first(df.category_id6).combine_first(df.category_id5)
    .combine_first(df.category_id4).combine_first(df.category_id3)
    .combine_first(df.category_id2).combine_first(df.category_id1)
)
print(df)
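The chained call above is a left fold of combine_first() over the columns in reverse order. As a minimal sketch (not part of the original post), the same chain can be written with functools.reduce so that it does not depend on the number of columns. The names `cols` and `coalesced` are illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category_id1": [32991] * 5,
    "category_id2": [22] * 5,
    "category_id3": [33058, 51, 121, 120, 32438],
    "category_id4": [np.nan] * 5,
    "category_id5": [np.nan] * 5,
    "category_id6": [np.nan] * 5,
    "category_id7": [np.nan] * 5,
})

# Column names in priority order: category_id7 first, category_id1 last.
cols = list(df.columns[::-1])

# Start from the highest-priority column and fill its gaps from each
# following column in turn, exactly like the hand-written chain.
coalesced = reduce(
    lambda acc, name: acc.combine_first(df[name]),
    cols[1:],
    df[cols[0]],
)

df["category_id"] = coalesced
print(df["category_id"].tolist())
```

This keeps the semantics of the original chain; only the repetition is removed.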
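You are trying to coalesce from the back, so I reverse the order of the columns with iloc. I follow up with pd.DataFrame.notnull() to determine which cells are not null. When I then run pd.DataFrame.idxmax, I get, for every row, the name of the first column with a non-null value, searching from the back. Finally, I use pd.DataFrame.lookup to fetch the values associated with the columns that were found.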
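# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
df.assign(
    # Reverse the columns, flag non-null cells, take the first flagged column per row.
    category_id=df.iloc[:, ::-1].notnull().idxmax(1).pipe(
        # d maps each row label to a column name; fetch the matching cell values.
        lambda d: df.lookup(d.index, d.values)
    )
)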
   category_id1  category_id2  category_id3  category_id4  category_id5  category_id6  category_id7  category_id
0         32991            22         33058           NaN           NaN           NaN           NaN        33058
1         32991            22            51           NaN           NaN           NaN           NaN           51
2         32991            22           121           NaN           NaN           NaN           NaN          121
3         32991            22           120           NaN           NaN           NaN           NaN          120
4         32991            22         32438           NaN           NaN           NaN           NaN        32438
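Because DataFrame.lookup is no longer available in pandas 2.x, the last step of the approach above needs a substitute there. The following is a sketch under that assumption, not the answerer's code: it keeps the iloc / notnull / idxmax steps and replaces the lookup with NumPy integer indexing. The names `first_valid_col`, `rows`, `cols`, `result` and `alt` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category_id1": [32991] * 5,
    "category_id2": [22] * 5,
    "category_id3": [33058, 51, 121, 120, 32438],
    "category_id4": [np.nan] * 5,
    "category_id5": [np.nan] * 5,
    "category_id6": [np.nan] * 5,
    "category_id7": [np.nan] * 5,
})

# Steps 1-3: reverse the columns, flag non-null cells, and take the label of
# the first flagged column in each row (searching from category_id7).
first_valid_col = df.iloc[:, ::-1].notnull().idxmax(axis=1)

# Step 4 (stand-in for lookup): turn the column labels into integer
# positions and pick one cell per row from the underlying array.
rows = np.arange(len(df))
cols = df.columns.get_indexer(first_valid_col)
result = df.assign(category_id=df.to_numpy()[rows, cols])

# Cross-check with a shorter alternative: back-fill along the reversed
# columns, then read the first column.
alt = df.iloc[:, ::-1].bfill(axis=1).iloc[:, 0]

print(result["category_id"].tolist())
```

Since the frame mixes integer and float columns, to_numpy() yields a float array, so the new column comes back as floats rather than integers.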