Problem on merging two columns in a dataframe in python
I am trying to merge two columns of a dataframe in Python. The original dataframe looks like this:
   type   id      details                  details2
0  hotel  df9466  #2 in the rank of 288    nan
1  hotel  gt9444  #48 in the rank of 340   nan
2  hotel  dfa887  #12 in the rank of 7414  nan
3  hotel  fgfd81  nan                      #1 in rank of 8792
4  hotel  fsf887  nan                      #70 in rank of 245
My expected result should look like this:
   type   id      details
0  hotel  df9466  #2 in the rank of 288
1  hotel  gt9444  #48 in the rank of 340
2  hotel  dfa887  #12 in the rank of 7414
3  hotel  fgfd81  #1 in the rank of 8792
4  hotel  fsf887  #70 in the rank of 245
In my code, I tried to merge them with
df_hotel["details"] = (df_hotel["details"] + df_hotel["details2"])
However, this fails: every value in the details column of the result is nan.
Try:
replace() to convert the string 'nan' to an actual NaN (if the values are already actual NaN, you can skip this step and go straight to fillna()), and fillna() to fill those NaN values:
df_hotel = df_hotel.replace('nan', float('NaN'), regex=True)
df_hotel["details"] = df_hotel["details"].fillna(df_hotel.pop("details2"))
Output of df_hotel:
   type   id      details
0  hotel  df9466  #2 in rank of 288
1  hotel  gt9444  #48 in rank of 340
2  hotel  dfa887  #12 in rank of 7414
3  hotel  fgfd81  #1 in rank of 8792
4  hotel  fsf887  #70 in rank of 245
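The steps above can be tried end to end with a small sketch. The DataFrame built here is a hypothetical reconstruction of the question's data, with the missing entries assumed to be the literal string 'nan'. The replacement uses an exact match rather than regex=True, which is a choice made for this sketch and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data; missing entries are
# assumed to be the literal string 'nan'.
df_hotel = pd.DataFrame({
    "type": ["hotel"] * 5,
    "id": ["df9466", "gt9444", "dfa887", "fgfd81", "fsf887"],
    "details": ["#2 in the rank of 288", "#48 in the rank of 340",
                "#12 in the rank of 7414", "nan", "nan"],
    "details2": ["nan", "nan", "nan",
                 "#1 in rank of 8792", "#70 in rank of 245"],
})

# Turn the placeholder strings into real missing values (exact match only).
df_hotel = df_hotel.replace("nan", np.nan)

# Fill the gaps in 'details' from 'details2'; pop() removes 'details2'
# from the frame while returning it.
df_hotel["details"] = df_hotel["details"].fillna(df_hotel.pop("details2"))
print(df_hotel)
```

Because fillna() aligns on the index, each missing row in details takes the value from the same row of details2.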
NaN plus anything is NaN. Instead, we can use Series.add and set fill_value to an empty string:
df_hotel['details'] = (
    df_hotel["details"].add(df_hotel["details2"], fill_value='')
)
Or we can call Series.fillna on both Series and then add them with +:
df_hotel["details"] = (df_hotel["details"].fillna('') +
                       df_hotel["details2"].fillna(''))
df_hotel:
   type   id      details              details2
0  hotel  df9466  #2 in rank of 288    NaN
1  hotel  gt9444  #48 in rank of 340   NaN
2  hotel  dfa887  #12 in rank of 7414  NaN
3  hotel  fgfd81  #1 in rank of 8792   #1 in rank of 8792
4  hotel  fsf887  #70 in rank of 245   #70 in rank of 245
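To see why the plain + fails and why filling first helps, here is a minimal sketch on two short Series. The variable names and the two-row data are made up for illustration, and the values are assumed to be real NaN rather than the string 'nan'.

```python
import numpy as np
import pandas as pd

# Two-row stand-ins for the 'details' and 'details2' columns (illustrative data).
details = pd.Series(["#2 in rank of 288", np.nan])
details2 = pd.Series([np.nan, "#1 in rank of 8792"])

# Plain + propagates the missing value: each row has a NaN on one side,
# so every row of the result is missing.
broken = details + details2
print(broken.isna().all())

# Replacing NaN with '' first turns + into "keep whichever side has text".
merged = details.fillna("") + details2.fillna("")
print(merged.tolist())
```

This only gives a clean result when at most one of the two columns has a value in each row; if both were filled, the two strings would be concatenated.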
We can pop details2 if we also want to remove it from the DataFrame:
df_hotel['details'] = (
    df_hotel["details"].add(df_hotel.pop("details2"), fill_value='')
)
or
df_hotel["details"] = (df_hotel["details"].fillna('') +
                       df_hotel.pop("details2").fillna(''))
df_hotel:
   type   id      details
0  hotel  df9466  #2 in rank of 288
1  hotel  gt9444  #48 in rank of 340
2  hotel  dfa887  #12 in rank of 7414
3  hotel  fgfd81  #1 in rank of 8792
4  hotel  fsf887  #70 in rank of 245
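For readers unfamiliar with pop, the following sketch separates it into its own step so its effect is visible. The small frame and the name popped are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative two-row frame with real NaN values.
df = pd.DataFrame({
    "details": ["#2 in rank of 288", np.nan],
    "details2": [np.nan, "#1 in rank of 8792"],
})

# pop() removes 'details2' from df and returns that column as a Series.
popped = df.pop("details2")
print(list(df.columns))

# The returned Series can still be combined with the remaining column.
df["details"] = df["details"].fillna("") + popped.fillna("")
print(df)
```

Doing the pop inline, as in the one-liners above, has the same effect without the intermediate variable.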