Mask and concatenate two string Series in pandas
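I'm struggling to get this to work. I mask out the NaN values and then concatenate the two Series. Is concatenating with + not possible this way? If not, what is the best approach?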
row_mask = df[df[win_end].notnull()]
df.loc[row_mask, df['window_full']] = df.loc[row_mask, win_start]+' - '+df.loc[row_mask, win_end]
print(df['window_start', 'window_end', 'window_full'])
Input
win_start win_end
2021-12-16 2021-12-18
2022-01-19 NaN
2022-01-18 2022-01-20
Expected output
win_start win_end window_full
2021-12-16 2021-12-18 2021-12-16 - 2021-12-18
2022-01-19 NaN NaN
2022-01-18 2022-01-20 2022-01-18 - 2022-01-20
Error
ValueError: Cannot index with multidimensional key
Edit
Thanks for the quick reply. I tried using 'window_full' as the column name instead, but I still get the error:
ValueError: Cannot index with multidimensional key
Edit 2
The working solution requires passing a boolean mask:
row_mask = df['win_end'].notnull()
df.loc[row_mask, 'window_full'] = (df.loc[row_mask, 'win_start']+' - '+df.loc[row_mask, 'win_end'])
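To assign a new column, use the column name 'window_full' rather than the Series df['window_full'], the same syntax as for selecting. It is also necessary to pass a boolean mask to .loc rather than a filtered DataFrame such as row_mask = df[df[win_end].notnull()]; a DataFrame used as the row indexer is what raises "Cannot index with multidimensional key":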
row_mask = df['win_end'].notnull()
df.loc[row_mask, 'window_full'] = (df.loc[row_mask, 'win_start']+' - '+
df.loc[row_mask, 'win_end'])
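As a minimal, self-contained sketch of the lines above, assuming the question's sample values are stored as plain strings (the DataFrame construction is an assumption added for illustration, not the asker's actual code):

```python
import numpy as np
import pandas as pd

# Sample data from the question; the dates are assumed to be plain strings
df = pd.DataFrame({
    'win_start': ['2021-12-16', '2022-01-19', '2022-01-18'],
    'win_end': ['2021-12-18', np.nan, '2022-01-20'],
})

# Boolean Series with one True/False per row, not a filtered DataFrame
row_mask = df['win_end'].notnull()

# The new column is addressed by name; rows outside the mask are left as NaN
df.loc[row_mask, 'window_full'] = (df.loc[row_mask, 'win_start'] + ' - ' +
                                   df.loc[row_mask, 'win_end'])

# Selecting several columns needs a list, hence the double brackets
print(df[['win_start', 'win_end', 'window_full']])
```

Note that the final print selects with a list of column names; df['a', 'b'] would be looked up as a single tuple key instead.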
Or test both columns:
row_mask = df[['win_end', 'win_start']].notnull().all(axis=1)
df.loc[row_mask, 'window_full'] = (df.loc[row_mask, 'win_start']+' - '+
df.loc[row_mask, 'win_end'])
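On the question of whether + works at all: element-wise string concatenation with + propagates missing values, so for this particular expected output the mask may not be strictly needed. A rough sketch, again assuming both columns hold strings (it would not apply as written to datetime columns), with the DataFrame built here only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the dates are assumed to be plain strings
df = pd.DataFrame({
    'win_start': ['2021-12-16', '2022-01-19', '2022-01-18'],
    'win_end': ['2021-12-18', np.nan, '2022-01-20'],
})

# string + NaN yields NaN, so the row with a missing win_end stays NaN
df['window_full'] = df['win_start'] + ' - ' + df['win_end']

# Series.str.cat gives the same result when na_rep is left unset
alt = df['win_start'].str.cat(df['win_end'], sep=' - ')

print(df)
```

The masked .loc assignment remains the better fit when the unmasked rows must keep values already present in 'window_full'.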