Replace column values using mask and multiple mappings

Question

I have two dataframes. One is v_df, which looks like this:

VENDOR_ID	VENDOR_NAME
123	APPLE
456	GOOGLE
987	FACEBOOK

The other is n_df, which looks like this:

Vendor_Name	GL_Transaction_Description
AMEX	HELLO 345
Not assigned	BYE 456
Not assigned	THANKS 123

I want to populate the 'Vendor_Name' column in n_df whenever the 'GL_Transaction_Description' on the same row contains any VENDOR_ID value from v_df. The resulting n_df would look like this:

Vendor_Name	GL_Transaction_Description
AMEX	HELLO 345
GOOGLE	BYE 456
APPLE	THANKS 123

So far I have written this code:

v_list = v_df['VENDOR_ID'].to_list()
mask_id = list(map((lambda x: any([(y in x) for y in v_list])), n_df['GL_Transaction_Description']))

n_df['Vendor_Name'].mask((mask_id), other = 'Solution Here', inplace=True)

I just can't figure out what to write for the 'other' argument of the final mask. Any ideas? (n_df has over 100k rows, so execution speed matters.)

Answer 1

`Series.str.extract` + `map`

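# Build a lookup Series: vendor ID (as a string) -> vendor name
i = v_df['VENDOR_ID'].astype(str)
m = v_df.set_index(i)['VENDOR_NAME']
# Extract the first run of digits from each description
s = n_df['GL_Transaction_Description'].str.extract(r'(\d+)', expand=False)

# Overwrite Vendor_Name wherever the extracted ID has a match in m
n_df['Vendor_Name'].update(s.map(m))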

Explanation

Create a mapping Series m from the v_df dataframe by setting the VENDOR_ID column as the index and selecting the VENDOR_NAME column:

>>> m

VENDOR_ID
123       APPLE
456      GOOGLE
987    FACEBOOK
Name: VENDOR_NAME, dtype: object

Now extract the vendor IDs from the strings in the GL_Transaction_Description column:

>>> s

0    345
1    456
2    123
Name: GL_Transaction_Description, dtype: object
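Note that the pattern `(\d+)` captures only the first run of digits in each description. If a description could contain some other number before the vendor ID, one possible variation (an assumption, not part of the original answer) is to build the pattern from the known IDs themselves. The sketch below uses made-up descriptions, and the names `desc`, `ids`, `pattern`, and `extracted` are hypothetical:

```python
import re
import pandas as pd

v_df = pd.DataFrame({
    "VENDOR_ID": [123, 456, 987],
    "VENDOR_NAME": ["APPLE", "GOOGLE", "FACEBOOK"],
})
# Hypothetical descriptions: the second one has an unrelated number first
desc = pd.Series(["HELLO 345", "INV 77 BYE 456", "THANKS 123"])

ids = v_df["VENDOR_ID"].astype(str)
# Alternation of the known IDs; \b keeps 123 from matching inside 91234
pattern = r"\b(" + "|".join(map(re.escape, ids)) + r")\b"

# Rows with no known ID yield NaN, so a later map/update leaves them alone
extracted = desc.str.extract(pattern, expand=False)
print(extracted.tolist())
```

With many vendor IDs the alternation grows long, so whether this is fast enough on 100k+ rows would need to be measured on the real data.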

Map the extracted vendor IDs through the mapping Series m, and update the Vendor_Name column with the mapped values:

>>> n_df

  Vendor_Name GL_Transaction_Description
0        AMEX                  HELLO 345
1      GOOGLE                    BYE 456
2       APPLE                 THANKS 123
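For reference, the steps above can be put together as a self-contained script. This is a minimal sketch that rebuilds the sample frames from the question. It assigns the result back with `fillna` rather than calling `update` on the selected column, on the assumption that a chained in-place update may not propagate to n_df when pandas Copy-on-Write is active:

```python
import pandas as pd

v_df = pd.DataFrame({
    "VENDOR_ID": [123, 456, 987],
    "VENDOR_NAME": ["APPLE", "GOOGLE", "FACEBOOK"],
})
n_df = pd.DataFrame({
    "Vendor_Name": ["AMEX", "Not assigned", "Not assigned"],
    "GL_Transaction_Description": ["HELLO 345", "BYE 456", "THANKS 123"],
})

# Mapping Series: vendor ID (as a string) -> vendor name
m = v_df.set_index(v_df["VENDOR_ID"].astype(str))["VENDOR_NAME"]

# First run of digits in each description; NaN when there is none
s = n_df["GL_Transaction_Description"].str.extract(r"(\d+)", expand=False)

# Use the mapped name where the ID is known, otherwise keep the old value
n_df["Vendor_Name"] = s.map(m).fillna(n_df["Vendor_Name"])

print(n_df)
```

Both `str.extract` and `map` are vectorized over the whole column, which is likely why this approach suits the 100k+ row case better than a Python-level loop over every vendor ID.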
