Replacing values in 2d numpy array based on 1d numpy array or list

Consider the following 2D NumPy array:

import numpy as np

daily = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'group'],  
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype = object)

daily

And here is another 1D NumPy array (it can be a list if needed):

campaigns = np.array([111, 333], dtype = object)
campaigns

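What is the fastest way to replace the values in the last column, 'group', with 'new' or 'old' depending on whether the value in column 4 is present in campaigns? The way I managed it, with a Python for loop and an if statement, is far too slow for the end goal: the final step checks billions of new/old combinations, so we need something very fast.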

%%time
for x in daily:
    if x[4] in campaigns:
        x[7] = 'new'
    else:
        x[7] = 'old'
daily
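Part of the loop's cost is the membership test itself: x[4] in campaigns on a NumPy array builds an elementwise comparison against every element of campaigns. A minimal sketch of the same loop, assuming it is acceptable to convert campaigns to a Python set once so each lookup becomes a hash check. The row-by-row Python loop is still there, so this only narrows the gap to a vectorized version.

```python
import numpy as np

# Small sample in the same layout as `daily` above (column 4 = campaign ID, column 7 = label)
daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Build the set once, outside the loop; `in` on a set is a hash lookup
campaign_set = set(campaigns.tolist())

# Each `row` is a view into `daily`, so assigning row[7] updates `daily` in place
for row in daily:
    row[7] = 'new' if row[4] in campaign_set else 'old'

print(daily[:, 7].tolist())  # ['new', 'old', 'new']
```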

Here is the expected result:

result = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'new'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'old'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'old'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'old'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'new']
], dtype=object)

result

Take column 4:

In [58]: daily[:,4]
Out[58]: array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object)
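Because daily has dtype=object, column 4 is also an object array, so every comparison on it goes through Python-level int objects. A sketch, assuming every value in column 4 is an integer ID: cast the column (and campaigns) to a fixed-width integer dtype once, which typically lets the later membership test run on a native integer array instead.

```python
import numpy as np

daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Column 4 comes out as an object array of Python ints
ids = daily[:, 4]
print(ids.dtype)  # object

# Cast once to int64 (assumes all IDs are integers that fit in 64 bits)
ids_int = ids.astype(np.int64)
campaigns_int = np.asarray(campaigns, dtype=np.int64)
print(ids_int.dtype, campaigns_int.tolist())  # int64 [111, 333]
```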

We can match it against campaigns with:

In [60]: np.in1d(daily[:,4],campaigns)
Out[60]: array([ True, False,  True,  True, False,  True,  True, False,  True])

In [62]: mask = np.in1d(daily[:,4],campaigns)

In [63]: daily[mask,7]
Out[63]: array(['group', 'group', 'group', 'group', 'group', 'group'], dtype=object)
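Two notes on this step, as a sketch. Since NumPy 2.0, np.in1d is deprecated and np.isin is the suggested replacement; for a 1-D first argument it produces the same mask. The mask can also index the left-hand side of an assignment, writing both labels in place without building an intermediate string array (~ inverts a boolean mask). Shown on a three-row sample in the same layout:

```python
import numpy as np

daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# np.isin: True wherever a value in column 4 appears in campaigns
mask = np.isin(daily[:, 4], campaigns)

# Boolean row index + integer column index: assigns in place into `daily`
daily[mask, 7] = 'new'
daily[~mask, 7] = 'old'

print(daily[:, 7].tolist())  # ['new', 'old', 'new']
```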

np.where lets us turn the mask into an array of strings:

In [67]: np.where(mask, 'new','old')
Out[67]: 
array(['new', 'old', 'new', 'new', 'old', 'new', 'new', 'old', 'new'],
      dtype='<U3')

We can assign that to column 7 (in IPython, _ refers to the previous output):

In [68]: daily[:,7] = _
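Putting the steps together, a sketch of a reusable helper. The name label_campaigns and its id_col/label_col parameters are hypothetical, defaulting to the column positions used in this question. It writes np.where(mask, 'new', 'old') into the label column in place and returns the same array.

```python
import numpy as np

def label_campaigns(data, campaigns, id_col=4, label_col=7):
    """Hypothetical helper: mark rows whose ID is in `campaigns` as 'new', others as 'old'."""
    mask = np.isin(data[:, id_col], campaigns)
    # np.where picks 'new' where mask is True and 'old' elsewhere
    data[:, label_col] = np.where(mask, 'new', 'old')
    return data

daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
    ['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype=object)

# campaigns passed as a plain list, as the question allows
result = label_campaigns(daily, [111, 333])
print(result[:, 7].tolist())  # ['new', 'old', 'new']
```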

I see many pandas questions that use np.where the same way.