Replacing values in a 2D NumPy array based on a 1D NumPy array or list
Consider the following 2D NumPy array:
import numpy as np
daily = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype = object)
daily
Here is another 1D NumPy array (it could be a list if needed):
campaigns = np.array([111, 333], dtype = object)
campaigns
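What is the fastest way to replace the last column's value, 'group', with 'new' or 'old', depending on whether the row's value in column 4 is present in campaigns? I got it working with a Python for loop and an if statement, but that is far too slow for the end goal. The final step checks billions of new/old combinations, so we need something very fast.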
%%time
for x in daily:
    if x[4] in campaigns:
        x[7] = 'new'
    else:
        x[7] = 'old'
daily
This is the expected result:
result = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'new'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'old'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'old'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'old'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'new']
], dtype=object)
result
The whole of column 4:
In [58]: daily[:,4]
Out[58]: array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object)
We can match it against campaigns with:
In [60]: np.in1d(daily[:,4],campaigns)
Out[60]: array([ True, False, True, True, False, True, True, False, True])
In [62]: mask = np.in1d(daily[:,4],campaigns)
In [63]: daily[mask,7]
Out[63]: array(['group', 'group', 'group', 'group', 'group', 'group'], dtype=object)
np.where lets us convert that mask to a string array:
In [67]: np.where(mask, 'new','old')
Out[67]:
array(['new', 'old', 'new', 'new', 'old', 'new', 'new', 'old', 'new'],
dtype='<U3')
We can assign that to column 7 (in IPython, _ holds the most recent output):
In [68]: daily[:,7] = _
I see a lot of pandas questions that use np.where in the same way.
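For anyone who wants this outside an IPython session, the steps above can be sketched as one short script. It uses np.isin, which the NumPy documentation recommends over np.in1d for new code. The cast to np.int64 is my own assumption, not part of the question: it only works if column 4 really holds nothing but integers, and I have not timed it against the object-dtype version. The names ids and wanted are just illustrative, and the sample is cut down to three rows.

```python
import numpy as np

# Three rows of the question's data, enough to show the pattern
daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Assumption: column 4 and campaigns contain only integers,
# so both can be cast from object to a native integer dtype.
ids = daily[:, 4].astype(np.int64)
wanted = campaigns.astype(np.int64)

# Boolean mask: True where the row's id appears in campaigns
mask = np.isin(ids, wanted)

# Pick 'new' or 'old' per row and write the result into column 7
daily[:, 7] = np.where(mask, 'new', 'old')

print(daily[:, 7])
```

If the ids are not all integers, drop the two astype lines and pass daily[:, 4] and campaigns to np.isin directly, as in the session above.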