Change the value of a dataframe column to the value of a second column conditional on the value of a third column in pandas

I have data on companies' current names, former names, and the dates of the name changes. It looks like this:

name                              former_name1                  name_change_date1
ACMAT CORP                        nan                           NaT
ACME ELECTRIC CORP                nan                           NaT
ACME UNITED CORP                  nan                           NaT
COLUMBIA ACORN TRUST              LIBERTY ACORN TRUST           2003-10-20
MULTIGRAPHICS INC                 AM INTERNATIONAL INC          1997-03-17
MILLER LLOYD I III                nan                           NaT
AFFILIATED COMPUTER SERVICES INC  nan                           NaT
ADAMS RESOURCES & ENERGY, INC.    ADAMS RESOURCES & ENERGY INC  2005-04-01
BK Technologies Corp              BK Technologies, Inc.         2019-03-28

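I want to know each company's name as of a specific date. Say I want to find a company's name as of January 1, 2002. I could then create a new column, say edited_name, that contains the company's current name unless the company changed its name after January 1, 2002, in which case it contains the historical name (i.e. former_name1). So the output should look like this: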

name                              former_name1                  name_change_date1  edited_name
ACMAT CORP                        nan                           NaT                ACMAT CORP
ACME ELECTRIC CORP                nan                           NaT                ACME ELECTRIC CORP
ACME UNITED CORP                  nan                           NaT                ACME UNITED CORP
COLUMBIA ACORN TRUST              LIBERTY ACORN TRUST           2003-10-20         LIBERTY ACORN TRUST
MULTIGRAPHICS INC                 AM INTERNATIONAL INC          1997-03-17         MULTIGRAPHICS INC
MILLER LLOYD I III                nan                           NaT                MILLER LLOYD I III
AFFILIATED COMPUTER SERVICES INC  nan                           NaT                AFFILIATED COMPUTER SERVICES INC
ADAMS RESOURCES & ENERGY, INC.    ADAMS RESOURCES & ENERGY INC  2005-04-01         ADAMS RESOURCES & ENERGY INC
BK Technologies Corp              BK Technologies, Inc.         2019-03-28         BK Technologies, Inc.

In Stata (which I'm more familiar with), this can be done easily:

gen edited_name = name
replace edited_name = former_name1 if name_change_date1 > date("2002-01-01", "YMD") & name_change_date1 != .

Unfortunately, I can't figure out how to do this in Python/pandas.

Data:

{'name': ['ACMAT CORP', 'ACME ELECTRIC CORP', 'ACME UNITED CORP', 'COLUMBIA ACORN TRUST',
          'MULTIGRAPHICS INC', 'MILLER LLOYD I III', 'AFFILIATED COMPUTER SERVICES INC',
          'ADAMS RESOURCES & ENERGY, INC.', 'BK Technologies Corp'],
 'former_name1': [nan, nan, nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC', nan, nan,
                  'ADAMS RESOURCES & ENERGY INC', 'BK Technologies, Inc.'],
 'name_change_date1': [NaT, NaT, NaT, '2003-10-20', '1997-03-17', NaT, NaT,
                       '2005-04-01', '2019-03-28']}
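
The dictionary above is printed output, so `nan` and `NaT` are not defined names if it is pasted as-is. A minimal sketch of turning it into a DataFrame, assuming `nan` stands for `numpy.nan`, `NaT` for `pandas.NaT`, and that the date column should be parsed to datetime64 (the variable name `data` is introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Bind the bare names that appear in the printed dictionary (assumption).
nan, NaT = np.nan, pd.NaT

data = {
    'name': ['ACMAT CORP', 'ACME ELECTRIC CORP', 'ACME UNITED CORP', 'COLUMBIA ACORN TRUST',
             'MULTIGRAPHICS INC', 'MILLER LLOYD I III', 'AFFILIATED COMPUTER SERVICES INC',
             'ADAMS RESOURCES & ENERGY, INC.', 'BK Technologies Corp'],
    'former_name1': [nan, nan, nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC', nan, nan,
                     'ADAMS RESOURCES & ENERGY INC', 'BK Technologies, Inc.'],
    'name_change_date1': [NaT, NaT, NaT, '2003-10-20', '1997-03-17', NaT, NaT,
                          '2005-04-01', '2019-03-28'],
}

df = pd.DataFrame(data)
# Parse the date strings so they can be compared with a cutoff; NaT stays NaT.
df['name_change_date1'] = pd.to_datetime(df['name_change_date1'])
```

Without the `pd.to_datetime` step the column would hold a mix of strings and NaT, and date comparisons against it would not behave as intended.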

You can use numpy.where to select values depending on whether a name change occurred after the cutoff date:

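import numpy as np
import pandas as pd

# Use the former name only where a change date exists and falls after the cutoff.
df['edited_name'] = np.where(df['name_change_date1'].notna() &
                             df['name_change_date1'].gt(pd.to_datetime('1/1/2002')),
                             df['former_name1'], df['name'])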
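
A note on the condition: unlike Stata, where a missing date sorts above every real date, pandas treats any ordering comparison against NaT as False. The short sketch below, using a made-up three-row Series named `dates`, suggests that the `notna()` guard is therefore an explicit safety check rather than a requirement. It also uses an ISO date string for the cutoff, which avoids the month-first reading of '1/1/2002'.

```python
import pandas as pd

# Illustrative dates only: one after the cutoff, one missing, one before.
dates = pd.Series(pd.to_datetime(['2003-10-20', None, '1997-03-17']))
cutoff = pd.Timestamp('2002-01-01')

after_cutoff = dates.gt(cutoff)              # NaT compares as False
guarded = dates.notna() & dates.gt(cutoff)   # same test with an explicit guard

print(after_cutoff.tolist())          # [True, False, False]
print(after_cutoff.equals(guarded))   # True
```

Keeping the guard costs little and makes the intent obvious to readers coming from Stata.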

Or with Series.mask, which replaces the values of name wherever the condition is True:

df['edited_name'] = df['name'].mask(df['name_change_date1'].notna() &
                                    df['name_change_date1'].gt(pd.to_datetime('1/1/2002')),
                                    df['former_name1'])

Output:

                               name                  former_name1  \
0                        ACMAT CORP                           NaN   
1                ACME ELECTRIC CORP                           NaN   
2                  ACME UNITED CORP                           NaN   
3              COLUMBIA ACORN TRUST           LIBERTY ACORN TRUST   
4                 MULTIGRAPHICS INC          AM INTERNATIONAL INC   
5                MILLER LLOYD I III                           NaN   
6  AFFILIATED COMPUTER SERVICES INC                           NaN   
7    ADAMS RESOURCES & ENERGY, INC.  ADAMS RESOURCES & ENERGY INC   
8              BK Technologies Corp         BK Technologies, Inc.   

  name_change_date1                       edited_name  
0               NaT                        ACMAT CORP  
1               NaT                ACME ELECTRIC CORP  
2               NaT                  ACME UNITED CORP  
3        2003-10-20               LIBERTY ACORN TRUST  
4        1997-03-17                 MULTIGRAPHICS INC  
5               NaT                MILLER LLOYD I III  
6               NaT  AFFILIATED COMPUTER SERVICES INC  
7        2005-04-01      ADAMS RESOURCES & ENERGY INC  
8        2019-03-28             BK Technologies, Inc.  
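
For readers who think in terms of Stata's gen followed by replace ... if, the same result can also be reached in two steps with boolean indexing through `.loc`. This is only a sketch under the column names used in the question; the helper `name_as_of` and the three-row `sample` frame are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

def name_as_of(df, cutoff):
    """Return each company's name as of `cutoff` (hypothetical helper)."""
    cutoff = pd.Timestamp(cutoff)
    # True only where a change date exists and is after the cutoff (NaT gives False).
    changed_later = df['name_change_date1'].gt(cutoff)
    edited = df['name'].copy()                                          # gen edited_name = name
    edited.loc[changed_later] = df.loc[changed_later, 'former_name1']   # replace ... if
    return edited

# Three rows taken from the question's data.
sample = pd.DataFrame({
    'name': ['ACMAT CORP', 'COLUMBIA ACORN TRUST', 'MULTIGRAPHICS INC'],
    'former_name1': [np.nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC'],
    'name_change_date1': pd.to_datetime([None, '2003-10-20', '1997-03-17']),
})
sample['edited_name'] = name_as_of(sample, '2002-01-01')
print(sample['edited_name'].tolist())
# ['ACMAT CORP', 'LIBERTY ACORN TRUST', 'MULTIGRAPHICS INC']
```

Wrapping the logic in a function makes it easy to ask for names as of a different date without repeating the condition.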

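Use:

import numpy as np
import pandas as pd

df = pd.DataFrame({'name':['a', 'b', 'c', 'd'], 'fname':[np.nan, 'h', 's', np.nan], 'dc':[np.nan, '2003-10-20', '1997-03-17', np.nan]})
df['dc'] = pd.to_datetime(df['dc'])
# Former names whose change date is after the cutoff; NaN everywhere else.
df['nname'] = df['fname'][df['dc'] > '1/1/2002']
# Current names for rows without a qualifying change, former names for the rest.
res = df['name'][df['nname'].isna()]
temp = df['fname'][df['nname'].notna()]
# Series.append was removed in pandas 2.0; pd.concat is the replacement.
res = pd.concat([res, temp])
# Assignment aligns on the index, so the row order of res does not matter.
df['res'] = res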

Output: