Change the value of a dataframe column to the value of a second column conditional on the value of a third column in pandas
I have data on companies' current names, former names, and name-change dates. It looks like this:
name | former_name1 | name_change_date1 |
---|---|---|
ACMAT CORP | nan | NaT |
ACME ELECTRIC CORP | nan | NaT |
ACME UNITED CORP | nan | NaT |
COLUMBIA ACORN TRUST | LIBERTY ACORN TRUST | 2003-10-20 |
MULTIGRAPHICS INC | AM INTERNATIONAL INC | 1997-03-17 |
MILLER LLOYD I III | nan | NaT |
AFFILIATED COMPUTER SERVICES INC | nan | NaT |
ADAMS RESOURCES & ENERGY, INC. | ADAMS RESOURCES & ENERGY INC | 2005-04-01 |
BK Technologies Corp | BK Technologies, Inc. | 2019-03-28 |
I want to know each company's name as of a particular date. Say I want each company's name as of January 1, 2002. I could create a new column, say edited_name, that holds the company's current name unless the company changed its name after January 1, 2002, in which case it holds the historical name (i.e. former_name1). So the output should look like this:
name | former_name1 | name_change_date1 | edited_name |
---|---|---|---|
ACMAT CORP | nan | NaT | ACMAT CORP |
ACME ELECTRIC CORP | nan | NaT | ACME ELECTRIC CORP |
ACME UNITED CORP | nan | NaT | ACME UNITED CORP |
COLUMBIA ACORN TRUST | LIBERTY ACORN TRUST | 2003-10-20 | LIBERTY ACORN TRUST |
MULTIGRAPHICS INC | AM INTERNATIONAL INC | 1997-03-17 | MULTIGRAPHICS INC |
MILLER LLOYD I III | nan | NaT | MILLER LLOYD I III |
AFFILIATED COMPUTER SERVICES INC | nan | NaT | AFFILIATED COMPUTER SERVICES INC |
ADAMS RESOURCES & ENERGY, INC. | ADAMS RESOURCES & ENERGY INC | 2005-04-01 | ADAMS RESOURCES & ENERGY INC |
BK Technologies Corp | BK Technologies, Inc. | 2019-03-28 | BK Technologies, Inc. |
In Stata (which I am more familiar with), this is easy:
gen edited_name = name
replace edited_name = former_name1 if name_change_date1 > date("2002-01-01", "YMD") & name_change_date1 != .
Unfortunately, I don't know how to do this in Python/pandas.
Data:
{'name': ['ACMAT CORP', 'ACME ELECTRIC CORP', 'ACME UNITED CORP', 'COLUMBIA ACORN TRUST',
'MULTIGRAPHICS INC', 'MILLER LLOYD I III', 'AFFILIATED COMPUTER SERVICES INC',
'ADAMS RESOURCES & ENERGY, INC.', 'BK Technologies Corp'],
'former_name1': [nan, nan, nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC', nan, nan,
'ADAMS RESOURCES & ENERGY INC', 'BK Technologies, Inc.'],
'name_change_date1': [NaT, NaT, NaT, '2003-10-20', '1997-03-17', NaT, NaT,
'2005-04-01', '2019-03-28']}
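The dictionary above is printed output, so `nan` and `NaT` are not defined names if it is pasted as-is. A minimal sketch of loading it, assuming the frame is called `df` (the name the answers use) and that the change dates should be parsed as datetimes:

```python
import numpy as np
import pandas as pd

data = {
    'name': ['ACMAT CORP', 'ACME ELECTRIC CORP', 'ACME UNITED CORP', 'COLUMBIA ACORN TRUST',
             'MULTIGRAPHICS INC', 'MILLER LLOYD I III', 'AFFILIATED COMPUTER SERVICES INC',
             'ADAMS RESOURCES & ENERGY, INC.', 'BK Technologies Corp'],
    'former_name1': [np.nan, np.nan, np.nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC',
                     np.nan, np.nan, 'ADAMS RESOURCES & ENERGY INC', 'BK Technologies, Inc.'],
    'name_change_date1': [pd.NaT, pd.NaT, pd.NaT, '2003-10-20', '1997-03-17', pd.NaT, pd.NaT,
                          '2005-04-01', '2019-03-28'],
}
df = pd.DataFrame(data)

# Parse the date strings into datetime64 values; missing entries stay NaT.
df['name_change_date1'] = pd.to_datetime(df['name_change_date1'])
```

With the column stored as datetimes, it can be compared directly against a cutoff timestamp.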
You can use numpy.where to select values depending on whether a name change occurred after the cutoff date:
import numpy as np
import pandas as pd
df['edited_name'] = np.where(df['name_change_date1'].notna() &
                             df['name_change_date1'].gt(pd.to_datetime('2002-01-01')),
                             df['former_name1'], df['name'])
Or mask:
df['edited_name'] = df['name'].mask(df['name_change_date1'].notna() &
                                    df['name_change_date1'].gt(pd.to_datetime('2002-01-01')),
                                    df['former_name1'])
Output:
name former_name1 \
0 ACMAT CORP NaN
1 ACME ELECTRIC CORP NaN
2 ACME UNITED CORP NaN
3 COLUMBIA ACORN TRUST LIBERTY ACORN TRUST
4 MULTIGRAPHICS INC AM INTERNATIONAL INC
5 MILLER LLOYD I III NaN
6 AFFILIATED COMPUTER SERVICES INC NaN
7 ADAMS RESOURCES & ENERGY, INC. ADAMS RESOURCES & ENERGY INC
8 BK Technologies Corp BK Technologies, Inc.
name_change_date1 edited_name
0 NaT ACMAT CORP
1 NaT ACME ELECTRIC CORP
2 NaT ACME UNITED CORP
3 2003-10-20 LIBERTY ACORN TRUST
4 1997-03-17 MULTIGRAPHICS INC
5 NaT MILLER LLOYD I III
6 NaT AFFILIATED COMPUTER SERVICES INC
7 2005-04-01 ADAMS RESOURCES & ENERGY INC
8 2019-03-28 BK Technologies, Inc.
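For readers coming from Stata, the gen/replace pattern in the question can also be mirrored step by step with a boolean mask and `.loc`. The following is only a sketch on a three-row stand-in for the question's data; the names `cutoff` and `changed_after` are illustrative and not from the original answers:

```python
import numpy as np
import pandas as pd

# Three-row stand-in for the question's data (assumption: dates already parsed).
df = pd.DataFrame({
    'name': ['ACMAT CORP', 'COLUMBIA ACORN TRUST', 'MULTIGRAPHICS INC'],
    'former_name1': [np.nan, 'LIBERTY ACORN TRUST', 'AM INTERNATIONAL INC'],
    'name_change_date1': pd.to_datetime([None, '2003-10-20', '1997-03-17']),
})

cutoff = pd.Timestamp('2002-01-01')

# Comparisons against NaT evaluate to False, so rows without a
# change date are never selected by this mask.
changed_after = df['name_change_date1'] > cutoff

# Stata: gen edited_name = name
df['edited_name'] = df['name']
# Stata: replace edited_name = former_name1 if <condition>
df.loc[changed_after, 'edited_name'] = df.loc[changed_after, 'former_name1']
```

Because the comparison is already False for NaT, the explicit notna() check in the answer above is a safeguard rather than a requirement.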
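Use:
import numpy as np
import pandas as pd
df = pd.DataFrame({'name':['a', 'b', 'c', 'd'], 'fname':[np.nan, 'h', 's', np.nan], 'dc':[np.nan, '2003-10-20', '1997-03-17', np.nan]})
df['dc'] = pd.to_datetime(df['dc'])
# fname where the change date is after the cutoff, NaN elsewhere
df['nname'] = df['fname'][df['dc'] > pd.Timestamp('2002-01-01')]
# rows without a qualifying change keep the current name
res = df['name'][df['nname'].isna()]
# rows with a qualifying change take the former name
temp = df['fname'][df['nname'].notna()]
# Series.append was removed in pandas 2.0; pd.concat replaces it
res = pd.concat([res, temp])
# assignment aligns on the index, so the original row order is kept
df['res'] = res
# df['res'] now holds: a, h, c, d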
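The same result can probably be reached in one step with `Series.where`, which keeps a value where the condition holds and takes the replacement elsewhere (the mirror image of `mask`). A sketch on the same toy frame, assuming the cutoff of January 1, 2002 used throughout:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'fname': [np.nan, 'h', 's', np.nan],
    'dc': pd.to_datetime([None, '2003-10-20', '1997-03-17', None]),
})

# Keep fname where the change happened after the cutoff,
# otherwise fall back to the current name.
df['res'] = df['fname'].where(df['dc'] > pd.Timestamp('2002-01-01'), df['name'])
```

This avoids the intermediate `nname`, `res`, and `temp` objects and never reorders the rows.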