How to use startswith in a Python DataFrame across two columns
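My input has data in columns A and B, and I need the output in column C.
The logic is: if the first four characters in column A or column B are INKA or IDKA, the output is KAR.
In the same way, INAP or IDAP gives AP, and INRJ or IDRJ gives RAJ.
A and B are the input; C is the expected output.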
A | B | C |
---|---|---|
IDKA106829_KMGL_H_Z_8121 | INKA100345_KMGL_H_Z_8251 | KAR |
IDKA101971_KUDU_H_Z_8251 | YEDTHADY-IND | KAR |
SIRA_RPTR | IDKA102853_KSIR_H_Z_8251 | KAR |
IDAP104327_PEDA_H_Z_8251 | IDAP104769_URUM_H_Z_8251 | AP |
IDAP103547_RAMP_H_Z_8251 | MADDIRALA | AP |
SALURU | IDAP103620_SALU_H_Z_8251 | AP |
IDRJ103411_KOTA_H_Z_8251 | KOT009 | RAJ |
IDRJ100041_BKNR_H_Z_8251 | INRJ200420_BKNR_H_Z_8251 | RAJ |
JAIPR203 | INRJ200420_BKNR_H_Z_8251 | RAJ |
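To try the approaches in the answers, the sample rows can be loaded into a DataFrame. This is a minimal sketch, assuming pandas is installed; the frame is named `df` only because that is the name the answers use, and `rows` is an illustrative name.

```python
import pandas as pd

# Input columns A and B from the question; C is what we want to compute.
rows = [
    ("IDKA106829_KMGL_H_Z_8121", "INKA100345_KMGL_H_Z_8251"),
    ("IDKA101971_KUDU_H_Z_8251", "YEDTHADY-IND"),
    ("SIRA_RPTR", "IDKA102853_KSIR_H_Z_8251"),
    ("IDAP104327_PEDA_H_Z_8251", "IDAP104769_URUM_H_Z_8251"),
    ("IDAP103547_RAMP_H_Z_8251", "MADDIRALA"),
    ("SALURU", "IDAP103620_SALU_H_Z_8251"),
    ("IDRJ103411_KOTA_H_Z_8251", "KOT009"),
    ("IDRJ100041_BKNR_H_Z_8251", "INRJ200420_BKNR_H_Z_8251"),
    ("JAIPR203", "INRJ200420_BKNR_H_Z_8251"),
]
df = pd.DataFrame(rows, columns=["A", "B"])
print(df.head())
```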
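You can do this by iterating over the rows and checking each value with startswith and if/elif, like this:
# Note: this "dataframe" is a plain list of lists, not a pandas DataFrame.
dataframe = [
    ['IDKA106829_KMGL_H_Z_8121', 'INKA100345_KMGL_H_Z_8251', ''],
    ['IDRJ100041_BKNR_H_Z_8251', 'INRJ200420_BKNR_H_Z_8251', '']
]
for row in dataframe:
    # Only the first two entries (A and B) are inputs; row[2] holds the output C.
    for column in row[:2]:
        # str.startswith accepts a tuple of prefixes.
        if column.startswith(('IDKA', 'INKA')):
            row[2] = 'KAR'
        elif column.startswith(('IDAP', 'INAP')):
            row[2] = 'AP'
        elif column.startswith(('INRJ', 'IDRJ')):
            row[2] = 'RAJ'
print(dataframe)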
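The same row-by-row idea can be applied to an actual pandas DataFrame. The following is only a sketch: `PREFIXES` and `region_for` are hypothetical names introduced here, and the sample frame is a small subset of the question's data plus one row that matches nothing.

```python
import pandas as pd

# Prefix groups from the question, checked in this order.
PREFIXES = {
    ("INKA", "IDKA"): "KAR",
    ("INAP", "IDAP"): "AP",
    ("INRJ", "IDRJ"): "RAJ",
}

def region_for(a, b):
    """Return the code of the first prefix group matched by a or b, else None."""
    for prefixes, code in PREFIXES.items():
        for value in (a, b):
            # Skip missing or non-string cells instead of raising.
            if isinstance(value, str) and value.startswith(prefixes):
                return code
    return None

df = pd.DataFrame({
    "A": ["IDKA101971_KUDU_H_Z_8251", "SALURU", "JAIPR203", "UNKNOWN"],
    "B": ["YEDTHADY-IND", "IDAP103620_SALU_H_Z_8251", "INRJ200420_BKNR_H_Z_8251", "OTHER"],
})
# A plain Python loop over the two columns; fine for small frames.
df["C"] = [region_for(a, b) for a, b in zip(df["A"], df["B"])]
print(df)
```

This is easy to read and debug, but it runs in Python for every row, so the vectorized approaches are likely a better fit for large frames.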
You can use np.select:
import numpy as np

conditions = [
    df["A"].str[:4].isin(["INKA", "IDKA"]) | df["B"].str[:4].isin(["INKA", "IDKA"]),
    df["A"].str[:4].isin(["INAP", "IDAP"]) | df["B"].str[:4].isin(["INAP", "IDAP"]),
    df["A"].str[:4].isin(["INRJ", "IDRJ"]) | df["B"].str[:4].isin(["INRJ", "IDRJ"]),
]
# Rows that match none of the conditions get None.
df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], default=None)
Alternatively, you can use map and combine_first:
mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"}
# Look up the four-character prefix of A; fall back to B where A has no match.
df["C"] = df["A"].str[:4].map(mapper).combine_first(df["B"].str[:4].map(mapper))
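To see how the fallback behaves, here is a small self-contained sketch of the map and combine_first approach. The sample frame is a subset of the question's data plus one invented row ("UNKNOWN", "OTHER") that matches no prefix; `from_a` and `from_b` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["IDKA101971_KUDU_H_Z_8251", "SALURU", "JAIPR203", "UNKNOWN"],
    "B": ["YEDTHADY-IND", "IDAP103620_SALU_H_Z_8251", "INRJ200420_BKNR_H_Z_8251", "OTHER"],
})
mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"}

# map() leaves a missing value wherever the four-character prefix is not a key.
from_a = df["A"].str[:4].map(mapper)
from_b = df["B"].str[:4].map(mapper)

# combine_first() keeps from_a and fills its missing entries from from_b.
df["C"] = from_a.combine_first(from_b)
print(df)
```

One difference worth keeping in mind: if A and B matched different groups in the same row, this version would take the value from A, whereas np.select takes the first condition in the list that is true.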
Use startswith together with np.select:
import numpy as np

conditions = [
    df['A'].str.startswith('INKA') | df['B'].str.startswith('INKA') | df['A'].str.startswith('IDKA') | df['B'].str.startswith('IDKA'),
    df['A'].str.startswith('INAP') | df['B'].str.startswith('INAP') | df['A'].str.startswith('IDAP') | df['B'].str.startswith('IDAP'),
    df['A'].str.startswith('INRJ') | df['B'].str.startswith('INRJ') | df['A'].str.startswith('IDRJ') | df['B'].str.startswith('IDRJ'),
]
choices = ['KAR', 'AP', 'RAJ']
df['C'] = np.select(conditions, choices, default=None)
The resulting df has column C filled in as in the expected output in the question.
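The repeated startswith calls can be shortened, since `Series.str.startswith` accepts a tuple of prefixes. This is a sketch that assumes pandas 1.5 or later for the tuple form; `groups` is a hypothetical name, and the sample frame includes one missing value and one non-matching row that are not in the question's data. Passing `na=False` makes missing cells count as "no match" instead of producing NaN in the condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["IDKA101971_KUDU_H_Z_8251", None, "JAIPR203", "UNKNOWN"],
    "B": ["YEDTHADY-IND", "IDAP103620_SALU_H_Z_8251", "INRJ200420_BKNR_H_Z_8251", "OTHER"],
})

# Output code -> prefixes that should produce it.
groups = {
    "KAR": ("INKA", "IDKA"),
    "AP": ("INAP", "IDAP"),
    "RAJ": ("INRJ", "IDRJ"),
}

# One boolean condition per group: a match in either column is enough.
conditions = [
    df["A"].str.startswith(prefixes, na=False) | df["B"].str.startswith(prefixes, na=False)
    for prefixes in groups.values()
]

# np.select picks the first true condition per row; unmatched rows get None.
df["C"] = np.select(conditions, list(groups), default=None)
print(df)
```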