How to use startswith in a Python DataFrame when considering two columns

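My input has data in columns A and B; the output is needed in column C.

The logic: if the first four letters of the value in column A or column B are INKA or IDKA, the output is KAR.

In the same way, INAP or IDAP gives AP, and INRJ or IDRJ gives RAJ.

The input is A and B; the expected output is C.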

A B C
IDKA106829_KMGL_H_Z_8121 INKA100345_KMGL_H_Z_8251 KAR
IDKA101971_KUDU_H_Z_8251 YEDTHADY-IND KAR
SIRA_RPTR IDKA102853_KSIR_H_Z_8251 KAR
IDAP104327_PEDA_H_Z_8251 IDAP104769_URUM_H_Z_8251 AP
IDAP103547_RAMP_H_Z_8251 MADDIRALA AP
SALURU IDAP103620_SALU_H_Z_8251 AP
IDRJ103411_KOTA_H_Z_8251 KOT009 RAJ
IDRJ100041_BKNR_H_Z_8251 INRJ200420_BKNR_H_Z_8251 RAJ
JAIPR203 INRJ200420_BKNR_H_Z_8251 RAJ

You can do this by iterating over the rows and testing each value with startswith in an if/elif chain, like this:

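# Plain list of rows standing in for the DataFrame: [A, B, C]
dataframe = [
['IDKA106829_KMGL_H_Z_8121','INKA100345_KMGL_H_Z_8251',''],
['IDRJ100041_BKNR_H_Z_8251','INRJ200420_BKNR_H_Z_8251','']
]

for row in dataframe:
    # Check only columns A and B; column C receives the result
    for column in row[:2]:
        if column.startswith('IDKA') or column.startswith('INKA'):
            row[2] = 'KAR'
        elif column.startswith('INAP') or column.startswith('IDAP'):
            row[2] = 'AP'
        elif column.startswith('INRJ') or column.startswith('IDRJ'):
            row[2] = 'RAJ'

print(dataframe)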
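
The same row-by-row idea can also be driven by a lookup table, which keeps all the prefix groups in one place. This is only a minimal sketch: the names PREFIX_TO_REGION and region_for are made up for illustration and do not come from the question.

```python
# Hypothetical lookup table: first four characters -> output code.
PREFIX_TO_REGION = {
    "INKA": "KAR", "IDKA": "KAR",
    "INAP": "AP", "IDAP": "AP",
    "INRJ": "RAJ", "IDRJ": "RAJ",
}

def region_for(a, b):
    """Return the code for the first of a, b whose first four characters match, else None."""
    for value in (a, b):
        code = PREFIX_TO_REGION.get(value[:4])
        if code is not None:
            return code
    return None

# Three rows taken from the sample data: [A, B, C]
rows = [
    ["IDKA101971_KUDU_H_Z_8251", "YEDTHADY-IND", ""],
    ["SALURU", "IDAP103620_SALU_H_Z_8251", ""],
    ["JAIPR203", "INRJ200420_BKNR_H_Z_8251", ""],
]

for row in rows:
    row[2] = region_for(row[0], row[1])

print([row[2] for row in rows])  # ['KAR', 'AP', 'RAJ']
```

With a real DataFrame, a function like this could be applied row by row via df.apply with axis=1, though the vectorized approaches are likely to be faster on large data.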

You can use np.select:

import numpy as np

# One condition per output code: the first four characters of A or B match
conditions = [(df["A"].str[:4].isin(["INKA", "IDKA"]))|(df["B"].str[:4].isin(["INKA", "IDKA"])),
              (df["A"].str[:4].isin(["INAP", "IDAP"]))|(df["B"].str[:4].isin(["INAP", "IDAP"])),
              (df["A"].str[:4].isin(["INRJ", "IDRJ"]))|(df["B"].str[:4].isin(["INRJ", "IDRJ"]))]

# Rows matching none of the conditions get None
df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], None)
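
For a quick check, here is a small self-contained sketch that builds three of the sample rows and applies the same isin idea. The helper has_prefix is a hypothetical name, added only to shorten the conditions.

```python
import numpy as np
import pandas as pd

# Three rows from the question's sample data.
df = pd.DataFrame({
    "A": ["IDKA101971_KUDU_H_Z_8251", "SALURU", "JAIPR203"],
    "B": ["YEDTHADY-IND", "IDAP103620_SALU_H_Z_8251", "INRJ200420_BKNR_H_Z_8251"],
})

def has_prefix(prefixes):
    # True where the first four characters of A or B are in `prefixes`.
    return df["A"].str[:4].isin(prefixes) | df["B"].str[:4].isin(prefixes)

conditions = [
    has_prefix(["INKA", "IDKA"]),
    has_prefix(["INAP", "IDAP"]),
    has_prefix(["INRJ", "IDRJ"]),
]

# np.select picks the first choice whose condition is True for each row.
df["C"] = np.select(conditions, ["KAR", "AP", "RAJ"], default=None)
print(df["C"].tolist())  # ['KAR', 'AP', 'RAJ']
```

Since np.select takes the first matching condition, a row whose A and B point to different codes would get whichever code comes first in the list.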

Alternatively, you can use map with combine_first:

mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP", "INRJ": "RAJ", "IDRJ": "RAJ"}
df["C"] = df["A"].str[:4].map(mapper).combine_first(df["B"].str[:4].map(mapper))
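
One thing worth knowing about this approach: combine_first keeps the code from column A whenever A matches and only falls back to column B otherwise, and rows where neither column matches are left as NaN. A short sketch, assuming two sample rows plus one row put together from two unmatched sample values to show the NaN case:

```python
import pandas as pd

mapper = {"INKA": "KAR", "IDKA": "KAR", "INAP": "AP", "IDAP": "AP",
          "INRJ": "RAJ", "IDRJ": "RAJ"}

df = pd.DataFrame({
    "A": ["SIRA_RPTR", "IDRJ103411_KOTA_H_Z_8251", "KOT009"],
    "B": ["IDKA102853_KSIR_H_Z_8251", "KOT009", "MADDIRALA"],
})

from_a = df["A"].str[:4].map(mapper)    # NaN where A has no known prefix
from_b = df["B"].str[:4].map(mapper)    # NaN where B has no known prefix
df["C"] = from_a.combine_first(from_b)  # take A's code, fall back to B's

print(df["C"].tolist())  # ['KAR', 'RAJ', nan]
```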

Use startswith together with np.select:

conditions = [df['A'].str.startswith('INKA') | df['B'].str.startswith('INKA') | df['A'].str.startswith('IDKA') | df['B'].str.startswith('IDKA'),
              df['A'].str.startswith('INAP') | df['B'].str.startswith('INAP') | df['A'].str.startswith('IDAP') | df['B'].str.startswith('IDAP'),
              df['A'].str.startswith('INRJ') | df['B'].str.startswith('INRJ') | df['A'].str.startswith('IDRJ') | df['B'].str.startswith('IDRJ')]
 
choices = ['KAR','AP', 'RAJ']
 
df['C'] = np.select(conditions, choices, default=None)

Resulting df: