String matching from one data frame column to another data frame column

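I have two data frames, A and B. Data frame A has two columns, Value and Filed. Data frame B also has Value and Filed columns. I want to match B's 'Value' column against A's 'Filed' column and, where they match, replace A's Filed with B's Filed value.

A=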

 Value                     Filed        
valid username              username
valid username           input_txtuserid
Password                 input_txtpassword
Password                 txtPassword
Login                     input_submit_log_in
LOG IN                    SIGNIN

B=

 Value                     Filed        
input_txtuserid               "JOHN"
input_txtpassword           "78945"
input_submit_log_in        "Sucessfully"
Password                 txtPassword
City                       "London"
PLACE                      "4-A avenue Street"

I want my data frame C to look like this:

C=

Value                     Filed        
valid username            "JOHN"
Password                   "78945"
Login                       "Sucessfully"

I wrote the code below, but I get KeyError: 'City':

_map = dict(zip(A.Filed.values, A.Value.values))

def get_correct_value(row, _map=_map):
    new_value = _map[row.Value]
    filed = row.Filed
    return new_value, filed

C = B.apply(get_correct_value, axis=1, result_type='expand')
C.columns = ['Value', 'Filed']

I want to ignore the keys that are not available in data frame A.

I am assuming the DataFrames hold strings, since we don't usually use DataFrames to carry variables. With that in mind, I created a sample from your data frame values.

import numpy as np
import pandas as pd

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
          "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}

data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}

A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)

A and B now hold the same rows as the tables in the question.

The code below creates C:

# Left-join A to B, matching A's Filed against B's Value. The indicator column
# 'Exist' records whether each row was found in both data frames.
C = pd.merge(A, B, left_on=['Filed'], right_on=['Value'], how='left', indicator='Exist')

# True where the row exists in both data frames, otherwise False
C['Exist'] = np.where(C.Exist == 'both', True, False)
# Drop the False rows, i.e. those that don't exist in both data frames
C.drop(C[C['Exist'] == False].index, inplace=True)

# Keep A's Value (Value_x) and B's Filed (Filed_y), then restore the column names.
C = C[['Value_x', 'Filed_y']]
C = C.rename(columns={"Value_x": "Value",
                      "Filed_y": "Filed"})

C now contains:

Value                     Filed
valid username            JOHN
Password                  78945
Login                     Sucessfully
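Since only the rows present in both data frames are kept, the same result can probably be reached more directly with an inner join, which drops the unmatched keys (such as City and PLACE) on its own. This is just a sketch under the same assumption that everything is a string; the suffixes `_a` and `_b` and the name `merged` are my own choices for illustration, not something from your code.

```python
import pandas as pd

A = pd.DataFrame({
    "Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
    "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword",
              "input_submit_log_in", "SIGNIN"],
})
B = pd.DataFrame({
    "Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in",
              "Password", "City", "PLACE"],
    "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"],
})

# An inner join keeps only the rows where A.Filed equals B.Value, so keys that
# exist in just one of the data frames are ignored instead of raising KeyError.
merged = A.merge(B, left_on="Filed", right_on="Value", how="inner", suffixes=("_a", "_b"))

# Keep A's label (Value_a) and B's value (Filed_b), then restore the column names.
C = (
    merged[["Value_a", "Filed_b"]]
    .rename(columns={"Value_a": "Value", "Filed_b": "Filed"})
    .reset_index(drop=True)
)
print(C)
```

The left join with an indicator is still handy if you also want to inspect which rows of A found no match in B.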

Hope this helps! If it does, please mark it as the answer :)