String matching from one data frame column to another data frame column
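I have two DataFrames, A and B. DataFrame A has two columns, Value and Filed.
DataFrame B also has Value and Filed columns. I want to match B's 'Value' column against A's 'Filed' column
and replace A's Filed with the corresponding Filed from B.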
A=
Value           Filed
valid username  username
valid username  input_txtuserid
Password        input_txtpassword
Password        txtPassword
Login           input_submit_log_in
LOG IN          SIGNIN
B=
Value                Filed
input_txtuserid      "JOHN"
input_txtpassword    "78945"
input_submit_log_in  "Sucessfully"
Password             txtPassword
City                 "London"
PLACE                "4-A avenue Street"
I want my DataFrame C to look like this:
C=
Value           Filed
valid username  "JOHN"
Password        "78945"
Login           "Sucessfully"
I wrote the code below, but I get KeyError: 'City':
_map = dict(zip(A.Filed.values, A.Value.values))
def get_correct_value(row, _map=_map):
    new_value = _map[row.Value]
    filed = row.Filed
    return new_value, filed
C = B.apply(get_correct_value, axis=1, result_type='expand')
C.columns = ['Value','Filed']
I want to ignore keys that are not present in DataFrame A.
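The `KeyError` comes from indexing `_map[row.Value]` with a key that `_map` does not contain. A minimal sketch that keeps the dictionary idea but uses `Series.map`, which yields NaN for missing keys instead of raising, and then drops those rows. A and B are rebuilt from the tables above, with the quotes dropped from B's Filed values (an assumption about the real data):

```python
import pandas as pd

# Sample data rebuilt from the question's tables (quotes assumed not to be part of the values)
A = pd.DataFrame({
    "Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
    "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword",
              "input_submit_log_in", "SIGNIN"],
})
B = pd.DataFrame({
    "Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
    "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"],
})

# Same lookup table as in the question: A's field name -> A's label
_map = dict(zip(A.Filed, A.Value))

# Series.map returns NaN for keys missing from _map (e.g. 'City') instead of raising KeyError
new_value = B["Value"].map(_map)

# Pair A's label with B's Filed, then drop the rows whose key was not found in A
C = pd.DataFrame({"Value": new_value, "Filed": B["Filed"]}).dropna(subset=["Value"])
C = C.reset_index(drop=True)
print(C)
```

Here 'Password', 'City' and 'PLACE' from B are silently skipped because none of them appears in A's Filed column.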
I'm assuming the DataFrames hold strings, since DataFrames aren't usually used to carry variables. With that in mind, I built a sample from your DataFrame values:
import numpy as np
import pandas as pd

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
          "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}
data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}
A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)
A and B then hold the same values as the tables in the question.
The following code creates C:
# Left-join A and B, matching A's Filed column against B's Value column.
# indicator='Exist' adds a column recording whether each row matched ('both') or not ('left_only').
C = pd.merge(A, B, left_on=['Filed'], right_on=['Value'], how='left', indicator='Exist')
# True if the row exists in both DataFrames, otherwise False
C['Exist'] = np.where(C.Exist == 'both', True, False)
# Drop the rows that don't exist in both DataFrames
C.drop(C[C['Exist'] == False].index, inplace=True)
# Keep A's Value (Value_x) and B's Filed (Filed_y), then restore the original column names.
C = C[['Value_x', 'Filed_y']].rename(columns={"Value_x": "Value",
                                              "Filed_y": "Filed"})
Output of C:
Value           Filed
valid username  JOHN
Password        78945
Login           Sucessfully
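Because the left join is immediately filtered down to the rows marked 'both', the same selection can be made with an inner join. A minimal sketch using `how='inner'` (a swapped-in simplification, not the answer's original code), with explicit `suffixes` so it is clear which column came from which DataFrame; the `_A`/`_B` suffix names are arbitrary choices:

```python
import pandas as pd

A = pd.DataFrame({
    "Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
    "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword",
              "input_submit_log_in", "SIGNIN"],
})
B = pd.DataFrame({
    "Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
    "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"],
})

# Inner join: keep only rows where A.Filed equals some B.Value.
# Overlapping column names get the suffixes _A (from A) and _B (from B).
merged = pd.merge(A, B, left_on="Filed", right_on="Value", how="inner",
                  suffixes=("_A", "_B"))

# Label from A, value from B, renamed back to the question's column names
C = merged[["Value_A", "Filed_B"]].rename(columns={"Value_A": "Value", "Filed_B": "Filed"})
print(C)
```

The indicator column is still handy when you want to inspect which rows of A found no match; the inner join just skips that bookkeeping.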
Hope this helps! If it does, please mark it as the answer :)