Two matching values in different columns of the same row: delete the second matching value
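I want to delete the second matching value in each row. Column X1 is the one we match against; it is always the reference, and we never delete values from X1.
Example (starting) DataFrame df_client: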
| Index | Name | email | city | X1 | X2 | X3 | X4 | X5 |
|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Mary@hotmail.com | London | AB1 | KD2 | AB1 | | CM2 |
| 1 | john | john@hotmail.com | Tokyo | LK2 | LK2 | | IG5 | |
| 2 | karl | karl@hotmail.com | London | MK6 | | MK6 | | |
| 3 | jasmin | jasmin@hotmail.com | Toronto | UH5 | FG6 | UH5 | | |
| 4 | Frank | Frank@hotmail.com | Paris | PO4 | | | PO4 | |
| 5 | lee | lee@hotmail.com | Madrid | RT3 | RT3 | WS1 | | |
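I want to compare the values in X2, X3, X4 and X5 against X1, row by row.
When a matching value is found (for example, in row 0 I want to delete AB1 from X3), we always keep the value in X1 and delete the matching value in X2, X3, X4 or X5.
I should add that every row is guaranteed to have a value in X2, X3, X4 or X5 that matches the value in X1.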
The expected result would look like this:
| Index | Name | email | city | X1 | X2 | X3 | X4 | X5 |
|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Mary@hotmail.com | London | AB1 | KD2 | | | CM2 |
| 1 | john | john@hotmail.com | Tokyo | LK2 | | | IG5 | |
| 2 | karl | karl@hotmail.com | London | MK6 | | | | |
| 3 | jasmin | jasmin@hotmail.com | Toronto | UH5 | FG6 | | | |
| 4 | Frank | Frank@hotmail.com | Paris | PO4 | | | | |
| 5 | lee | lee@hotmail.com | Madrid | RT3 | | WS1 | | |
It is not essential, but ideally, where there are empty cells, I would like to shift the values to the left, like this:
| Index | Name | email | city | X1 | X2 | X3 | X4 | X5 |
|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Mary@hotmail.com | London | AB1 | KD2 | CM2 | | |
| 1 | john | john@hotmail.com | Tokyo | LK2 | IG5 | | | |
| 2 | karl | karl@hotmail.com | London | MK6 | | | | |
| 3 | jasmin | jasmin@hotmail.com | Toronto | UH5 | FG6 | | | |
| 4 | Frank | Frank@hotmail.com | Paris | PO4 | | | | |
| 5 | lee | lee@hotmail.com | Madrid | RT3 | WS1 | | | |
Shifting the values to the left really is optional; if you can help me delete the matching values, that is enough.
Thanks
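You can do this with apply and a lambda function that drops duplicates along axis=1, i.e. operating on each row. This reorders the columns, but you can store the order beforehand and reassign it when you are done. Note that drop_duplicates compares every value in the row, not only X2 to X5 against X1.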
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': {0: 'Mary', 1: 'john', 2: 'karl', 3: 'jasmin', 4: 'Frank', 5: 'lee'},
                   'email': {0: 'Mary@hotmail.com',
                             1: 'john@hotmail.com',
                             2: 'karl@hotmail.com',
                             3: 'jasmin@hotmail.com',
                             4: 'Frank@hotmail.com',
                             5: 'lee@hotmail.com'},
                   'city': {0: 'London',
                            1: 'Tokyo',
                            2: 'London',
                            3: 'Toronto',
                            4: 'Paris',
                            5: 'Madrid'},
                   'X1': {0: 'AB1', 1: 'LK2', 2: 'MK6', 3: 'UH5', 4: 'PO4', 5: 'RT3'},
                   'X2': {0: 'KD2', 1: 'LK2', 2: 'MK6', 3: 'FG6', 4: 'PO4', 5: 'RT3'},
                   'X3': {0: 'AB1', 1: 'IG5', 2: None, 3: None, 4: None, 5: 'WS1'},
                   'X4': {0: 'CM2', 1: None, 2: None, 3: None, 4: None, 5: None},
                   'X5': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})
col_order = df.columns  # remember the original column order
# Within each row, keep the first occurrence of every value and drop the repeats
df = df.apply(lambda x: x.drop_duplicates(keep='first'), axis=1)
df = df[col_order]  # restore the original column order
Output df:
Name email city X1 X2 X3 X4 X5
0 Mary Mary@hotmail.com London AB1 KD2 NaN CM2 NaN
1 john john@hotmail.com Tokyo LK2 NaN IG5 None NaN
2 karl karl@hotmail.com London MK6 NaN None NaN NaN
3 jasmin jasmin@hotmail.com Toronto UH5 FG6 None None NaN
4 Frank Frank@hotmail.com Paris PO4 NaN None NaN NaN
5 lee lee@hotmail.com Madrid RT3 NaN WS1 None NaN
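Because drop_duplicates looks at the whole row, a more targeted variant is to compare only X2 to X5 against X1 and blank out the matches. The following is a minimal sketch of that idea rather than the method used above; the small frame and the name other_cols are assumptions made for illustration, with values taken from the question's example.

```python
import pandas as pd

# Small illustrative frame (three rows from the question's example).
df_client = pd.DataFrame({
    "Name": ["Mary", "john", "lee"],
    "X1": ["AB1", "LK2", "RT3"],
    "X2": ["KD2", "LK2", "RT3"],
    "X3": ["AB1", None, "WS1"],
    "X4": [None, "IG5", None],
    "X5": ["CM2", None, None],
})

# Hypothetical helper name: the columns that are compared against X1.
other_cols = ["X2", "X3", "X4", "X5"]

# True wherever a cell in X2..X5 equals the X1 value of the same row.
matches = df_client[other_cols].eq(df_client["X1"], axis=0)

# Replace only the matching cells with NaN; X1 and the other columns are untouched.
df_client[other_cols] = df_client[other_cols].mask(matches)

print(df_client)
```

This leaves the column order unchanged, so there is no need to store and reassign it afterwards.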
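If you also want to shift the data to the left, you can do the following. You will need to change the slice on the last line so that it matches the number of columns left once the missing values have been dropped when building shift_vals.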
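# Collect the non-null values of each row as a list, which packs them to the left
shift_vals = df.apply(lambda x: x.dropna().tolist(), axis=1)
new_df = pd.DataFrame(shift_vals.to_list())
# With this sample data the longest row has six values, so drop the last two column names
new_df.columns = df.columns[0:-2]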
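If you would rather not adjust that slice by hand, one option is to pad each shifted row back to its original length so that the column names can be reused as they are. This is a sketch under that assumption, not part of the answer above; shift_left is a hypothetical helper and the frame is a reduced example.

```python
import numpy as np
import pandas as pd

# Reduced example: matches already removed, gaps left as NaN.
df = pd.DataFrame({
    "X1": ["AB1", "LK2", "MK6"],
    "X2": ["KD2", np.nan, np.nan],
    "X3": [np.nan, np.nan, np.nan],
    "X4": [np.nan, "IG5", np.nan],
    "X5": ["CM2", np.nan, np.nan],
})

def shift_left(row: pd.Series) -> pd.Series:
    """Move non-null values to the front of the row and pad the end with NaN."""
    kept = row.dropna().tolist()
    padded = kept + [np.nan] * (len(row) - len(kept))
    # Reusing the row's own index keeps the original column names.
    return pd.Series(padded, index=row.index)

shifted = df.apply(shift_left, axis=1)
print(shifted)
```

To shift only X1 to X5 in a wider frame, apply the function to that subset of columns and assign the result back.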