Same row, two matching values in two different columns: delete the second matching value

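I want to delete the second match in each row. Column X1 is the column we match against; it is always the reference, and we never delete values from X1.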

Example (starting point) DataFrame df_client:

| Index |Name   |email             |city   |X1 |X2 |X3 |X4 |X5 |
|---    |---    |---               |---    |---|---|---|---|---|
| 0     |Mary   |Mary@hotmail.com  |London |AB1|KD2|AB1|   |CM2|
| 1     |john   |john@hotmail.com  |Tokyo  |LK2|LK2|   |IG5|   |
| 2     |karl   |karl@hotmail.com  |London |MK6|   |MK6|   |   |
| 3     |jasmin |jasmin@hotmail.com|Toronto|UH5|FG6|UH5|   |   |
| 4     |Frank  |Frank@hotmail.com |Paris  |PO4|   |   |PO4|   |
| 5     |lee    |lee@hotmail.com   |Madrid |RT3|RT3|WS1|   |   |

For every row, I want to compare the values in X2, X3, X4, and X5 against the value in X1.

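When we find a matching value, it should be removed (for example, in row 0 I want to remove AB1 from X3). In other words, we always keep the value in X1 and delete the matching value in X2, X3, X4, or X5.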

I would add that every row is guaranteed to have a value in X2, X3, X4, or X5 that matches the value in X1.

The desired result would look like this:

|Index|Name  |email             |city   |X1 |X2 |X3 |X4 |X5 |
|---  |---   |---               |---    |---|---|---|---|---|
| 0   |Mary  |Mary@hotmail.com  |London |AB1|KD2|   |   |CM2|
| 1   |john  |john@hotmail.com  |Tokyo  |LK2|   |   |IG5|   |
| 2   |karl  |karl@hotmail.com  |London |MK6|   |   |   |   |
| 3   |jasmin|jasmin@hotmail.com|Toronto|UH5|FG6|   |   |   |
| 4   |Frank |Frank@hotmail.com |Paris  |PO4|   |   |   |   |
| 5   |lee   |lee@hotmail.com   |Madrid |RT3|   |WS1|   |   |

This is not essential, but ideally, where cells end up empty, I would like to shift the remaining values to the left, like this:

|Index|Name  |email             |city   |X1 |X2 |X3 |X4 |X5 |
|---  |---   |---               |---    |---|---|---|---|---|
| 0   |Mary  |Mary@hotmail.com  |London |AB1|KD2|CM2|   |   |
| 1   |john  |john@hotmail.com  |Tokyo  |LK2|IG5|   |   |   |
| 2   |karl  |karl@hotmail.com  |London |MK6|   |   |   |   |
| 3   |jasmin|jasmin@hotmail.com|Toronto|UH5|FG6|   |   |   |
| 4   |Frank |Frank@hotmail.com |Paris  |PO4|   |   |   |   |
| 5   |lee   |lee@hotmail.com   |Madrid |RT3|WS1|   |   |   |

Shifting the values to the left really is not important; if you can help me delete the matching values, that is enough.

Thanks

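You can do this with apply (using a lambda function) and drop the duplicates with axis=1, which operates on rows. This changes the column order, but you can store the order beforehand and reassign it once you are done.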

import numpy as np
import pandas as pd

# Sample data resembling the DataFrame in the question
df = pd.DataFrame({'Name': {0: 'Mary', 1: 'john', 2: 'karl', 3: 'jasmin', 4: 'Frank', 5: 'lee'},
 'email': {0: 'Mary@hotmail.com',
  1: 'john@hotmail.com',
  2: 'karl@hotmail.com',
  3: 'jasmin@hotmail.com',
  4: 'Frank@hotmail.com',
  5: 'lee@hotmail.com'},
 'city': {0: 'London',
  1: 'Tokyo',
  2: 'London',
  3: 'Toronto',
  4: 'Paris',
  5: 'Madrid'},
 'X1': {0: 'AB1', 1: 'LK2', 2: 'MK6', 3: 'UH5', 4: 'PO4', 5: 'RT3'},
 'X2': {0: 'KD2', 1: 'LK2', 2: 'MK6', 3: 'FG6', 4: 'PO4', 5: 'RT3'},
 'X3': {0: 'AB1', 1: 'IG5', 2: None, 3: None, 4: None, 5: 'WS1'},
 'X4': {0: 'CM2', 1: None, 2: None, 3: None, 4: None, 5: None},
 'X5': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})

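# Remember the original column order, because apply may reorder the columns.
col_order = df.columns
# For each row (axis=1), keep only the first occurrence of every value.
df = df.apply(lambda x: x.drop_duplicates(keep='first'), axis=1)
# Restore the original column order.
df = df[col_order]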

Output df:

    Name    email               city    X1  X2   X3     X4      X5
0   Mary    Mary@hotmail.com    London  AB1 KD2  NaN    CM2     NaN
1   john    john@hotmail.com    Tokyo   LK2 NaN  IG5    None    NaN
2   karl    karl@hotmail.com    London  MK6 NaN  None   NaN     NaN
3   jasmin  jasmin@hotmail.com  Toronto UH5 FG6  None   None    NaN
4   Frank   Frank@hotmail.com   Paris   PO4 NaN  None   NaN     NaN
5   lee     lee@hotmail.com     Madrid  RT3 NaN  WS1    None    NaN
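
Note that drop_duplicates on a whole row removes any repeated value, including repeats among X2 to X5 that do not match X1, or a Name that happens to equal a city. If only matches against X1 should be removed, a narrower option is to compare the X2 to X5 block with X1 and mask the hits. The following is a minimal sketch of that idea, assuming the column names from the question; the small frame is made-up illustrative data, not the asker's full table.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, same layout as the question).
df = pd.DataFrame({
    "Name": ["Mary", "john", "lee"],
    "X1": ["AB1", "LK2", "RT3"],
    "X2": ["KD2", "LK2", "RT3"],
    "X3": ["AB1", None, "WS1"],
    "X4": [None, "IG5", None],
    "X5": ["CM2", None, None],
})

other_cols = ["X2", "X3", "X4", "X5"]

# Compare every X2..X5 cell with the X1 value of the same row.
# axis=0 aligns df["X1"] on the row index instead of the column labels.
matches_x1 = df[other_cols].eq(df["X1"], axis=0)

# Replace the matching cells with a missing value; X1 itself is never touched.
df[other_cols] = df[other_cols].mask(matches_x1)
```

Because the comparison is restricted to the listed columns, Name, email, and city are left alone, and the column order does not change.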

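If you want to shift the data to the left, you can do it like this. You will need to change the index on the last line so that it matches the number of columns left after some of them are dropped when shift_vals is created.
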
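# Collect the non-missing values of each row as a list.
shift_vals = df.apply(lambda x: x.dropna().tolist(), axis=1)
# Build a new frame from the lists; shorter rows are padded on the right.
new_df = pd.DataFrame(shift_vals.to_list())
# In this example the longest row has six values, so the last two names are dropped.
new_df.columns = df.columns[0:-2]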
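
If you would rather keep all five X columns and avoid adjusting the slice by hand, one possible variation is to shift only the X columns and pad each row back to its original width. This is a rough sketch under the assumption that missing cells are NaN or None; shift_left is a hypothetical helper name, and the frame is made-up illustrative data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps left after the matching values were removed.
df = pd.DataFrame({
    "Name": ["Mary", "john", "karl"],
    "X1": ["AB1", "LK2", "MK6"],
    "X2": ["KD2", np.nan, np.nan],
    "X3": [np.nan, np.nan, np.nan],
    "X4": [np.nan, "IG5", np.nan],
    "X5": ["CM2", np.nan, np.nan],
})

x_cols = ["X1", "X2", "X3", "X4", "X5"]


def shift_left(row: pd.Series) -> pd.Series:
    """Move non-missing values to the front and pad the rest with NaN."""
    kept = row.dropna().tolist()
    padded = kept + [np.nan] * (len(row) - len(kept))
    return pd.Series(padded, index=row.index)


# Apply row by row to the X columns only, so Name and the other
# identifying columns stay where they are.
df[x_cols] = df[x_cols].apply(shift_left, axis=1)
```

Since every row is padded back to five values, the result keeps the same column names and no trailing columns are lost.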