Pandas DataFrame: replace part of a string with the value from another column

Question

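I'm having trouble replacing part of a string with the value from another column. I want to replace the text 'Length' in the Formula column with the corresponding value of df['Length'].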

df["Formula"] = df["Formula"].replace('Length', df['Length'], regex=True)

Here is my data:

Input:
**Formula**  **Length**
Length           5
Length+1.5       6
Length-2.5       5
Length           4
5                5

Expected Output:
**Formula**  **Length**
5                5
6+1.5            6
5-2.5            5
4                4
5                5

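However, with the code above, the whole cell is replaced instead of just the word 'Length'. I found that this happens because I pass df['Length'] as the replacement value; if I pass any other string, the trailing offset (such as +1.5) is left intact. This is the output I get: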

**Formula**  **Length**
5                5
6                6
5                5
4                4
5                5

Is there another way to do the replacement using the values from another column?

Thanks.

Answer 1

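If you need to replace using the value from another column, use DataFrame.apply with axis=1 so that each row is handled separately: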

df["Formula"] = df.apply(lambda x: x['Formula'].replace('Length', str(x['Length'])), axis=1)
print(df)
  Formula  Length
0       5       5
1   6+1.5       6
2   5-2.5       5
3       4       4
4       5       5
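
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the row-wise replacement. The DataFrame construction is an assumption about how the data is stored (Formula as strings, Length as integers); the question does not show it.

```python
import pandas as pd

# Sample data from the question; the dtypes (str and int) are assumed.
df = pd.DataFrame({
    "Formula": ["Length", "Length+1.5", "Length-2.5", "Length", "5"],
    "Length": [5, 6, 5, 4, 5],
})

# axis=1 passes one row at a time to the lambda, so .replace here is the
# plain Python str method and only the substring 'Length' is swapped out.
df["Formula"] = df.apply(
    lambda row: row["Formula"].replace("Length", str(row["Length"])),
    axis=1,
)

print(df["Formula"].tolist())
```

Because the replacement is an ordinary Python string operation, the rest of each formula (the +1.5 or -2.5 part) is left untouched.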

Or use a list comprehension:

df["Formula"] = [x.replace('Length', str(y)) for x, y in df[['Formula', 'Length']].to_numpy()]
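
A variant of the list comprehension, sketched here as an alternative rather than as part of the original answer, walks the two columns in parallel with zip instead of converting them to a NumPy array first. The sample DataFrame is again an assumed reconstruction of the question's data.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "Formula": ["Length", "Length+1.5", "Length-2.5", "Length", "5"],
    "Length": [5, 6, 5, 4, 5],
})

# zip pairs each formula with the length from the same row.
df["Formula"] = [
    formula.replace("Length", str(length))
    for formula, length in zip(df["Formula"], df["Length"])
]

print(df)
```

Both forms call Python's str.replace on every value, so they assume that each entry in Formula is a string; a missing value (NaN) in that column would raise an AttributeError.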

Answer 2

Just to add: the list comprehension is considerably faster, about nine times faster in the benchmark below:

import pandas as pd

df = pd.DataFrame({'a': ['aba'] * 1000000, 'c': ['c'] * 1000000})

%timeit df.apply(lambda x: x['a'].replace('b', x['c']), axis=1)
# 1 loop, best of 5: 11.8 s per loop

%timeit [x.replace('b', str(y)) for x, y in df[['a', 'c']].to_numpy()]
# 1 loop, best of 5: 1.3 s per loop
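
Note that %timeit is an IPython/Jupyter magic and will not run in a plain Python script. A rough equivalent with the standard timeit module might look like the sketch below. The smaller row count and the helper names with_apply and with_comprehension are my own choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import pandas as pd

# Smaller frame than the benchmark above so the script finishes quickly;
# the size is an arbitrary choice for illustration.
n_rows = 10_000
df = pd.DataFrame({"a": ["aba"] * n_rows, "c": ["c"] * n_rows})


def with_apply():
    # Row-wise replacement through DataFrame.apply.
    return df.apply(lambda x: x["a"].replace("b", x["c"]), axis=1).tolist()


def with_comprehension():
    # Same replacement through a list comprehension over a NumPy array.
    return [x.replace("b", str(y)) for x, y in df[["a", "c"]].to_numpy()]


# Both approaches must agree before their timings are worth comparing.
assert with_apply() == with_comprehension()

apply_seconds = timeit.timeit(with_apply, number=3)
comprehension_seconds = timeit.timeit(with_comprehension, number=3)
print(f"apply:         {apply_seconds:.3f} s for 3 runs")
print(f"comprehension: {comprehension_seconds:.3f} s for 3 runs")
```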
