Assign slice of variable length to column with method chaining

I would like to assign to one column a variable-length slice of another column, but it does not work the way I expect and I don't understand why:

import numpy as np
import pandas as pd

m = np.array([[1, 'AAAAA'],
               [2, 'BBBB'],
               [3, 'CCC']])

df = (pd.DataFrame(m, columns = ['id', 's1'])
        .assign(
                s2 = lambda x: x['s1'].str.slice(start=0, stop=x['s1'].str.len()-1))
        )

print(df)

This results in:

  id     s1  s2
0  1  AAAAA NaN
1  2   BBBB NaN
2  3    CCC NaN

However, I expected the following:

  id     s1   s2
0  1  AAAAA AAAA
1  2   BBBB  BBB
2  3    CCC   CC

Any idea what is going on here?

The problem is in the stop argument of your slice() call; just pass -1:

df = (pd.DataFrame(m, columns = ['id', 's1'])
        .assign(
                s2 = lambda x: x['s1'].str.slice(start=0, stop=-1))
        )
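
As a minimal sketch of why a negative stop is enough here: it counts from the end of each string, so the per-row length never has to be computed. The Series `s` below is a stand-in for the `s1` column, and the variable names are only illustrative.

```python
import pandas as pd

# Stand-in for the 's1' column from the question.
s = pd.Series(["AAAAA", "BBBB", "CCC"])

# stop=-1 is interpreted relative to the end of each individual string,
# so strings of different lengths each lose exactly their last character.
trimmed_slice = s.str.slice(start=0, stop=-1)

# The indexing form is equivalent shorthand for the same slice.
trimmed_index = s.str[:-1]

print(trimmed_slice.tolist())  # ['AAAA', 'BBB', 'CC']
print(trimmed_index.tolist())  # ['AAAA', 'BBB', 'CC']
```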

You need str[:-1] to take every character of each value except the last one:

df = (pd.DataFrame(m, columns = ['id', 's1'])
        .assign(
                s2 = lambda x: x['s1'].str[:-1])
        )

print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC

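Your original approach only works if you use apply to process each row separately, because str.slice expects scalar start and stop values rather than a Series. For example: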

df = (pd.DataFrame(m, columns = ['id', 's1'])
        .assign(
                s2 = lambda x: x.apply(lambda y: y['s1'][0:len(y['s1'])-1], axis=1))
        )

print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
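
If the stop position really does differ from row to row in a way a negative index cannot express, one possible alternative to a row-wise apply is a list comprehension over the two columns. This is only a sketch: the column `n`, holding a per-row stop position, is hypothetical and not part of the original question.

```python
import pandas as pd

# 'n' is a hypothetical column holding the stop position for each row.
df = pd.DataFrame({"s1": ["AAAAA", "BBBB", "CCC"], "n": [1, 2, 3]})

# Pair each string with its own stop value and slice in plain Python;
# assign() accepts the resulting list because it has one entry per row.
df = df.assign(s2=lambda x: [s[:n] for s, n in zip(x["s1"], x["n"])])

print(df["s2"].tolist())  # ['A', 'BB', 'CCC']
```

This keeps the method-chaining style of the question while avoiding passing a Series to str.slice.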

You can also use apply on the column, like this:

In [1]: import pandas as pd                                                     

In [2]: df = pd.DataFrame({"id":[1,2,3],"s1":["AAAAA","BBBB","CCC"]})           

In [3]: df                                                                      
Out[3]: 
   id     s1
0   1  AAAAA
1   2   BBBB
2   3    CCC

In [4]: df["s2"] = df["s1"].apply(lambda x: x[:-1])                             

In [5]: df                                                                      
Out[5]: 
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC
