Assign a slice of variable length to a column with method chaining
I want to assign to one column a variable-length slice of another column, but somehow it does not work as I expect, and I don't understand why:
import numpy as np
import pandas as pd

m = np.array([[1, 'AAAAA'],
              [2, 'BBBB'],
              [3, 'CCC']])
df = (pd.DataFrame(m, columns=['id', 's1'])
      .assign(
          s2=lambda x: x['s1'].str.slice(start=0, stop=x['s1'].str.len()-1))
      )
print(df)
This results in:
  id     s1   s2
0  1  AAAAA  NaN
1  2   BBBB  NaN
2  3    CCC  NaN
However, I expected the following:
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
Any idea what is going on here?
The problem is in the stop argument of your slice(): it expects a single integer, not a Series. Just pass -1.
df = (pd.DataFrame(m, columns=['id', 's1'])
      .assign(
          s2=lambda x: x['s1'].str.slice(start=0, stop=-1))
      )
You need str[:-1] to take every value without its last character:
df = (pd.DataFrame(m, columns=['id', 's1'])
      .assign(
          s2=lambda x: x['s1'].str[:-1])
      )
print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
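The two fixes suggested so far, str.slice(stop=-1) and str[:-1], should be interchangeable. A minimal sketch that checks this on a standalone Series (the names s, via_slice and via_index are made up for illustration and do not come from the question):

```python
import pandas as pd

# Hypothetical standalone Series holding the same strings as column 's1'.
s = pd.Series(["AAAAA", "BBBB", "CCC"])

# A negative scalar stop counts from the end of each string,
# so every row loses its last character regardless of its length.
via_slice = s.str.slice(start=0, stop=-1)

# Slice notation on the .str accessor expresses the same operation.
via_index = s.str[:-1]

print(via_slice.equals(via_index))
```

Either form can go inside the assign() lambda; str[:-1] is simply shorter.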
Your approach only works if you use apply to process each row separately, for example:
df = (pd.DataFrame(m, columns=['id', 's1'])
      .assign(
          s2=lambda x: x.apply(lambda y: y['s1'][0:len(y['s1'])-1], axis=1))
      )
print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
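If the stop position really does differ per row, so that a single scalar like -1 is not enough, one option besides a row-wise apply is a plain list comprehension over the strings and their stop positions. This is only a sketch: the stops Series is a hypothetical helper, computed here as length minus one so the result matches the expected output above.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "s1": ["AAAAA", "BBBB", "CCC"]})

# Hypothetical per-row stop positions; any integer Series aligned with 's1' would do.
stops = df["s1"].str.len() - 1

# Pair each string with its own stop and slice it with ordinary Python slicing.
df = df.assign(
    s2=lambda x: [text[:stop] for text, stop in zip(x["s1"], stops)]
)

print(df)
```

The comprehension keeps the method chain intact because assign() accepts a callable that returns a list of the same length as the frame.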
You can use apply in pandas like this:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"id":[1,2,3],"s1":["AAAAA","BBBB","CCC"]})
In [3]: df
Out[3]:
   id     s1
0   1  AAAAA
1   2   BBBB
2   3    CCC
In [4]: df["s2"] = df["s1"].apply(lambda x: x[:-1])
In [5]: df
Out[5]:
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC