Comparing string values from sequential rows in a pandas Series

Question

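I am trying to count the characters that the strings in consecutive rows of a pandas Series have in common, using a user-defined function, and write the result to a new column. I worked out the individual steps, but when I put them together I get the wrong result. Could you tell me the best way to do this? I am a very junior Pythonista!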

My pandas DataFrame is:

import pandas as pd

df = pd.DataFrame({"Code": ['d7e', '8e0d', 'ft1', '176', 'trk', 'tr71']})

My string comparison loop is:

x = 'd7e'
y = '8e0d'
s = 0  # number of characters of y that also occur in x
for i in y:
    b = str(i)
    if b not in x:
        s += 0
    else:
        s += 1
print(s)

The correct result for these particular strings is 2.

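Note that when I wrap this in def func(x, y):, something goes wrong with the s counter and it no longer produces the correct result. I think I need to reset s to 0 each time the loop runs.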
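One way to get the reset described above is to initialise the counter inside the function body, so that every call starts from 0. The following is a minimal sketch under that assumption; the name `count_common` is hypothetical and not taken from the original post.

```python
def count_common(x, y):
    """Count how many characters of y also occur in x."""
    s = 0  # initialised inside the function, so it resets on every call
    for ch in y:
        if ch in x:
            s += 1
    return s  # without an explicit return the function would give back None


print(count_common('d7e', '8e0d'))
```

Returning the counter matters as much as resetting it: a function that only prints, or that has no return statement, hands None back to whatever calls it.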

Then I use df.shift to specify the positions of y and x in the Series:

x = df["Code"]
y = df["Code"].shift(periods=-1, axis=0)

Finally, I use the df.apply() method to run the function:

df["R1SB"] = df.apply(func, axis=0)

I get None values in the new column "R1SB".

My desired output is:

    "Code"   "R1SB"
0    d7e      None
1    8e0d     2
2    ft1      0
3    176      1
4    trk      0
5    tr71     2

Thanks for your help!

Answer 1

Try:

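import numpy as np

# Pair each row with the previous row's Code (temp), then count
# how many characters of Code also occur in temp.
df['R1SB'] = df.assign(temp=df.Code.shift(1)).apply(
    lambda x: np.nan
    if pd.isna(x['temp'])
    else sum(i in str(x['temp']) for i in str(x['Code'])),
    axis=1,
)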

Output:

   Code  R1SB
0   d7e   NaN
1  8e0d   2.0
2   ft1   0.0
3   176   1.0
4   trk   0.0
5  tr71   2.0
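The counts above come out as floats because the column holds NaN in its first row. As an alternative sketch, not part of the original answer, the same comparison can be written with a plain helper and `zip`, storing the result in pandas' nullable `Int64` dtype so the counts stay integers. The helper name `count_common` and the choice of `Int64` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Code": ['d7e', '8e0d', 'ft1', '176', 'trk', 'tr71']})


def count_common(prev, curr):
    """Count characters of curr that also occur in prev (hypothetical helper)."""
    if pd.isna(prev):
        return None  # the first row has no previous row to compare with
    return sum(ch in prev for ch in curr)


# shift(1) moves each value down one row, aligning every row with its predecessor
prev = df["Code"].shift(1)

# Nullable Int64 keeps the counts as integers and shows the missing value as <NA>
df["R1SB"] = pd.array(
    [count_common(p, c) for p, c in zip(prev, df["Code"])],
    dtype="Int64",
)
print(df)
```

Note that this uses `shift(1)` (previous row), which matches the desired output in the question; `shift(periods=-1)` would compare each row with the next one instead.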


Tags: loops, for-loop, shift, apply, pandas