How to assign Pandas.Series.str.extractall() result back to original dataset? (TypeError: incompatible index of inserted column with frame index)

Question

Brief overview of the dataset

dete_resignations['cease_date'].head()

gives

dete_resignations['cease_date'].value_counts()

gives

[output of the code above]

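What I tried

I tried to use 'Pandas.Series.str.extractall()' to extract only the year value from dete_resignations['cease_date'] (e.g. 05/2012 -> 2012) and assign the result back to the original dataframe. However, not all rows contain a string in that format (e.g. 05/2012), so an error occurred.

Here is the code I wrote.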

pattern = r"(?P<month>[0-1][0-9])/?(?P<year>[0-2][0-9]{3})"
years = dete_resignations['cease_date'].str.extractall(pattern)
dete_resignations['cease_date_'] = years['year']

'TypeError: incompatible index of inserted column with frame index'

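I thought 'years' shares the same index as dete_resignations['cease_date']. So even though the two indexes are not identical, I expected pandas to align them automatically and assign each value to the correct row. But it doesn't.

Can anyone help solve this issue?

Any help would be greatly appreciated!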

Answer 1

If you only want the year, don't capture the month in the pattern. You can also use extract instead of extractall:

# the $ indicates end of string
# \d is equivalent to [0-9]
# the pattern captures the last group of digits in the string
# (raw string so that \d is not treated as a string escape)
pattern = r'(?P<year>\d+)$'
years = dete_resignations['cease_date'].str.extract(pattern)
dete_resignations['cease_date_'] = years['year']
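
To see why the original assignment fails and why extract avoids the problem, here is a minimal sketch on made-up data. The frame `df`, its values, its index labels, and the column names `cease_year` and `cease_year_alt` are assumptions for illustration only; they are not taken from the actual dete_resignations dataset.

```python
import pandas as pd

# Hypothetical stand-in for dete_resignations (values and index are assumed).
df = pd.DataFrame(
    {"cease_date": ["05/2012", "2013", "Not Stated", None, "09/2010"]},
    index=[3, 5, 8, 9, 11],
)

pattern = r"(?P<year>\d+)$"

# extractall returns one row per match, indexed by
# (original index label, match number), and omits rows without a match.
# That two-level index is what cannot be aligned with the frame's index.
all_matches = df["cease_date"].str.extractall(pattern)
print(all_matches.index.nlevels)

# extract returns exactly one row per input row (NaN where nothing matched),
# so the result keeps the frame's index and can be assigned directly.
years = df["cease_date"].str.extract(pattern)
df["cease_year"] = years["year"]

# If extractall is still preferred, selecting the first match per row
# drops the "match" level, and the result aligns on assignment again.
first_match = all_matches.xs(0, level="match")["year"]
df["cease_year_alt"] = first_match

print(df)
```

Rows with no match (here "Not Stated" and the missing value) end up as NaN in both new columns. The extracted years are strings, so a numeric conversion would still be needed before doing arithmetic on them.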
