Split integers in a series by forward slash

Question

I am trying to split the integers in a series by forward slash using Python's rsplit function, but it does not work.

Original data

date
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015
1/30/2015

Expected data

I want to split on '/'

    date

'1' '30' '2015'
'1' '30' '2015'
'1' '30' '2015'
'1' '30' '2015'
'1' '30' '2015'
'1' '30' '2015'
'1' '30' '2015'

The goal is to put the year in a separate column. I previously tried the code below.

date =  df['date']
split = date.rsplit("/")
OutputData['Year']=split[2]

split[2] -> is the year in the date series; the goal is to put the year in a separate column

Many thanks

This is the error I get every time (this is a series of objects):

AttributeError: 'Series' object has no attribute 'split'

Answer 1

You can use the str accessor to apply string methods to a Series:

df["date"].str.rsplit("/")

Or, to put the pieces in separate columns:

df["date"].str.rsplit("/", expand = True)
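As a minimal sketch of how the split result could feed the year column the question asks for, assuming the dates are stored as strings in month/day/year order. The sample DataFrame and the column names month, day and year are illustrative and not from the original post:

```python
import pandas as pd

# Sample data mirroring the question (assumed to be stored as strings)
df = pd.DataFrame({"date": ["1/30/2015"] * 10})

# expand=True returns a DataFrame with one column per piece
parts = df["date"].str.split("/", expand=True)
parts.columns = ["month", "day", "year"]  # illustrative names

# The pieces are still strings; convert if a numeric year is needed
df["Year"] = parts["year"].astype(int)
```

Because every piece comes back as a string, the astype(int) step is only needed when the year will be used numerically.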

For this series, though, it may be better to work with datetime data:

import pandas as pd
pd.to_datetime(df["date"]).dt.year
Out[10]: 
0    2015
1    2015
2    2015
3    2015
4    2015
5    2015
6    2015
7    2015
8    2015
9    2015
Name: date, dtype: int64

Answer 2

IMO it would be more useful to convert the strings to datetime using to_datetime, so you can perform arithmetic operations on them; if you want the year or any other date/time component, you can use the vectorised dt accessor:

In [23]:
df['date'] = pd.to_datetime(df['date'])
df

Out[23]:
        date
0 2015-01-30
1 2015-01-30
2 2015-01-30
3 2015-01-30
4 2015-01-30
5 2015-01-30
6 2015-01-30
7 2015-01-30
8 2015-01-30
9 2015-01-30

In [24]:
df['year'] = df['date'].dt.year
df

Out[24]:
        date  year
0 2015-01-30  2015
1 2015-01-30  2015
2 2015-01-30  2015
3 2015-01-30  2015
4 2015-01-30  2015
5 2015-01-30  2015
6 2015-01-30  2015
7 2015-01-30  2015
8 2015-01-30  2015
9 2015-01-30  2015
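A short sketch of what the datetime conversion makes possible, assuming the strings are in month/day/year order. The explicit format string and the next_day column are illustrative additions, not part of the original answer:

```python
import pandas as pd

# Sample data mirroring the question
df = pd.DataFrame({"date": ["1/30/2015"] * 10})

# An explicit format avoids ambiguity between month/day and day/month
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Components are available through the vectorised dt accessor
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Date arithmetic works directly on the datetime column
df["next_day"] = df["date"] + pd.Timedelta(days=1)
```

Passing format is optional here, but it makes the assumed month/day/year order explicit and raises an error on values that do not match.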
