Year range to date time format

Question

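I currently have a column of strings in a pandas DataFrame, each representing a range of years in "yyyy-yyyy" format. For example, "2004-2005" is a single string value in this column.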

I would like to know whether it is possible to convert this from a string to something similar to a datetime format.

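The purpose is to calculate the difference between the values in this column and the values in another, similar "Years" column. For example, something like the following: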

col 1        col2        Answer(Total years)
2004-2005    2006-2007    3

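Note: one approach I thought of is to build a dictionary that maps each year to a unique integer value and then calculate the difference between those values.

However, I would like to know whether there is a simpler way.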

Answer 1

"something similar to a datetime object." A datetime is not designed to represent a range of dates.

If you want to create a pair of datetime objects, you can do this:

import datetime
[datetime.datetime.strptime(x, '%Y') for x in '2005-2006'.split('-')]

Or you could try the pandas date_range function, if that is closer to what you want.

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.date_range.html
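
As a rough sketch of the date_range suggestion, the snippet below turns one "yyyy-yyyy" string into a DatetimeIndex with one entry per year. The helper name year_range_to_dates is made up for illustration, and anchoring each year to January 1 is an assumption, since the original strings carry no month or day.

```python
import pandas as pd

def year_range_to_dates(text):
    """Turn a 'yyyy-yyyy' string into a DatetimeIndex with one entry per year.

    Assumption: each year is anchored to January 1, because the source
    string contains no month or day information.
    """
    start, end = text.split("-")
    # freq="YS" asks for year-start timestamps between the two endpoints
    return pd.date_range(start=f"{start}-01-01", end=f"{end}-01-01", freq="YS")

dates = year_range_to_dates("2004-2005")
print(dates)
```

This still gives you a sequence of timestamps rather than a single "range" value, so for a plain difference in years, working with integers is probably simpler.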

Answer 2

If you want to find the difference between the lowest year and the highest year, try this:

col1 = "2004-2005"
col2 = "2006-2007"
col1 = col1.split("-")  # make a list of the years in col1: ['2004', '2005']
col2 = col2.split("-")  # make a list of the years in col2: ['2006', '2007']
biglist = [int(year) for year in col1 + col2]  # combine both lists and convert the years to integers
biglist.sort()  # sort from lowest year to highest year
Answer = biglist[-1] - biglist[0]  # difference between the lowest and highest year
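
The same idea could be wrapped in a small function and applied to every row of a DataFrame. This is only a sketch: the function name total_years is hypothetical, and the column names "col 1" and "col2" are taken from the example in the question.

```python
import pandas as pd

def total_years(range_a, range_b):
    """Difference between the highest and lowest year in two 'yyyy-yyyy' strings."""
    years = [int(y) for y in range_a.split("-") + range_b.split("-")]
    return max(years) - min(years)

# Sample row taken from the question
df = pd.DataFrame({"col 1": ["2004-2005"], "col2": ["2006-2007"]})

# Apply the helper row by row
df["Answer"] = [total_years(a, b) for a, b in zip(df["col 1"], df["col2"])]
print(df)
```

Using max and min avoids sorting and does not depend on which column holds the earlier range.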

Answer 3

It looks like you are subtracting the first year in column 1 from the last year in column 2. In that case, I would use str.extract (and convert the result to a number):

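# expand=False makes str.extract return a Series rather than a DataFrame
In [11]: pd.to_numeric(df['col 1'].str.extract(r'(\d{4})', expand=False))
Out[11]:
0    2004
Name: col 1, dtype: int64

In [12]: pd.to_numeric(df['col2'].str.extract(r'-(\d{4})', expand=False)) - pd.to_numeric(df['col 1'].str.extract(r'(\d{4})', expand=False))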
Out[12]:
0    3
dtype: int64
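
If you need both ends of each range, one possible variation is to extract them in a single pass with named groups. This is a sketch, not part of the session above: the group names start and end are my own choice, and the second row of sample data is hypothetical, added only to show that the approach works across rows.

```python
import pandas as pd

df = pd.DataFrame({
    "col 1": ["2004-2005", "2010-2011"],  # second row is made-up sample data
    "col2": ["2006-2007", "2012-2015"],
})

# Named groups become column names in the DataFrame returned by str.extract
pattern = r"(?P<start>\d{4})-(?P<end>\d{4})"
left = df["col 1"].str.extract(pattern).astype(int)
right = df["col2"].str.extract(pattern).astype(int)

# Last year of col2 minus first year of col 1
df["Answer"] = right["end"] - left["start"]
print(df)
```

Note that astype(int) will raise an error if any value fails to match the pattern, so rows with missing or malformed ranges would need to be handled first.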

python

datetime

data-analysis

pandas