Dates to Durations in Pandas
I feel like this should be easy to do, but I can't figure out how. I have a pandas DataFrame with a column date:
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
Name: date, dtype: datetime64[ns]
I would like a column of durations, something like:
0 0
1 80 days
2 1 day
3 15 days
4 1 day
Name: date, dtype: datetime64[ns]
My attempt produces a bunch of 0 days and NaT instead:
>>> df.date[1:] - df.date[:-1]
0 NaT
1 0 days
2 0 days
...
Any ideas?
Timedeltas are useful here: (see docs)
Starting in v0.15.0, we introduce a new scalar type Timedelta, which is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.
Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
df
0
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
You can:
pd.to_timedelta(df)
TimedeltaIndex(['0 days'], dtype='timedelta64[ns]', freq=None)
0 0
1 180
2 1
3 13
4 1
Name: 0, dtype: int64
Alternatively, you can use .shift() (or .diff(), as shown by @Andy Hayden) to compute the differences between consecutive time points:
res = df-df.shift()
which gives:
res.fillna(0)
0
0 0 days
1 180 days
2 1 days
3 13 days
4 1 days
You can convert these from timedelta64 dtype to integer using:
res.fillna(0).squeeze().dt.days
0 0
1 180
2 1
3 13
4 1
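As a minimal sketch of the shift-based approach above, the snippet below rebuilds the example dates and stores the gaps as whole days in a new column. The column name gap_days is made up for illustration, and the first row is filled with pd.Timedelta(0) rather than 0 so that the column stays timedelta-typed before .dt.days is applied.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    )
})

# Subtract the previous row's date; the first row has no predecessor, so it is NaT.
gaps = df["date"] - df["date"].shift()

# Fill the leading NaT with a zero-length Timedelta, then take whole days.
# "gap_days" is a hypothetical column name used only in this sketch.
df["gap_days"] = gaps.fillna(pd.Timedelta(0)).dt.days

print(df)
```

Note that .dt.days keeps only the whole-day component, which is fine here because the dates carry no time of day.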
You can use diff:
In [11]: s
Out[11]:
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
Name: date, dtype: datetime64[ns]
In [12]: s.diff()
Out[12]:
0 NaT
1 180 days
2 1 days
3 13 days
4 1 days
Name: date, dtype: timedelta64[ns]
In [13]: s.diff().fillna(0)
Out[13]:
0 0 days
1 180 days
2 1 days
3 13 days
4 1 days
Name: date, dtype: timedelta64[ns]
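The session above can be reproduced as a script along these lines. One caveat, stated as an assumption rather than a guarantee: how fillna(0) treats a timedelta Series has changed across pandas versions, so this sketch passes an explicit pd.Timedelta(0), which should keep the timedelta64 dtype regardless.

```python
import pandas as pd

s = pd.Series(
    pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    ),
    name="date",
)

# diff() subtracts each element's predecessor by position; the first entry is NaT.
durations = s.diff()

# Replace the leading NaT with an explicit zero-length Timedelta.
filled = durations.fillna(pd.Timedelta(0))

print(filled)
```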
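df.date[1:] - df.date[:-1] doesn't do what you think it does. Elements are subtracted by matching Series/DataFrame index labels, not by position in the Series.
The computation df.date[1:] - df.date[:-1] is evaluated as: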
+---- index of df.date[1:]
|                  +---- index of df.date[:-1]
|                  |
v                  v
                -  0  2012-08-21  =  NaT
1  2013-02-17   -  1  2013-02-17  =  0
2  2013-02-18   -  2  2013-02-18  =  0
3  2013-03-03   -  3  2013-03-03  =  0
4  2013-03-04   -                 =  NaT
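To make the alignment behaviour concrete, here is a small sketch contrasting the label-aligned subtraction from the question with a positional one. It uses .iloc for the slices to make the positional slicing explicit, and drops the index via to_numpy() before subtracting; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(
    pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    ),
    name="date",
)

later = s.iloc[1:]     # index labels 1, 2, 3, 4
earlier = s.iloc[:-1]  # index labels 0, 1, 2, 3

# Series arithmetic pairs values by index label, so each date is
# subtracted from itself; labels present on only one side become NaT.
aligned = later - earlier

# Stripping the index forces a positional subtraction instead.
positional = pd.Series(
    later.to_numpy() - earlier.to_numpy(),
    index=later.index,
    name="date",
)

print(aligned)
print(positional)
```

In practice s.diff() gives the same positional result with less code; the explicit version is only meant to show where the zeros and NaT values come from.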