Combining irregular H:M:S time stamp data into hourly intervals in R
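Apologies if there is already an answer to a similar query, but I can't seem to find it! I'm new to R, but decided not to go back to VBA for this...
My question is about preparing data for forecasting with ses. I have a set of ticket data (about 25,000 entries) with time stamps that I imported from Excel: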
Number Created Category Priority `Incident state` `Reassignment count` Urgency Impact
<dbl> <dttm> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 2014-07-01 19:16:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
2 2 2014-07-02 15:27:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
3 3 2014-07-02 15:27:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
4 4 2014-07-02 15:27:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
5 5 2014-07-02 15:28:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
6 6 2014-07-02 15:29:00 Software/System 5 - Minor Closed 0 3 - Low 3 - Low
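The data is not at regular intervals, because no tickets are raised outside working hours, so I can't specify a seq(). Before converting to a time series that I can forecast, I need to group the Created column into hourly chunks. I tried rounding the Created column to the hour: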
modelling_messy$Created <- as.POSIXct(modelling_messy$Created,format="%Y/%m/%d %H:%M:%S", tz = "GMT")
modelling_messy$Created <- as.POSIXct(round(modelling_messy$Created, units = "hours"))
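This gets my data looking the way I want and lets me aggregate() all entries with the same hourly time stamp, but when I use ts() the time stamps come out wrong: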
# A tibble: 2 x 8
Number Created Category Priority `Incident state` `Reassignment count` Urgency Impact
<dbl> <dttm> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 1 2014-07-01 19:00:00 Software/System 5 Closed 0 3 - Low 3 - Low
2 2 2014-07-02 15:00:00 Software/System 5 Closed 0 3 - Low 3 - Low
> myts <- ts(modelling_clean[,1:2], start = c(2014-07-01, 1), freq = 1)
> head(myts)
Time Series:
Start = 2006
End = 2011
Frequency = 1
Group.1 Number
2006 1404241200 1
2007 1404313200 5
2008 1404316800 1
2009 1404907200 8
2010 1404910800 28
2011 1404914400 1
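I know I'm messing up ts(), but I can't find how to fix it! I'd like the time data to stay as "%Y-%m-%d %H:00:00" or some other useful date/hour combination (I'm only covering 2014 - 2017, by the way).
Any help is greatly appreciated.
Ta very much.
EDIT
Thanks for the suggestion - I think this will solve the conversion to a time series, but I'm not sure how to get the data for df$Created from my current tibble (there is far too much data to code by hand!). I tried the following, but it throws errors: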
> df = data.frame(Created = modelling_messy$Created),stringsAsFactors = F)
Error: unexpected ',' in "df = data.frame(Created = modelling_messy$Created),"
> df$id = seq_along(nrow(df))
Error in df$id = seq_along(nrow(df)) :
object of type 'closure' is not subsettable
Thanks in advance!
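Both errors above have simple causes. The first call has a stray closing parenthesis after modelling_messy$Created, which ends data.frame() before stringsAsFactors is reached. Because that assignment failed, no data frame named df was ever created, so df still refers to R's built-in function stats::df, hence the 'closure' message. A minimal sketch of the corrected calls follows; the two-row modelling_messy here is a hypothetical stand-in so the snippet runs on its own, and trunc() is used on the assumption that Created is already a date-time (<dttm>) column rather than text.

```r
# Hypothetical stand-in for the real tibble (two of the rows shown above)
modelling_messy <- data.frame(
  Created = as.POSIXct(c("2014-07-01 19:16:00", "2014-07-02 15:27:00"), tz = "GMT")
)

# The closing parenthesis belongs after stringsAsFactors, not before it
df <- data.frame(Created = modelling_messy$Created, stringsAsFactors = FALSE)

# seq_len(nrow(df)) gives 1..n; seq_along(nrow(df)) would give a single 1
df$id <- seq_len(nrow(df))

# Created is already POSIXct, so drop minutes and seconds with trunc()
df$Created <- as.POSIXct(trunc(df$Created, units = "hours"))
print(df)
```

With the real data, only the stand-in at the top needs to go; the rest of the hourly aggregation can then run on df unchanged.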
You can create an hourly time series with the xts package as follows:
library(xts)
# Sample data
df = data.frame(Created = c("2014-07-01 19:16:00", "2014-07-02 15:27:00", "2014-07-02 15:27:00",
                            "2014-07-02 15:27:00", "2014-07-02 15:28:00", "2014-07-02 15:29:00"),
                stringsAsFactors = FALSE)
df$id = seq_len(nrow(df))
# Truncate time stamps to the hour (parsing only up to %H drops minutes and seconds)
df$Created <- as.POSIXct(df$Created, format = "%Y-%m-%d %H", tz = "GMT")
# Count tickets per hour, then fill the hours without tickets with 0
df = aggregate(id ~ Created, df, length)
time_series = data.frame(Created = seq(min(df$Created), max(df$Created), by = "1 hour"))
time_series = merge(time_series, df, by = "Created", all.x = TRUE)
time_series$id[is.na(time_series$id)] = 0
# Create the time series object
myxts = xts(time_series$id, order.by = time_series$Created)
Output:
[,1]
2014-07-01 19:00:00 1
2014-07-01 20:00:00 0
2014-07-01 21:00:00 0
2014-07-01 22:00:00 0
2014-07-01 23:00:00 0
2014-07-02 00:00:00 0
2014-07-02 01:00:00 0
2014-07-02 02:00:00 0
2014-07-02 03:00:00 0
2014-07-02 04:00:00 0
2014-07-02 05:00:00 0
2014-07-02 06:00:00 0
2014-07-02 07:00:00 0
2014-07-02 08:00:00 0
2014-07-02 09:00:00 0
2014-07-02 10:00:00 0
2014-07-02 11:00:00 0
2014-07-02 12:00:00 0
2014-07-02 13:00:00 0
2014-07-02 14:00:00 0
2014-07-02 15:00:00 5
It works!
Disclaimer: this is my first time playing with time series in R, so there may be other (i.e. better) ways to achieve this.
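As for the original ts() attempt: start = c(2014-07-01, 1) is not read as a date. Without quotes, R evaluates 2014-07-01 as the subtraction 2014 - 7 - 1, which is where Start = 2006 comes from. ts() also keeps only plain numbers on a regular index, so the POSIXct column is shown as seconds since 1970 (1404241200 and so on). The sketch below illustrates both points; the counts vector simply retypes the 21 hourly values from the output above, and frequency = 24 (one day per cycle) is an assumption rather than something the question specifies.

```r
# Unquoted, 2014-07-01 is subtraction, not a date
start_value <- 2014 - 07 - 01
print(start_value)

# Hourly counts from the output above: 1 ticket, 19 empty hours, 5 tickets
counts <- c(1, rep(0, 19), 5)

# ts() stores only the numbers; frequency = 24 treats one day as one cycle
# (an assumption; the real time stamps stay in time_series$Created or the xts object)
myts <- ts(counts, frequency = 24)
print(frequency(myts))
print(length(myts))
```

A zero-filled, regularly spaced series like this is the kind of input that a forecasting function such as ses() from the forecast package expects, while the xts object keeps the readable date/hour index for inspection.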