R: How to fix the datetime to always return a specified week's data in a time series data set?

I'm working with time-series sensor data. Here is our setup: there are 3 columns (EditDate, ID, InsertDate):

EditDate: the date when the sensor data for that week was edited/modified
ID: a manufacturing tool identifier
InsertDate: the date when all of that week's sensor information is added to the data frame at once

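We add data every Friday at 6:30 AM (InsertDate). My task is to find outliers in the last 7 days of data (note: the original data frame also contains data from previous weeks). The outlier function itself works correctly, but I got the dates wrong, and that is where I need help.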

For example, consider this data frame:

EditDate <- c("04/17/2015 5:46:23 AM", "04/17/2015 5:23:23 AM","04/16/2015 9:46:34 AM","04/15/2015 23:46:11AM","04/11/2015 11:46:17 AM","04/10/2015 6:34:23 AM","04/10/2015 6:29:34 AM","04/8/2015  5:46:12 AM","04/5/2015  5:46:22 AM","04/3/2015  6:31:22 AM","04/3/2015  6:29:23 AM")
ID <- c("DX154", "DX156","DX157","DX159","DX132","DX137","DX111","DX123","DX136","DX051","DX021")
InsertDate <- c("4/17/2015 6:30:00 AM", "4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/3/2015  6:30:00 AM")

df1 <- data.frame(EditDate , ID, InsertDate)

Output:

+------------------------+-------+----------------------+
|        EditDate        |  ID   |      InsertDate      |
+------------------------+-------+----------------------+
| 04/17/2015 5:46:23 AM  | DX154 | 4/17/2015 6:30:00 AM |
| 04/17/2015 5:23:23 AM  | DX156 | 4/17/2015 6:30:00 AM |
| 04/16/2015 9:46:34 AM  | DX157 | 4/17/2015 6:30:00 AM |
| 04/15/2015 23:46:11AM  | DX159 | 4/17/2015 6:30:00 AM |
| 04/11/2015 11:46:17 AM | DX132 | 4/17/2015 6:30:00 AM |
| 04/10/2015 6:34:23 AM  | DX137 | 4/17/2015 6:30:00 AM |
| 04/10/2015 6:29:34 AM  | DX111 | 4/10/2015 6:30:00 AM |
| 04/8/2015  5:46:12 AM  | DX123 | 4/10/2015 6:30:00 AM |
| 04/5/2015  5:46:22 AM  | DX136 | 4/10/2015 6:30:00 AM |
| 04/3/2015  6:31:22 AM  | DX051 | 4/10/2015 6:30:00 AM |
| 04/3/2015  6:29:23 AM  | DX021 | 4/3/2015  6:30:00 AM |
+------------------------+-------+----------------------+

Once I have the data frame, this is what I do:

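# 604800 is one week in seconds
BackAWeek <- Sys.time() - 604800
EditTime <- as.POSIXct(df1$EditDate, format = "%m/%d/%Y %I:%M:%S %p")
df2 <- subset(df1, EditTime > BackAWeek)   # edited within the last 7 days
df3 <- subset(df1, EditTime <= BackAWeek)  # everything older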

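df2 should contain the last 7 days of data, and df3 everything that does not belong to the last week. A week here is defined by InsertDate: for example, if we have 4 weeks of data, df2 should return all data from Friday 6:30:00 AM of week 3 through Friday 6:29:59 AM of week 4.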

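My current script has to be run every Friday at 6:31:00 AM to get the past 7 days of data, which is not always possible. Suppose I run the script in the middle of the following week to look at the data (say, Wednesday 4/22/15): the script takes the current time and subtracts 7 days, so I miss any data entered before 4/15/2015.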

If I run the script on 4/22/2015, the data frame I get is:

EditDate                 ID      InsertDate
04/17/2015 5:46:23 AM   DX154   4/17/2015 6:30:00 AM
04/17/2015 5:23:23 AM   DX156   4/17/2015 6:30:00 AM
04/16/2015 9:46:34 AM   DX157   4/17/2015 6:30:00 AM
04/15/2015 23:46:11AM   DX159   4/17/2015 6:30:00 AM

But what I want is:

EditDate                 ID     InsertDate
04/17/2015 5:46:23 AM   DX154   4/17/2015 6:30:00 AM
04/17/2015 5:23:23 AM   DX156   4/17/2015 6:30:00 AM
04/16/2015 9:46:34 AM   DX157   4/17/2015 6:30:00 AM
04/15/2015 23:46:11AM   DX159   4/17/2015 6:30:00 AM
04/11/2015 11:46:17 AM  DX132   4/17/2015 6:30:00 AM
04/10/2015 6:34:23 AM   DX137   4/17/2015 6:30:00 AM

Please advise how to fix my code so that it always uses the Friday 6:30 to Friday 6:30 window, no matter when during the week I run it.

You should probably use standardized date strings and then convert them to an actual date-time type that R can work with. This tutorial can help with that: http://www.cyclismo.org/tutorial/R/time.html
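A minimal sketch of that conversion with as.POSIXct, assuming UTC and a C (English) LC_TIME locale, since %p is locale-dependent and may not match "AM"/"PM" in other locales. It also shows why "04/15/2015 23:46:11AM" from the question cannot be parsed: hour 23 is out of range for the 12-hour %I code.

```r
# Assumption: C locale for LC_TIME so that %p matches "AM"/"PM"
invisible(Sys.setlocale("LC_TIME", "C"))

fmt <- "%m/%d/%Y %I:%M:%S %p"
stamps <- c("04/17/2015 05:46:23 AM",  # zero-padded: parses
            "4/3/2015 6:29:23 AM",     # R also accepts missing leading zeros
            "04/15/2015 23:46:11AM")   # hour 23 is out of range for %I: NA

# tz = "UTC" is an assumption; use the zone your sensors log in
parsed <- as.POSIXct(stamps, format = fmt, tz = "UTC")
print(parsed)
```

Any NA after conversion points to a malformed source string that should be fixed before filtering by date.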

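The solution to your problem is to find the last Friday and the Friday before it, and then keep only the data that falls between them. For example: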

First, create the data frame. Note the leading "0" in the hours of the timestamps:

EditDate <- c("03/10/2015 06:30:00 AM","04/17/2015 05:46:23 AM", "04/17/2015 05:23:23 AM","04/16/2015 09:46:34 AM","04/15/2015 08:46:11 AM","04/11/2015 11:46:17 AM","04/10/2015 06:34:23 AM","04/10/2015 06:29:34 AM","04/08/2015 05:46:12 AM","04/05/2015 05:46:22 AM","04/03/2015 06:31:22 AM","04/03/2015 06:29:23 AM")
ID <- c("DX153","DX154", "DX156","DX157","DX159","DX132","DX137","DX111","DX123","DX136","DX051","DX021")
InsertDate <- c("03/10/2015 06:30:00 AM", "04/17/2015 06:30:00 AM", "04/17/2015 06:30:00 AM","04/17/2015 06:30:00 AM","04/17/2015 06:30:00 AM","04/17/2015 06:30:00 AM","04/17/2015 06:30:00 AM","04/10/2015 06:30:00 AM","04/10/2015 06:30:00 AM","04/10/2015 06:30:00 AM","04/10/2015 06:30:00 AM","04/03/2015 06:30:00 AM")
df1 <- data.frame(EditDate,ID,InsertDate)

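The dates in the data frame are now interpreted as integers, because data.frame() turned the strings into factors (its default before R 4.0.0; in later versions they stay character):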

> typeof(df1$EditDate[1])
[1] "integer"
> typeof(df1$InsertDate[1])
[1] "integer"
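To see where the integers come from: a factor stores integer codes plus a table of levels, so typeof() reports "integer". A minimal comparison of both stringsAsFactors settings:

```r
# The same string stored two ways
f <- data.frame(EditDate = "04/17/2015 05:46:23 AM", stringsAsFactors = TRUE)
s <- data.frame(EditDate = "04/17/2015 05:46:23 AM", stringsAsFactors = FALSE)

typeof(f$EditDate)  # "integer": a factor holds integer codes
class(f$EditDate)   # "factor"
typeof(s$EditDate)  # "character"
levels(f$EditDate)  # the original string is kept as a level
```

Either way the column is not a date-time yet, so comparisons against Sys.time() are not meaningful until it is converted.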

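You can convert the dates with the strptime function, which returns the POSIXlt data type. Here we use lapply to convert both fields at once:

# Convert both columns in one pass
df1[,c("EditDate","InsertDate")] <- lapply(df1[,c("EditDate","InsertDate")],strptime,format="%m/%d/%Y %I:%M:%S %p")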

We get:

> typeof(df1$EditDate[1])
[1] "list"

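As you can see, the original strings have been converted into a list. This list is the POSIXlt data type, whose components can be accessed by name. For example: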

> df1$EditDate[1]$hour
[1] 6
> df1$EditDate[1]$min
[1] 30
> df1$EditDate[1]$sec
[1] 0
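The other POSIXlt components follow the C struct tm conventions, which matter for the Friday arithmetic below: wday counts days since Sunday, mon is 0-based, and year counts from 1900. A quick check on the 17 April 2015 insert time (UTC assumed):

```r
x <- as.POSIXlt("2015-04-17 06:30:00", tz = "UTC")

x$wday  # 5: days since Sunday, so a Friday
x$mon   # 3: months are 0-based, so April
x$year  # 115: years since 1900
x$mday  # 17: day of the month, 1-based
```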

Now, to find the most recent Friday, we can start from the current timestamp and adjust its fields to get the date you need:

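lastFriday <- as.POSIXlt(Sys.time())
# wday counts days since Sunday (5 = Friday); step back to the most recent Friday
lastFriday$mday <- lastFriday$mday - (lastFriday$wday - 5) %% 7
lastFriday$hour <- 6
lastFriday$min <- 30
lastFriday$sec <- 0
# On a Friday before 6:30 AM, the latest insert was a week earlier
if (lastFriday > Sys.time()) lastFriday$mday <- lastFriday$mday - 7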
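The number of days to step back to Friday depends on the current weekday. Checking the offset (wday - 5) %% 7 across a whole week (POSIXlt codes 0 = Sunday through 6 = Saturday) shows why a fixed offset such as 2 days only works on one weekday:

```r
# Days to subtract to land on the most recent Friday, for each weekday
wday <- 0:6
back <- (wday - 5) %% 7
names(back) <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
print(back)
# Sun Mon Tue Wed Thu Fri Sat
#   2   3   4   5   6   0   1
```

R's %% returns a non-negative result for a positive divisor, so the negative differences (Sunday to Thursday) wrap around correctly.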

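To find the Friday of the week before, we just subtract 7 days from that timestamp. The POSIXlt type makes this easy and takes care of the underlying date/time logic, such as crossing a month boundary.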

fridayBefore <- lastFriday
fridayBefore$mday <- fridayBefore$mday-7

We get:

> lastFriday
[1] "2015-04-17 06:30:00 CEST"
> fridayBefore
[1] "2015-04-10 06:30:00 CEST"
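The month-boundary handling can be checked directly. This small sketch (UTC assumed) steps back 7 days from Friday 1 May 2015, which pushes mday below 1; converting through POSIXct normalizes the value into April:

```r
d <- as.POSIXlt("2015-05-01 06:30:00", tz = "UTC")
d$mday <- d$mday - 7            # mday is now -6, an out-of-range value
d <- as.POSIXlt(as.POSIXct(d))  # normalize: rolls back into April
format(d, "%Y-%m-%d %H:%M")     # "2015-04-24 06:30"
```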

All that is left is to take the rows of the data frame whose timestamps fall in the relevant window. For example, we can select them with a logical vector:

# Every insert happens exactly at Friday 6:30, so exclude fridayBefore
# (the previous week's insert) and include lastFriday
logicalVector <- (df1$InsertDate <= lastFriday & df1$InsertDate > fridayBefore)
results <- df1[logicalVector,]

We get:

> results
             EditDate    ID          InsertDate
2 2015-04-17 05:46:23 DX154 2015-04-17 06:30:00
3 2015-04-17 05:23:23 DX156 2015-04-17 06:30:00
4 2015-04-16 09:46:34 DX157 2015-04-17 06:30:00
5 2015-04-15 08:46:11 DX159 2015-04-17 06:30:00
6 2015-04-11 11:46:17 DX132 2015-04-17 06:30:00
7 2015-04-10 06:34:23 DX137 2015-04-17 06:30:00
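The steps above can be wrapped into one function that also produces the asker's df2/df3 split. This is a sketch under stated assumptions: friday_window is a hypothetical helper name, tz = "UTC" is assumed, and the example frame uses ISO timestamps for a subset of the rows above. The window is half-open, (start, end], so the insert stamped at the earlier Friday 6:30 belongs to the previous week:

```r
# Hypothetical helper: the Friday 06:30 to Friday 06:30 window of the most
# recent completed insert, for any run time `now` (tz assumed, default UTC)
friday_window <- function(now = Sys.time(), tz = "UTC") {
  now <- as.POSIXct(now, tz = tz)
  today <- as.Date(now, tz = tz)
  # POSIXlt wday: 0 = Sunday ... 5 = Friday
  wday <- as.POSIXlt(today)$wday
  # Step back to the most recent Friday (0 days if today is Friday)
  friday <- today - (wday - 5) %% 7
  # On a Friday before 06:30 this week's insert has not run yet
  if (now < as.POSIXct(paste(friday, "06:30:00"), tz = tz)) friday <- friday - 7
  # Build both bounds from calendar dates so a DST change cannot shift them
  list(start = as.POSIXct(paste(friday - 7, "06:30:00"), tz = tz),
       end   = as.POSIXct(paste(friday, "06:30:00"), tz = tz))
}

# Small example frame (a subset of the rows above, ISO timestamps)
df1 <- data.frame(
  ID = c("DX154", "DX137", "DX111", "DX021"),
  InsertDate = as.POSIXct(c("2015-04-17 06:30:00", "2015-04-17 06:30:00",
                            "2015-04-10 06:30:00", "2015-04-03 06:30:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# Running on Wednesday 2015-04-22 still selects the 04/17 insert
w <- friday_window(as.POSIXct("2015-04-22 10:00:00", tz = "UTC"))
inWeek <- df1$InsertDate > w$start & df1$InsertDate <= w$end
df2 <- df1[inWeek, ]   # the Friday-to-Friday week
df3 <- df1[!inWeek, ]  # all other weeks
df2$ID  # "DX154" "DX137"
df3$ID  # "DX111" "DX021"
```

Filtering on EditDate instead works with the mirrored interval, EditDate >= w$start & EditDate < w$end, which matches "Friday 6:30:00 AM through Friday 6:29:59 AM" from the question.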