Convert Daily Data into Weekly in R Week Starts on Saturday
I am unable to convert daily data into weekly data using the average over each week.
My data looks like this:
> str(daily_FWIH)
'data.frame': 4371 obs. of 6 variables:
$ Date : Date, format: "2013-03-01" "2013-03-02" "2013-03-04" "2013-03-05" ...
$ CST.OUC : Factor w/ 6 levels "BVG11","BVG12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ CST.NAME : Factor w/ 6 levels "Central Scotland",..: 2 2 2 2 2 2 2 2 2 2 ...
$ SOM_patch: Factor w/ 6 levels "BVG11_Highlands & Islands",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Row_Desc : Factor w/ 1 level "FSFluidWIH": 1 1 1 1 1 1 1 1 1 1 ...
$ Value : num 1.16 1.99 1.47 1.15 1.16 1.28 1.27 2.07 1.26 1.19 ...
> head(daily_FWIH)
Date CST.OUC CST.NAME SOM_patch Row_Desc Value
1 2013-03-01 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.16
2 2013-03-02 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.99
3 2013-03-04 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.47
4 2013-03-05 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.15
5 2013-03-06 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.16
6 2013-03-07 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.28
I tried converting it to an xts object, as shown here.
This is what I tried:
daily_FWIH$Date = as.Date(as.character(daily_FWIH$Date), "%d/%m/%Y")
library(xts)
temp.x = xts(daily_FWIH[-1], order.by=daily_FWIH$Date)
apply.weekly(temp.x, colMeans(temp.x$Value))
I have two problems: my week starts and ends on "Saturday", and I get the following error:
> apply.weekly(temp.x, colMeans(temp.x$Value))
Error in colMeans(temp.x$Value) : 'x' must be numeric
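The error comes from the column types rather than from the weekly grouping. An xts object holds its data in a single matrix, so building it from daily_FWIH[-1], which mixes factor columns with the numeric Value column, produces a character matrix, and colMeans() rejects non-numeric input. Note also that apply.weekly() expects FUN to be a function (for example mean or colMeans), not the result of calling one. The base R sketch below uses a two-row toy data frame, made up for illustration, to reproduce the coercion with as.matrix() and to show that keeping only the numeric column avoids it.

```r
# Toy data frame (made up for illustration) mixing a factor and a numeric
# column, like daily_FWIH[-1] in the question
df <- data.frame(
  SOM_patch = factor(c("BVG11_Highlands & Islands", "BVG11_Highlands & Islands")),
  Value     = c(1.16, 1.99)
)

# A matrix has a single storage type, so the numeric column becomes character
m_all <- as.matrix(df)
print(typeof(m_all))   # "character"

# colMeans() rejects it with the same message as in the question
res <- tryCatch(colMeans(m_all), error = function(e) conditionMessage(e))
print(res)

# Keeping only the numeric column gives a numeric matrix that colMeans() accepts
m_num <- as.matrix(df["Value"])
print(colMeans(m_num))
```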
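Update, based on Sam's comment:
This is what I did:
library(lubridate)  # ymd(), days(), week(), year()
library(dplyr)      # %>%, group_by(), mutate(), slice(), select()
daily_FWIH$Date <- ymd(daily_FWIH$Date) # convert to Date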
daily_FWIH$fakeDate <- daily_FWIH$Date + days(2)
daily_FWIH$week <- week(daily_FWIH$fakeDate) # extract week value
daily_FWIH$year <- year(daily_FWIH$fakeDate)
> daily_FWIH %>%
+ group_by(year,week) %>%
+ mutate(weeklyAvg = mean(Value), weekStartsOn = min(Date)) %>% # create the average variable
+ slice(which(Date == weekStartsOn)) %>% # select just the first record of the week - other vars will come from this
+ select(-Value,-fakeDate,-week,-year,-Date, -CST.OUC,-CST.NAME) # drop unneeded variables
Source: local data frame [631 x 6]
Groups: year, week
year week SOM_patch Row_Desc weeklyAvg weekStartsOn
1 2013 9 BVG11_Highlands & Islands FSFluidWIH 1.048333 2013-03-01
2 2013 9 BVG12_North East Scotland FSFluidWIH 1.048333 2013-03-01
3 2013 9 BVG13_Central Scotland FSFluidWIH 1.048333 2013-03-01
4 2013 9 BVG14_South East Scotland FSFluidWIH 1.048333 2013-03-01
5 2013 9 BVG15_West Central Scotland FSFluidWIH 1.048333 2013-03-01
6 2013 9 BVG16_South West Scotland FSFluidWIH 1.048333 2013-03-01
7 2013 10 BVG11_Highlands & Islands FSFluidWIH 1.520500 2013-03-02
8 2013 10 BVG12_North East Scotland FSFluidWIH 1.520500 2013-03-02
9 2013 10 BVG13_Central Scotland FSFluidWIH 1.520500 2013-03-02
10 2013 10 BVG14_South East Scotland FSFluidWIH 1.520500 2013-03-02
.. ... ... ... ... ... ...
This is not correct...
The desired output is:
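One visible symptom in the output above is that all six SOM_patch rows in a given week share the same weeklyAvg (1.048333 in week 9, 1.520500 in week 10). That is what happens when the only grouping variables are year and week: mean(Value) is then taken over every patch at once. A minimal base R illustration, with made-up values for two hypothetical patches "A" and "B", using aggregate() in place of dplyr:

```r
# Made-up values for two patches over the same two weeks (illustration only)
toy <- data.frame(
  week      = c(1, 1, 2, 2),
  SOM_patch = c("A", "B", "A", "B"),
  Value     = c(1, 3, 2, 6)
)

# Grouping by week only: one mean per week, shared by every patch
by_week <- aggregate(Value ~ week, data = toy, FUN = mean)
print(by_week)   # week 1 -> 2, week 2 -> 4

# Adding the patch to the grouping gives one mean per week and patch
by_week_patch <- aggregate(Value ~ week + SOM_patch, data = toy, FUN = mean)
print(by_week_patch)
```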
> head(desired)
Date BVG11.Highlands_I_.A_pct BVG12.North.East.ScotlandA_pct BVG13.Central.ScotlandA_pct
1 01/03/2013 1.16 1.13 1.08
2 08/03/2013 1.41 2.37 1.80
3 15/03/2013 1.33 3.31 1.34
4 22/03/2013 1.39 2.49 1.62
5 29/03/2013 5.06 3.42 1.42
6 NA NA NA
BVG14.South.East.ScotlandA_pct BVG15.West.Central.ScotlandA_pct BVG16.South.West.ScotlandA_pct
1 1.05 0.98 0.89
2 1.51 1.21 1.07
3 1.13 2.13 2.01
4 2.14 1.24 1.37
5 1.62 1.46 1.95
6 NA NA NA
> str(desired)
'data.frame': 11 obs. of 7 variables:
$ Date : Factor w/ 6 levels "01/03/2013",..: 2 3 4 5 6 1 1 1 1 1 ...
$ BVG11.Highlands_I_.A_pct : num 1.16 1.41 1.33 1.39 5.06 ...
$ BVG12.North.East.ScotlandA_pct : num 1.13 2.37 3.31 2.49 3.42 ...
$ BVG13.Central.ScotlandA_pct : num 1.08 1.8 1.34 1.62 1.42 ...
$ BVG14.South.East.ScotlandA_pct : num 1.05 1.51 1.13 2.14 1.62 ...
$ BVG15.West.Central.ScotlandA_pct: num 0.98 1.21 2.13 1.24 1.46 ...
$ BVG16.South.West.ScotlandA_pct : num 0.89 1.07 2.01 1.37 1.95 ...
Find the first Saturday in the data, then assign a week ID to every date in the dataset relative to it:
library(lubridate) # for the wday() and ymd() functions
daily_FWIH$Date <- ymd(daily_FWIH$Date)
saturdays <- daily_FWIH[wday(daily_FWIH$Date) == 7, ] # filter for Saturdays
startDate <- min(saturdays$Date) # select first Saturday
daily_FWIH$week <- floor(as.numeric(difftime(daily_FWIH$Date, startDate, units = "weeks")))
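If you would rather not depend on lubridate, a Saturday-based week can also be derived in base R. The sketch below (the helper name saturday_week_start is hypothetical) maps every date to the Saturday on or before it, using as.POSIXlt()$wday, where 0 is Sunday and 6 is Saturday. Because the result is itself a date, 2013-03-01 (a Friday) lands in the week starting 2013-02-23 rather than in week -1.

```r
# Hypothetical helper: the Saturday on or before each date
saturday_week_start <- function(d) {
  d  <- as.Date(d)
  wd <- as.POSIXlt(d)$wday   # 0 = Sunday, ..., 6 = Saturday
  d - (wd + 1) %% 7          # subtract the days elapsed since the latest Saturday
}

dates <- as.Date(c("2013-03-01", "2013-03-02", "2013-03-04",
                   "2013-03-08", "2013-03-09"))
print(saturday_week_start(dates))
# "2013-02-23" "2013-03-02" "2013-03-02" "2013-03-02" "2013-03-09"
```

On the question's data, daily_FWIH$weekStart <- saturday_week_start(daily_FWIH$Date) could then be used in place of week in the group_by() calls.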
Once you have a week-ID-starting-on-Saturday variable, this is a standard R problem. You can compute the weekly averages with whichever method you prefer for calculating means within a subgroup. I like dplyr:
library(dplyr)
daily_FWIH %>%
group_by(week, SOM_patch) %>% # use your grouping variables in addition to week
summarise(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
mutate(firstDayOfWeek = wday(weekBeginDate, label=TRUE)) # confirm correct week cuts
Source: local data frame [2 x 5]
Groups: week
week SOM_patch weeklyAvg weekBeginDate firstDayOfWeek
1 -1 BVG11_Highlands & Islands 1.16 2013-03-01 Fri
2 0 BVG11_Highlands & Islands 1.41 2013-03-02 Sat
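For comparison, the same per-week, per-patch mean can be computed without dplyr using base R's aggregate(). This sketch rebuilds the six rows shown by head(daily_FWIH) (columns trimmed to those needed) and assigns each date to the Saturday on or before it; on those rows it gives the same 1.16 and 1.41 as the output above.

```r
# The six rows shown by head(daily_FWIH) in the question (columns trimmed)
daily <- data.frame(
  Date      = as.Date(c("2013-03-01", "2013-03-02", "2013-03-04",
                        "2013-03-05", "2013-03-06", "2013-03-07")),
  SOM_patch = "BVG11_Highlands & Islands",
  Value     = c(1.16, 1.99, 1.47, 1.15, 1.16, 1.28)
)

# Saturday on or before each date (as.POSIXlt()$wday: 0 = Sunday, 6 = Saturday)
daily$weekStart <- daily$Date - (as.POSIXlt(daily$Date)$wday + 1) %% 7

# One mean per week and patch, analogous to group_by() + summarise()
weekly <- aggregate(Value ~ weekStart + SOM_patch, data = daily, FUN = mean)
print(weekly)
```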
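Update, based on the comments below:
If you want to keep the other columns in the dataset, you need to decide how to select or compute the weekly value when the daily values within a week conflict. In your sample data they are the same in every row, so I simply take them from the row containing the first day of the week.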
library(dplyr)
daily_FWIH %>%
group_by(week, SOM_patch) %>% # use your grouping variables
mutate(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
slice(which(Date == weekBeginDate)) %>% # select just the first record of the week - other vars will come from this
select(-Value, -Date) # drop unneeded variables
Source: local data frame [2 x 7]
Groups: week, SOM_patch
CST.OUC CST.NAME SOM_patch Row_Desc week weeklyAvg weekBeginDate
1 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH -1 1.16 2013-03-01
2 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 0 1.41 2013-03-02
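The desired output in the question is wide: one row per week and one column per SOM_patch. Its dates (01/03/2013, 08/03/2013, ...) fall on Fridays, so each Saturday-to-Friday week appears to be labelled by its last day; that reading is an assumption, though the BVG11 values 1.16 and 1.41 are consistent with it. A minimal base R sketch using tapply() to build that shape: the BVG11 values are the head(daily_FWIH) rows, while the two BVG12 values are invented purely to show a second column.

```r
# BVG11 rows from head(daily_FWIH); BVG12 values are invented for illustration
daily <- data.frame(
  Date      = as.Date(c("2013-03-01", "2013-03-02", "2013-03-04",
                        "2013-03-05", "2013-03-06", "2013-03-07",
                        "2013-03-01", "2013-03-04")),
  SOM_patch = c(rep("BVG11_Highlands & Islands", 6),
                rep("BVG12_North East Scotland", 2)),
  Value     = c(1.16, 1.99, 1.47, 1.15, 1.16, 1.28, 1.00, 2.00)
)

# Saturday on or before each date, then the Friday that ends that week
week_start <- daily$Date - (as.POSIXlt(daily$Date)$wday + 1) %% 7
week_end   <- week_start + 6

# Rows = week-ending Friday (sorted chronologically), columns = patch,
# cells = weekly mean (NA where a patch has no data that week)
wide <- tapply(daily$Value, list(week_end, daily$SOM_patch), mean)

# Format the dates as dd/mm/yyyy, as in the desired output
result <- data.frame(Date = format(as.Date(rownames(wide)), "%d/%m/%Y"),
                     wide, check.names = FALSE, row.names = NULL)
print(result)
```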