Can't find a way to carry the last good value down
This is a bit hard to describe, but I'll give it a shot.
Suppose I have the following zoo object:
library(zoo)
a <- read.zoo(data.frame(date=as.Date('2011-1-1') + 0:59, closest.idx=c(rep(1,20), rep(2, 20), rep(3, 20)), is.good=c(rep(1,20), rep(1,20), rep(0, 20)), val=c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))), FUN = as.Date)
closest.idx is.good val
2011-01-01 1 1 0.2
2011-01-02 1 1 0.2
2011-01-03 1 1 0.2
2011-01-04 1 1 0.2
2011-01-05 1 1 0.2
2011-01-06 1 1 0.2
2011-01-07 1 1 0.3
2011-01-08 1 1 0.3
2011-01-09 1 1 0.3
2011-01-10 1 1 0.3
...
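I want to carry the last good "val" down. The rules are:
- The first 6 rows of each group should never be changed, whatever the value of is.good
- If is.good = 0, the remaining rows of the group are changed (val is set to last.good.val)
- The last good val is one where is.good = 1 and that occurs at the 7th row or later of its group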
NOTE #1: Don't assume there will be a total of 20 rows in a group - it could be any number
NOTE #2: You can assume that the first 6 rows of each group shouldn't be touched
So in this example,
2011-01-01 - 2011-01-06 will have a val of 0.2 (is.good = 1, < 6 rows into group so not last.good.val)
2011-01-07 - 2011-01-20 will have a val of 0.3 (is.good = 1, last.good.val = 0.3)
2011-01-21 - 2011-01-26 will have a val of 0.4 (is.good = 1, last.good.val = 0.3, < 6 rows into group so not last.good.val)
2011-01-27 - 2011-02-09 will have a val of 0.5 (is.good = 1, last.good.val = 0.5)
2011-02-10 - 2011-02-15 will have a val of 0.6 (b/c they are < 6 rows into the group so aren't affected)
2011-02-16 - 2011-03-01 will have a val of 0.5 (b/c 0.5 was the last good value and is.good = 0 in this group)
So I would like my output to look like this:
closest.idx is.good val
2011-01-01 1 1 0.2
2011-01-02 1 1 0.2
2011-01-03 1 1 0.2
2011-01-04 1 1 0.2
2011-01-05 1 1 0.2
2011-01-06 1 1 0.2
2011-01-07 1 1 0.3
2011-01-08 1 1 0.3
2011-01-09 1 1 0.3
...
2011-01-21 2 1 0.4
2011-01-22 2 1 0.4
2011-01-23 2 1 0.4
2011-01-24 2 1 0.4
2011-01-25 2 1 0.4
2011-01-26 2 1 0.4
2011-01-27 2 1 0.5
2011-01-28 2 1 0.5
2011-01-29 2 1 0.5
2011-01-30 2 1 0.5
2011-01-31 2 1 0.5
...
2011-02-10 3 0 0.6
2011-02-11 3 0 0.6
2011-02-12 3 0 0.6
2011-02-13 3 0 0.6
2011-02-14 3 0 0.6
2011-02-15 3 0 0.6
2011-02-16 3 0 0.5 <- notice these changed to last good value
2011-02-17 3 0 0.5
2011-02-18 3 0 0.5
...
NOTE: I would prefer a base-R solution but other packages would be interesting to see
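Here are a few approaches, each doing essentially the same thing:
- Add a column val_tofill in which every non-good value is replaced with NA
- Forward-fill val_tofill using one of the many available methods, see e.g. Replacing NAs with latest non-NA value
- Overwrite the val column with val_tofill wherever the row number is not among the first six of its group (grouped by closest.idx)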
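To make the three steps above concrete, here is a minimal base-R sketch on made-up toy data (two groups of 8 rows each). The names toy, row_in_grp and the helper locf_base are hypothetical and only serve the illustration; the forward fill is written by hand so that no package is needed.

```r
# Toy data (invented for illustration): group 1 is good, group 2 is not.
toy <- data.frame(
  closest.idx = rep(1:2, each = 8),
  is.good     = rep(c(1, 0), each = 8),
  val         = c(rep(.2, 6), rep(.3, 2), rep(.6, 8))
)

# Hypothetical helper: carry the latest non-NA value forward.
# Leading NAs stay NA because index 0 maps to the NA prepended below.
locf_base <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far
  c(NA, x)[idx + 1]
}

# Step 1: keep val only where the row is good, NA elsewhere.
toy$val_tofill <- ifelse(toy$is.good > 0, toy$val, NA)

# Step 2: forward-fill across the whole column (not per group).
toy$val_tofill <- locf_base(toy$val_tofill)

# Step 3: within each group, leave rows 1-6 alone and use the filled value after that.
row_in_grp <- ave(seq_len(nrow(toy)), toy$closest.idx, FUN = seq_along)
toy$val <- ifelse(row_in_grp < 7, toy$val, toy$val_tofill)
toy$val_tofill <- NULL

# val is now 0.2 x6, 0.3 x2 (group 1), then 0.6 x6, 0.3 x2 (group 2)
print(toy)
```

Because ave() returns its result in the original row order, this version does not depend on the rows being sorted by closest.idx.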
Initial data
a <- data.frame(
date=as.Date('2011-1-1') + 0:59,
closest.idx=c(rep(1,20), rep(2, 20), rep(3, 20)),
is.good=c(rep(1,20), rep(1,20), rep(0, 20)),
val=c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))
)
base + zoo::na.locf
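# Copy val where is.good is set, NA elsewhere, then carry the last non-NA value forward.
# na.rm = FALSE keeps leading NAs, so the result always has one value per row.
a$val_tofill <- zoo::na.locf(ifelse(a$is.good > 0, a$val, NA), na.rm = FALSE)
# Within each closest.idx group, keep the first 6 rows and take the filled value after that.
# unlist(by(...)) assumes the rows are already ordered by closest.idx.
a$val <- unlist(
  by(a, INDICES = a$closest.idx,
    FUN = function(x) ifelse(seq_len(nrow(x)) < 7, x$val, x$val_tofill)
  )
)
a$val_tofill <- NULL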
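Since the question asks for base R, here is a sketch of a variant that needs no package at all. It also follows one possible reading of the third rule more strictly: only rows that are good and sit at position 7 or later in their group count as sources for the last good value. The names row_in_grp, source_val, last_good and replace_row are hypothetical, and this is an illustration under that assumption rather than the approach used in the other snippets.

```r
# Same initial data as in the answer.
a <- data.frame(
  date = as.Date('2011-1-1') + 0:59,
  closest.idx = c(rep(1, 20), rep(2, 20), rep(3, 20)),
  is.good = c(rep(1, 20), rep(1, 20), rep(0, 20)),
  val = c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))
)

# Position of each row inside its group (1, 2, ..., group size).
row_in_grp <- ave(seq_len(nrow(a)), a$closest.idx, FUN = seq_along)

# Assumption: a value is a valid "last good" source only if the row is good
# and is the 7th row or later of its group.
source_val <- ifelse(a$is.good > 0 & row_in_grp >= 7, a$val, NA)

# Hand-written forward fill: index of the latest non-NA source so far (0 if none yet).
idx <- cummax(seq_along(source_val) * !is.na(source_val))
last_good <- c(NA, source_val)[idx + 1]

# Overwrite only non-good rows past the sixth of their group,
# and only when a last good value is actually available.
replace_row <- row_in_grp >= 7 & a$is.good == 0 & !is.na(last_good)
a$val[replace_row] <- last_good[replace_row]

# Compare with the output requested in the question.
expected <- c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.5, 14))
print(isTRUE(all.equal(a$val, expected)))
```

On the example data this gives the same result as the other approaches. The two readings only differ when a group has good values in its first six rows but none afterwards.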
dplyr + tidyr::fill
library(tidyverse)
mutate(a, val_tofill = ifelse(is.good > 0, val, NA)) %>%
fill(val_tofill, .direction = "down") %>%
group_by(closest.idx) %>%
mutate(val = ifelse(row_number() < 7, val, val_tofill)) %>%
ungroup() %>%
select(-val_tofill)
data.table + zoo::na.locf
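library(data.table)
setDT(a)  # converts a to a data.table by reference, so no reassignment is needed
# na.rm = FALSE keeps leading NAs so the filled column has one value per row
a[, val_tofill := zoo::na.locf(ifelse(is.good > 0, val, NA), na.rm = FALSE)][,
  val := ifelse(seq_len(.N) < 7, val, val_tofill),
  by = closest.idx
]
a[, val_tofill := NULL]  # drop the helper column by reference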