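Subtract from the column with the maximum value in a sequential manner in R
I am still learning how to write loops and if-else statements in R. I can do the process by hand, but I will be applying it to a large dataset, so I need to do it with loops/if-else.
My data looks something like the sample data frame below. One of the columns holds the column number of the maximum value within each row: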
x1 x2 x3 x4 x5 x6 x7 max_index max_val
1 56.1 56.8 99.4 44.6 50.4 74.9 17.7 3 99.4
2 9.1 46.1 74.2 64.3 62.3 68.8 85.7 7 85.7
3 83.3 84.5 18.4 93.2 17.6 69.7 23.4 4 93.2
4 94.0 9.7 46.8 25.0 96.9 69.2 94.8 5 96.9
5 21.5 64.1 89.1 87.7 59.7 88.0 73.5 3 89.1
6 53.0 94.9 87.2 19.6 55.9 48.5 82.9 2 94.9
7 52.2 79.1 20.6 9.9 18.3 21.5 92.5 7 92.5
8 42.5 33.0 36.9 45.0 43.9 7.6 45.3 7 45.3
9 89.3 20.6 41.7 74.8 67.4 21.0 49.1 1 89.3
10 21.2 92.6 86.3 76.3 68.6 44.8 8.8 2 92.6
What I want to do is subtract 3 consecutive columns from one another, starting from the maximum value, like this:
j1 <- max.col(df[,1:7], "first")
df$max_index <- j1
df$max_val <- df[cbind(1:nrow(df), j1)]
i1 <- j1 + 1
i2 <- i1 + 1
i3 <- i2 +1
value <- df[cbind(1:nrow(df), j1)]
value1 <- df[cbind(1:nrow(df), i1)]
value2 <- df[cbind(1:nrow(df), i2)]
value3 <- df[cbind(1:nrow(df), i3)]
df$max_val <- value
df$max.up1 <- value1
df$max.up2 <- value2
df$max.up3 <- value3
df_x1 <- df$max_val - df$max.up1
df_x2 <- df$max.up1 - df$max.up2
df_x3 <- df$max.up2 - df$max.up3
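After doing this, I want to know whether all 3 outputs (df_x1, df_x2, df_x3) are positive, and add a column that shows "TRUE" if they are and "FALSE" if they are not.
I would like my final data frame to look like this: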
x1 x2 x3 x4 x5 x6 x7 max_index max_val t.or.f
1 56.1 56.8 99.4 44.6 50.4 74.9 17.7 3 99.4 FALSE
2 9.1 46.1 74.2 64.3 62.3 68.8 85.7 7 85.7 NA
3 83.3 84.5 18.4 93.2 17.6 69.7 23.4 4 93.2 FALSE
4 94.0 9.7 46.8 25.0 96.9 69.2 94.8 5 96.9 NA
5 21.5 64.1 89.1 87.7 59.7 88.0 73.5 3 89.1 FALSE
6 53.0 94.9 87.2 19.6 55.9 48.5 82.9 2 94.9 FALSE
7 52.2 79.1 20.6 9.9 18.3 21.5 92.5 7 92.5 FALSE
8 42.5 33.0 36.9 45.0 43.9 7.6 45.3 7 45.3 FALSE
9 89.3 20.6 41.7 74.8 67.4 21.0 49.1 1 89.3 FALSE
10 21.2 92.6 86.3 76.3 68.6 44.8 8.8 2 92.6 TRUE
How would I simplify my code? Thanks!
Here is my solution using data.table and a tidy (long-format) data approach:
library(data.table)
dt.m <- read.table(text = "
x1 x2 x3 x4 x5 x6 x7 max_index max_val
1 56.1 56.8 99.4 44.6 50.4 74.9 17.7 3 99.4
2 9.1 46.1 74.2 64.3 62.3 68.8 85.7 7 85.7
3 83.3 84.5 18.4 93.2 17.6 69.7 23.4 4 93.2
4 94.0 9.7 46.8 25.0 96.9 69.2 94.8 5 96.9
5 21.5 64.1 89.1 87.7 59.7 88.0 73.5 3 89.1
6 53.0 94.9 87.2 19.6 55.9 48.5 82.9 2 94.9
7 52.2 79.1 20.6 9.9 18.3 21.5 92.5 7 92.5
8 42.5 33.0 36.9 45.0 43.9 7.6 45.3 7 45.3
9 89.3 20.6 41.7 74.8 67.4 21.0 49.1 1 89.3
10 21.2 92.6 86.3 76.3 68.6 44.8 8.8 2 92.6", header = TRUE)
dt.m <- data.table(dt.m)
dt.m[, row.id := 1:.N]
# melt data to make it easy to work with, excluding max.val and max.index
dt <- melt(data = dt.m, measure.vars = 1:7, id.vars = "row.id")
# replicate max.val and max.index which are already provided in example
dt[, max.val := max(value), by = row.id]
# which.max() returns the first position of the maximum, so tied rows still yield one index
dt[, max.index := which.max(value), by = row.id]
dt[, x.index := 1:.N, by = row.id]
# filter to values after the max value
out <- dt[x.index >= max.index]
# keep max value and 3 values post max value
out <- out[, post.max.index := 1:.N, by = row.id][post.max.index <= 4]
out <- out[order(row.id, x.index)]
out[, previous.x := shift(value)]
out[, change.x := previous.x - value]
out <- out[max.index != x.index]
# check if all values are positive
res <- out[, .(all.next.positive = all(change.x > 0)), by = row.id]
# add result to the original data
dt.m <- merge(dt.m, res, by = "row.id", all.x = TRUE)
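For comparison, the same check can probably be done in base R without reshaping, by using matrix indexing. The following is only a minimal sketch under a few assumptions: the value columns are x1 to x7, `n_after` is a hypothetical parameter for how many columns after the maximum to compare, and a row gets NA whenever fewer than `n_after` columns follow its maximum (which is stricter than checking only the columns that exist).

```r
# Rebuild the value columns of the example data
df <- read.table(text = "
x1 x2 x3 x4 x5 x6 x7
56.1 56.8 99.4 44.6 50.4 74.9 17.7
9.1 46.1 74.2 64.3 62.3 68.8 85.7
83.3 84.5 18.4 93.2 17.6 69.7 23.4
94.0 9.7 46.8 25.0 96.9 69.2 94.8
21.5 64.1 89.1 87.7 59.7 88.0 73.5
53.0 94.9 87.2 19.6 55.9 48.5 82.9
52.2 79.1 20.6 9.9 18.3 21.5 92.5
42.5 33.0 36.9 45.0 43.9 7.6 45.3
89.3 20.6 41.7 74.8 67.4 21.0 49.1
21.2 92.6 86.3 76.3 68.6 44.8 8.8", header = TRUE)

m <- as.matrix(df[, 1:7])
n_after <- 3                              # hypothetical: columns to compare after the max
j <- max.col(m, ties.method = "first")    # column index of each row's maximum
rows <- seq_len(nrow(m))

# Column positions j, j + 1, ..., j + n_after for every row;
# positions past the last column are set to NA
idx <- outer(j, 0:n_after, `+`)
idx[idx > ncol(m)] <- NA

# Matrix indexing with (row, column) pairs; an NA column index yields an NA value
vals <- matrix(m[cbind(rep(rows, times = n_after + 1), as.vector(idx))],
               nrow = nrow(m))

# Differences between neighbouring values (previous minus next)
d <- vals[, -ncol(vals), drop = FALSE] - vals[, -1, drop = FALSE]

df$max_index <- j
df$max_val <- vals[, 1]
# TRUE only if every difference is positive; NA if any needed column is missing
df$t.or.f <- rowSums(d > 0) == n_after
```

On the example data this marks row 10 as TRUE and rows 2, 4, 7 and 8 as NA, with the remaining rows FALSE. Note that row 4 differs from the data.table result above, which compares only the columns that actually exist after the maximum and therefore returns FALSE there; which rule is right depends on how incomplete rows should be treated.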