Add new columns with the names of the columns whose values are lower and greater than the mean
I have a data frame:
set.seed(100)
A <- floor(runif(5, min=0, max=10))
B <- floor(runif(5, min=0, max=10))
C <- floor(runif(5, min=0, max=10))
D <- floor(runif(5, min=0, max=10))
df <- data.frame(A,B,C,D)
df$ms <- rowMeans(df)
df
  A B C D   ms
1 3 4 6 6 4.75
2 2 8 8 2 5.00
3 5 3 2 3 3.25
4 0 5 3 3 2.75
5 4 1 7 6 4.50
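Now I want to add two columns (lower and greater) that hold column names: lower should list which of columns A and B have a value below the row mean, and greater should list which of columns C and D have a value above it. Desired result:
  A B C D   ms lower greater
1 3 4 6 6 4.75   A,B     C,D
2 2 8 8 2 5.00     A       C
3 5 3 2 3 3.25     B      NA
4 0 5 3 3 2.75     A      NA
5 4 1 7 6 4.50   A,B     C,D
I tried to do this with which(), but I am stuck. Could you give me a hint?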
lapply(apply(df,1, function(x) which(df$ms)),names)
In base R, I think you could do this:
df$lower <- lapply(df[1:2], \(x) x < df$ms) |>
  data.frame() |>
  apply(1, \(x) paste(names(x)[x], collapse = ","))
df$greater <- lapply(df[3:4], \(x) x > df$ms) |>
  data.frame() |>
  apply(1, \(x) paste(names(x)[x], collapse = ","))
# Replace any zero-length strings
df[df==""] <- NA
df
#   A B C D   ms lower greater
# 1 3 4 6 6 4.75   A,B     C,D
# 2 2 8 8 2 5.00     A       C
# 3 5 3 2 3 3.25     B    <NA>
# 4 0 5 3 3 2.75     A     C,D
# 5 4 1 7 6 4.50   A,B     C,D
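If the global `df[df==""] <- NA` sweep feels too broad (it touches every column of the data frame), one alternative is to return NA directly when no column matches. The following is a minimal sketch under that assumption; `collapse_names` is a hypothetical helper name, and the data is typed in by hand to match the printed data frame rather than regenerated with set.seed().

```r
# Data copied from the printed data frame in the question
df <- data.frame(A = c(3, 2, 5, 0, 4),
                 B = c(4, 8, 3, 5, 1),
                 C = c(6, 8, 2, 3, 7),
                 D = c(6, 2, 3, 3, 6))
df$ms <- rowMeans(df)

# Hypothetical helper: join the names of the TRUE entries,
# or return NA when nothing in the row matched
collapse_names <- function(flags) {
  hits <- names(flags)[flags]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ",")
}

# Compare each group of columns against the row mean, then collapse per row
df$lower   <- apply(df[c("A", "B")] < df$ms, 1, collapse_names)
df$greater <- apply(df[c("C", "D")] > df$ms, 1, collapse_names)
df
```

With this data only row 3 ends up with NA in greater. Row 4 gets C,D, because both values (3 and 3) exceed the row mean of 2.75.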
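You can use apply in base R.
# Keep apply() on the numeric columns only: once df has a character column, apply() coerces rows to character and compares strings.
num <- df[c("A", "B", "C", "D", "ms")]
df$lower <- apply(num, 1, function(x) paste(names(which(x[1:2] < x["ms"])), collapse = ", "))
df$greater <- apply(num, 1, function(x) paste(names(which(x[3:4] > x["ms"])), collapse = ", "))
  A B C D   ms lower greater
1 3 4 6 6 4.75  A, B    C, D
2 2 8 8 2 5.00     A       C
3 5 3 2 3 3.25     B
4 0 5 3 3 2.75     A    C, D
5 4 1 7 6 4.50  A, B    C, D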
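A caveat when calling apply() over a whole data frame: apply() first converts the data frame with as.matrix(), and a single character column turns every cell into a string. Comparisons then follow string ordering, which happens to agree with numeric ordering for the single-digit values here but not in general. The sketch below illustrates this on made-up toy values (the names `d`, `label`, `wrong` and `right` are illustrative, not from the question).

```r
# Toy data: illustrative values only
d <- data.frame(a = c(10, 3), ms = c(4.75, 5))
d$label <- c("x", "y")    # a character column, similar to `lower`

# apply() converts a data frame with as.matrix() before looping
m <- as.matrix(d)
is.character(m)           # every cell is now a string

# Strings are compared character by character, so "10" sorts before "4.75"
wrong <- m[1, "a"] > m[1, "ms"]

# Restricting apply() to the numeric columns keeps the comparison numeric
right <- apply(d[c("a", "ms")], 1, function(x) x["a"] > x["ms"])
```

Here `wrong` is FALSE even though 10 is greater than 4.75, while `right` gives TRUE for the first row. Selecting the numeric columns before calling apply() avoids the problem.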
Another approach also uses apply, but tests the condition up front, before looping over the rows.
df$lower <- apply(df[1:2] < df$ms, 1, \(x) toString(names(which(x))))
df$greater <- apply(df[3:4] > df$ms, 1, \(x) toString(names(which(x))))
df
#  A B C D   ms lower greater
#1 3 4 6 6 4.75  A, B    C, D
#2 2 8 8 2 5.00     A       C
#3 5 3 2 3 3.25     B
#4 0 5 3 3 2.75     A    C, D
#5 4 1 7 6 4.50  A, B    C, D
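The comparison `df[1:2] < df$ms` works because comparing a data frame with a vector that has one element per row applies the vector down each column and returns a logical matrix that keeps the column names. A small sketch on toy values (the names `d`, `m`, `flags` and `picked` are made up for illustration):

```r
# Toy data: two columns and one reference value per row
d <- data.frame(A = c(1, 5), B = c(4, 2))
m <- c(2, 4.5)            # length must equal nrow(d) for this to line up

# Each column is compared element-wise against m
flags <- d < m            # logical matrix with columns A and B

# Row by row, keep the names of the columns that are TRUE
picked <- apply(flags, 1, function(x) toString(names(which(x))))
picked
```

In row 1 only A (1) is below 2, and in row 2 only B (2) is below 4.5, so `picked` holds "A" and "B".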
Or, reusing the same code for both columns:
cbind(df, lapply(list(lower = df[1:2] < df$ms, greater = df[3:4] > df$ms),
                 apply, 1, \(x) paste(names(x)[x], collapse=",") ))
#  A B C D   ms lower greater
#1 3 4 6 6 4.75   A,B     C,D
#2 2 8 8 2 5.00     A       C
#3 5 3 2 3 3.25     B
#4 0 5 3 3 2.75     A     C,D
#5 4 1 7 6 4.50   A,B     C,D
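Note that cbind() returns a new data frame and leaves df itself unchanged. If the columns should be stored in df, and the set of column groups may grow, the same idea could be written as a small table of rules. This is only a sketch: the `rules` list, its `cols` and `op` fields, and `new_cols` are hypothetical names, and the data is typed in by hand from the printed data frame.

```r
# Data copied from the printed data frame in the question
df <- data.frame(A = c(3, 2, 5, 0, 4),
                 B = c(4, 8, 3, 5, 1),
                 C = c(6, 8, 2, 3, 7),
                 D = c(6, 2, 3, 3, 6))
df$ms <- rowMeans(df)

# One rule per new column: which columns to test and which operator to use
rules <- list(
  lower   = list(cols = c("A", "B"), op = `<`),
  greater = list(cols = c("C", "D"), op = `>`)
)

new_cols <- lapply(rules, function(r) {
  # Numeric matrix compared against the row means, column by column
  flags <- r$op(as.matrix(df[r$cols]), df$ms)
  # Per row, join the names of the columns that satisfied the rule
  apply(flags, 1, function(x) toString(names(x)[x]))
})

# Assign the results back into df under the rule names
df[names(new_cols)] <- new_cols
df
```

Rows with no match get an empty string here, as in the answers above; they can be turned into NA afterwards if that is preferred.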