Creating a column in a data frame that indicates whether the value in another column is consecutive
I have a dataset that looks like this:
Value <- c(1, 3, 4, 5, 0, 210,
2, 0.5, 7, 0, 201, 300,
3, 0, 500, 6, 2, 1,
8, 0, 200, 137, 0.76, 2.3)
Ingredient <- as.factor(c("A", "B", "C", "D", "E", "E",
"E" ,"F", "G", "H", "H", "H",
"A", "B", "B", "C", "D", "E",
"E", "F", "F", "F", "G", "H"))
Condition <- as.factor(rep(c(rep(1,6), rep(2, 6)),2))
df <- data.frame(Condition, Ingredient, Value)
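I would like to create a column that indicates whether the value in the Ingredient column is consecutive (that is, repeats the value in the row directly above) within a Condition, so starting from this: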
> df
Condition Ingredient Value
1 1 A 1.00
2 1 B 3.00
3 1 C 4.00
4 1 D 5.00
5 1 E 0.00
6 1 E 210.00
7 2 E 2.00
8 2 F 0.50
9 2 G 7.00
10 2 H 0.00
11 2 H 201.00
12 2 H 300.00
13 1 A 3.00
14 1 B 0.00
15 1 B 500.00
16 1 C 6.00
17 1 D 2.00
18 1 E 1.00
19 2 E 8.00
20 2 F 0.00
21 2 F 200.00
22 2 F 137.00
23 2 G 0.76
24 2 H 2.30
I would get this output:
Condition Ingredient Value Consecutive
1 1 A 1.00 FALSE
2 1 B 3.00 FALSE
3 1 C 4.00 FALSE
4 1 D 5.00 FALSE
5 1 E 0.00 FALSE
6 1 E 210.00 TRUE
7 2 E 2.00 FALSE
8 2 F 0.50 FALSE
9 2 G 7.00 FALSE
10 2 H 0.00 FALSE
11 2 H 201.00 TRUE
12 2 H 300.00 TRUE
13 1 A 3.00 FALSE
14 1 B 0.00 FALSE
15 1 B 500.00 TRUE
16 1 C 6.00 FALSE
17 1 D 2.00 FALSE
18 1 E 1.00 FALSE
19 2 E 8.00 FALSE
20 2 F 0.00 FALSE
21 2 F 200.00 TRUE
22 2 F 137.00 TRUE
23 2 G 0.76 FALSE
24 2 H 2.30 FALSE
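Note the transition from row 6 to row 7: there are two consecutive letters (E), but row 7 should be FALSE because this consecutive "E" does not occur within the same Condition.
Thanks for your help!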
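Something like this?
library(dplyr)

df %>%
  group_by(Condition) %>%
  # lag() gives NA for the first row of each group, which case_when() maps to FALSE
  mutate(Consecutive = case_when(Ingredient == dplyr::lag(Ingredient) ~ TRUE,
                                 TRUE ~ FALSE)) %>%
  ungroup()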
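One thing to be aware of: group_by(Condition) pools every row that shares a Condition, even when those rows sit in separate blocks (rows 1-6 and 13-18 here), so lag() compares row 13 with row 6. That is harmless for this data, but it could flag a row whose real predecessor belongs to a different Condition. Below is a minimal base R sketch that compares each row only with the row directly above it, assuming that is what "consecutive" is meant to be and that there are no NA values. The helper name same_as_previous is made up for illustration.

```r
# Rebuild the example data from the question.
Value <- c(1, 3, 4, 5, 0, 210, 2, 0.5, 7, 0, 201, 300,
           3, 0, 500, 6, 2, 1, 8, 0, 200, 137, 0.76, 2.3)
Ingredient <- as.factor(c("A", "B", "C", "D", "E", "E",
                          "E", "F", "G", "H", "H", "H",
                          "A", "B", "B", "C", "D", "E",
                          "E", "F", "F", "F", "G", "H"))
Condition <- as.factor(rep(c(rep(1, 6), rep(2, 6)), 2))
df <- data.frame(Condition, Ingredient, Value)

# Hypothetical helper: TRUE where an element equals the one directly before it.
# The first element has no predecessor, so it is always FALSE.
same_as_previous <- function(x) {
  x <- as.character(x)
  c(FALSE, x[-1] == x[-length(x)])
}

# A row counts as consecutive only if both Ingredient and Condition
# match the row immediately above.
df$Consecutive <- same_as_previous(df$Ingredient) & same_as_previous(df$Condition)

which(df$Consecutive)
# 6 11 12 15 21 22
```

Row 7 stays FALSE because its Condition differs from row 6, which matches the output asked for in the question.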
Using data.table:
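library(data.table)
# shift() lags Ingredient within each Condition; fill = last(Ingredient) supplies
# a stand-in value for the first row of each group, which has no predecessor
setDT(df)[, Consecutive := Ingredient == shift(Ingredient, fill = last(Ingredient)), by = Condition]
df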
# Condition Ingredient Value Consecutive
# 1: 1 A 1.00 FALSE
# 2: 1 B 3.00 FALSE
# 3: 1 C 4.00 FALSE
# 4: 1 D 5.00 FALSE
# 5: 1 E 0.00 FALSE
# 6: 1 E 210.00 TRUE
# 7: 2 E 2.00 FALSE
# 8: 2 F 0.50 FALSE
# 9: 2 G 7.00 FALSE
#10: 2 H 0.00 FALSE
#11: 2 H 201.00 TRUE
#12: 2 H 300.00 TRUE
#13: 1 A 3.00 FALSE
#14: 1 B 0.00 FALSE
#15: 1 B 500.00 TRUE
#16: 1 C 6.00 FALSE
#17: 1 D 2.00 FALSE
#18: 1 E 1.00 FALSE
#19: 2 E 8.00 FALSE
#20: 2 F 0.00 FALSE
#21: 2 F 200.00 TRUE
#22: 2 F 137.00 TRUE
#23: 2 G 0.76 FALSE
#24: 2 H 2.30 FALSE
# Condition Ingredient Value Consecutive
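The fill = last(Ingredient) trick works here because the first and last Ingredient of each Condition differ; if they happened to match, the first row of that group would come out TRUE. A sketch of an alternative that does not depend on this, assuming "consecutive" means "same Condition and same Ingredient as the row directly above": rleid() numbers each run of identical (Condition, Ingredient) pairs, and rowid() counts the rows within each run. The name dt is only for illustration.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
dt <- data.table(
  Condition  = factor(rep(c(rep(1, 6), rep(2, 6)), 2)),
  Ingredient = factor(c("A", "B", "C", "D", "E", "E",
                        "E", "F", "G", "H", "H", "H",
                        "A", "B", "B", "C", "D", "E",
                        "E", "F", "F", "F", "G", "H")),
  Value      = c(1, 3, 4, 5, 0, 210, 2, 0.5, 7, 0, 201, 300,
                 3, 0, 500, 6, 2, 1, 8, 0, 200, 137, 0.76, 2.3)
)

# rleid() assigns one id per run of identical (Condition, Ingredient) pairs;
# rowid() numbers the rows inside each run, so anything past the first row
# of a run repeats the row above it.
dt[, Consecutive := rowid(rleid(Condition, Ingredient)) > 1]

which(dt$Consecutive)
# 6 11 12 15 21 22
```

No grouping by Condition is needed, because a change in Condition starts a new run on its own.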