How to create a percentage column by slicing every item with its relative third item?
I have a data frame with two variables.
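# nutrition must have the same length as weight (9 values);
# every third row is the "No" reference row.
df <- data.frame(weight = c(30, 30, 109, 30, 309, 10, 20, 20, 14),
                 nutrition = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"))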
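I want to create an additional column that gives the relative change in weight: each weight divided by the weight in the next row where nutrition is "No" (a "No" row is divided by itself). The expected output is as follows: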
# expected output
change_of_weight = c(30/109, 30/109, 109/109, 30/10,309/10,10/10,20/14,20/14,14/14)
You can create a grouping column that starts a new group after each row where nutrition == 'No', and then divide weight by the last value in each group.
library(dplyr)

df %>%
  group_by(group = lag(cumsum(nutrition == 'No'), default = 0)) %>%
  mutate(new_weight = weight / last(weight)) %>%
  # You can also use:
  # mutate(new_weight = weight / weight[nutrition == 'No']) %>%
  ungroup() %>%
  dplyr::select(-group)
# A tibble: 9 x 3
#  weight nutrition new_weight
#   <dbl> <chr>          <dbl>
#1     30 Yes            0.275
#2     30 Yes            0.275
#3    109 No             1
#4     30 Yes            3
#5    309 Yes           30.9
#6     10 No             1
#7     20 Yes            1.43
#8     20 Yes            1.43
#9     14 No             1
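To see why the lagged cumulative sum produces the right groups, the key can be built step by step. This is a minimal base R sketch, not part of the answer above: the names `running_no`, `group`, and `expected` are chosen here for illustration, and `ave()` stands in for the grouped `mutate()`.

```r
# Same data as in the question, with `nutrition` written out to nine values.
weight    <- c(30, 30, 109, 30, 309, 10, 20, 20, 14)
nutrition <- c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No")

# Running count of "No" rows seen so far (the "No" row itself is counted).
running_no <- cumsum(nutrition == "No")   # 0 0 1 1 1 2 2 2 3

# Shift the count down by one row so that each "No" row closes its own group
# instead of opening the next one; lag(..., default = 0) does the same thing.
group <- c(0, head(running_no, -1))       # 0 0 0 1 1 1 2 2 2

# Divide every weight by the last weight in its group (the "No" row).
new_weight <- ave(weight, group, FUN = function(w) w / w[length(w)])

# Compare against the expected output given in the question.
expected <- c(30/109, 30/109, 109/109, 30/10, 309/10, 10/10, 20/14, 20/14, 14/14)
stopifnot(isTRUE(all.equal(new_weight, expected)))
```

Without the one-row shift, each "No" row would be grouped with the rows that follow it rather than the rows that precede it.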
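We can use the data.table approach. Convert the data.frame to a data.table with setDT, group by the lagged (shift) cumulative sum of the logical vector, divide weight by the last value of weight in each group, and assign (:=) the result to a new column.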
library(data.table)
setDT(df)[, new_weight := weight/last(weight),
.(shift(cumsum(nutrition == "No"), fill = 0))]
df
#   weight nutrition new_weight
#1:     30       Yes  0.2752294
#2:     30       Yes  0.2752294
#3:    109        No  1.0000000
#4:     30       Yes  3.0000000
#5:    309       Yes 30.9000000
#6:     10        No  1.0000000
#7:     20       Yes  1.4285714
#8:     20       Yes  1.4285714
#9:     14        No  1.0000000
If we would rather not add a column to the original data object and want only the single computed column as output:
setDT(df)[, weight/last(weight), .(shift(cumsum(nutrition == "No"), fill = 0))][, .(weight = V1)]
#       weight
#1:  0.2752294
#2:  0.2752294
#3:  1.0000000
#4:  3.0000000
#5: 30.9000000
#6:  1.0000000
#7:  1.4285714
#8:  1.4285714
#9:  1.0000000
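Note that setDT() converts its argument to a data.table by reference, so the object is still changed in place even when no column is added. If the original data.frame should stay completely untouched, one option is to work on a copy made with as.data.table(). The sketch below assumes the nine-value `nutrition` vector; `dt`, `grp`, and `ratios` are names chosen here for illustration.

```r
library(data.table)

# Same data as in the question, with `nutrition` written out to nine values.
df <- data.frame(
  weight    = c(30, 30, 109, 30, 309, 10, 20, 20, 14),
  nutrition = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No")
)

# as.data.table() returns a copy, so `df` itself is not modified.
dt <- as.data.table(df)

# Compute the ratios per group and keep only the resulting vector.
ratios <- dt[, .(new_weight = weight / last(weight)),
             by = .(grp = shift(cumsum(nutrition == "No"), fill = 0))]$new_weight

# `df` is still a plain data.frame with its original two columns.
stopifnot(!is.data.table(df), identical(names(df), c("weight", "nutrition")))
```

Because the groups are contiguous blocks of rows, the grouped result comes back in the same row order as the input.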