How to create a percentage column by slicing every item with its relative third item?

I have a data frame with two variables.

df <- data.frame(weight = c(30, 30, 109, 30, 309, 10, 20, 20, 14),
                 nutrition = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"))

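I want to create an additional column that expresses each weight relative to a reference value, by dividing it by the weight in the next row where nutrition is "No" (a "No" row is divided by itself). The expected output is as follows: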

# expected output
change_of_weight = c(30/109, 30/109, 109/109, 30/10,309/10,10/10,20/14,20/14,14/14)

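You can create a group column that starts a new group after every row where nutrition == 'No', and then divide weight by the last value of weight within each group.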

library(dplyr)

df %>%
  group_by(group = lag(cumsum(nutrition == 'No'), default = 0)) %>%
  mutate(new_weight = weight/last(weight)) %>%
  # Equivalent here, because each group contains exactly one 'No' row:
  # mutate(new_weight = weight/weight[nutrition == 'No']) %>%
  ungroup() %>% dplyr::select(-group)

# A tibble: 9 x 3
#  weight nutrition new_weight
#   <dbl> <chr>          <dbl>
#1     30 Yes            0.275
#2     30 Yes            0.275
#3    109 No             1    
#4     30 Yes            3    
#5    309 Yes           30.9  
#6     10 No             1    
#7     20 Yes            1.43 
#8     20 Yes            1.43 
#9     14 No             1    
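
To make the grouping step easier to follow, here is a minimal base R sketch of the same idea with the intermediate vectors spelled out. It assumes nutrition has nine values following the Yes, Yes, No pattern shown in the output above; the names no_count and group are only illustrative, and ave() is swapped in for group_by()/mutate().

```r
# Data as in the output above (nine rows, assumed Yes/Yes/No pattern).
df <- data.frame(
  weight    = c(30, 30, 109, 30, 309, 10, 20, 20, 14),
  nutrition = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No")
)

# Running count of "No" rows seen so far: 0 0 1 1 1 2 2 2 3
no_count <- cumsum(df$nutrition == "No")

# Lag by one position so that each "No" row closes its own group
# instead of opening the next one: 0 0 0 1 1 1 2 2 2
group <- c(0, head(no_count, -1))

# Within each group, divide every weight by the last weight of that group.
new_weight <- ave(df$weight, group, FUN = function(w) w / w[length(w)])
new_weight
```

Without the lag, the "No" row would be grouped with the rows that follow it, and the division would use the wrong reference value.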

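We can use data.table methods. Convert the data.frame to a 'data.table' (setDT), group by the lag of the cumulative sum of a logical vector, divide 'weight' by the last value of 'weight' in each group, and assign (:=) the result to a new column.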

library(data.table)
setDT(df)[, new_weight := weight/last(weight), 
           .(shift(cumsum(nutrition == "No"), fill = 0))]
df
#   weight nutrition new_weight
#1:     30       Yes  0.2752294
#2:     30       Yes  0.2752294
#3:    109        No  1.0000000
#4:     30       Yes  3.0000000
#5:    309       Yes 30.9000000
#6:     10        No  1.0000000
#7:     20       Yes  1.4285714
#8:     20       Yes  1.4285714
#9:     14        No  1.0000000

If we don't want to update the original data object and only want the single column as output:

setDT(df)[, weight/last(weight), .(shift(cumsum(nutrition == "No"), fill = 0))][, .(weight = V1)]
#       weight
#1:  0.2752294
#2:  0.2752294
#3:  1.0000000
#4:  3.0000000
#5: 30.9000000
#6:  1.0000000
#7:  1.4285714
#8:  1.4285714
#9:  1.0000000
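
One caveat: setDT() converts df to a data.table by reference, so the original object still changes class even when no column is added. A possible sketch that leaves df as a plain data.frame, assuming the data.table package is installed and using as.data.table() (which returns a copy) instead of setDT(); the names dt, grp and res are only illustrative.

```r
library(data.table)

# Same nine-row data as in the output above.
df <- data.frame(
  weight    = c(30, 30, 109, 30, 309, 10, 20, 20, 14),
  nutrition = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No")
)

# as.data.table() copies, so df itself is not modified.
dt <- as.data.table(df)

# Group by the lagged cumulative count of "No" rows, compute the ratio,
# then keep only the resulting column.
res <- dt[, .(weight = weight / last(weight)),
          by = .(grp = shift(cumsum(nutrition == "No"), fill = 0))][, .(weight)]
res
```

Because the groups are contiguous, the rows of res come back in the same order as the rows of df.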