How to create a new variable from values in the same column?

My goal is to create a new variable that represents the difference between two dates in the same column of a longitudinal data set.

animal   data        new_var
1      15/03/2020      NA
1      18/03/2020      3
1      18/04/2020      31
1      20/04/2020      2
2      13/01/2020      NA
2      18/01/2020      5
2      25/01/2020      7
2      25/03/2020      60

new_var is the difference, in days, between two consecutive dates for the same animal. The file was previously sorted by animal and date.

I thought of the following solution:

 animal   data           data2           new_var
    1      15/03/2020       NA               NA
    1      18/03/2020    15/03/2020          3
    1      18/04/2020    18/03/2020         31
    1      20/04/2020    18/04/2020          2
    2      13/01/2020       NA               NA
    2      18/01/2020    13/01/2020          5
    2      25/01/2020    18/01/2020          7
    2      25/03/2020    25/01/2020         60

I tried using the diff function, but I get an error message when I try this:

df$data2 <- diff(df$data, lag=1)
df$new_var <- df$data - df$data2

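I hope I have conveyed my question clearly. If not, I think the small piece of example code, together with how I would like to extend it, should make it clear enough. Looking forward to suggestions.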

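diff returns a vector that is one element shorter than the original column, so we need to add a value at the beginning or the end to correct for that. In addition, the calculation needs to be grouped by 'animal', so that the first row of each animal gets NA instead of a difference against the previous animal's last date.
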
library(dplyr)
library(lubridate)
df %>% 
   group_by(animal) %>%
   # parse the day/month/year strings, then pad diff() with a leading NA per animal
   mutate(new_var = as.numeric(c(NA, diff(dmy(data))))) %>%
   ungroup()

-output

# A tibble: 8 x 3
#  animal data       new_var
#   <int> <chr>        <dbl>
#1      1 15/03/2020      NA
#2      1 18/03/2020       3
#3      1 18/04/2020      31
#4      1 20/04/2020       2
#5      2 13/01/2020      NA
#6      2 18/01/2020       5
#7      2 25/01/2020       7
#8      2 25/03/2020      60
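
To see why the padding with NA is needed, here is a minimal base R sketch on the dates of animal 1 only. It uses as.Date with an explicit format instead of lubridate::dmy (my substitution, just to keep the snippet dependency-free); the names d, gaps and padded are made up for the illustration.

```r
# Dates of animal 1, parsed as day/month/year (base R stand-in for dmy())
d <- as.Date(c("15/03/2020", "18/03/2020", "18/04/2020", "20/04/2020"),
             format = "%d/%m/%Y")

# diff() returns the gaps between consecutive dates as a difftime in days
gaps <- diff(d)

length(d)     # 4 dates
length(gaps)  # 3 gaps, one fewer than the input

# A leading NA makes the result the same length as the column,
# so it can be stored next to the original rows
padded <- c(NA, as.numeric(gaps))
padded
# [1] NA  3 31  2
```

Without the leading NA, assigning the result back into a column fails because the lengths do not match.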

data

df <- structure(list(animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), data = c("15/03/2020", 
"18/03/2020", "18/04/2020", "20/04/2020", "13/01/2020", "18/01/2020", 
"25/01/2020", "25/03/2020")), row.names = c(NA, -8L), class = "data.frame")

A slightly different approach. If you only need the number, use as.numeric as suggested by akrun.

library(dplyr)
library(lubridate)
df1 <- df %>% 
  group_by(animal) %>% 
  mutate(data2 = lag(data)) %>% 
  mutate(new_var = dmy(data) - dmy(data2)) %>%
  mutate(new_var1 = as.numeric(dmy(data) - dmy(data2))) # idea from akrun


> df1
# A tibble: 8 x 5
# Groups:   animal [2]
  animal data       data2      new_var new_var1
   <int> <chr>      <chr>      <drtn>     <dbl>
1      1 15/03/2020 NA         NA days       NA
2      1 18/03/2020 15/03/2020  3 days        3
3      1 18/04/2020 18/03/2020 31 days       31
4      1 20/04/2020 18/04/2020  2 days        2
5      2 13/01/2020 NA         NA days       NA
6      2 18/01/2020 13/01/2020  5 days        5
7      2 25/01/2020 18/01/2020  7 days        7
8      2 25/03/2020 25/01/2020 60 days       60
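
If loading dplyr and lubridate is not an option, the same per-animal difference can probably be computed in base R as well. The following is only a sketch under that assumption and is not taken from either answer: it swaps in ave() for group_by()/mutate() and as.Date() with an explicit format for dmy(); the helper variable dates is hypothetical.

```r
# Same example data as above
df <- data.frame(
  animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  data = c("15/03/2020", "18/03/2020", "18/04/2020", "20/04/2020",
           "13/01/2020", "18/01/2020", "25/01/2020", "25/03/2020"),
  stringsAsFactors = FALSE
)

# Parse the day/month/year strings into Date objects
dates <- as.Date(df$data, format = "%d/%m/%Y")

# ave() applies FUN within each animal and returns a vector of the
# original length; as.numeric(dates) is the day count, so diff() is in days
df$new_var <- ave(as.numeric(dates), df$animal,
                  FUN = function(x) c(NA, diff(x)))

df$new_var
# [1] NA  3 31  2 NA  5  7 60
```

As in the answers above, this relies on the rows already being sorted by animal and date.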