How to create a new variable from values in the same column?
My goal is to create a new variable that holds the difference between two dates in the same column of a longitudinal dataset:
animal data new_var
1 15/03/2020 NA
1 18/03/2020 3
1 18/04/2020 31
1 20/04/2020 2
2 13/01/2020 NA
2 18/01/2020 5
2 25/01/2020 7
2 25/03/2020 60
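new_var is the difference, in days, between two consecutive dates for the same animal. The file has already been sorted by animal and date.
I came up with the following intermediate layout, where data2 holds the previous date for the same animal: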
animal data data2 new_var
1 15/03/2020 . NA
1 18/03/2020 15/03/2020 3
1 18/04/2020 18/03/2020 31
1 20/04/2020 18/04/2020 2
2 13/01/2020 . NA
2 18/01/2020 13/01/2020 5
2 25/01/2020 18/01/2020 7
2 25/03/2020 25/01/2020 60
I tried the diff function, but I get an error message when I run this:
df$data2 <- diff(df$data, lag=1)
df$new_var <- df$data - df$data2
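I hope the question is clear. If not, the short example above, together with the layout I want to reach, should show what I am after. Looking forward to suggestions.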
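diff returns a vector that is one element shorter than the original column, so we need to pad it with a value at the beginning or the end to make the lengths match. The data column is also stored as character, so it has to be parsed to dates first (dmy from lubridate does that here). In addition, the calculation probably needs to be grouped by 'animal', so that the first row of each animal gets NA instead of a difference against the previous animal's last date.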
library(dplyr)
library(lubridate)
df %>%
group_by(animal) %>%
mutate(new_var = as.numeric(c(NA, diff(dmy(data))))) %>%
ungroup()
-output
# A tibble: 8 x 3
# animal data new_var
# <int> <chr> <dbl>
#1 1 15/03/2020 NA
#2 1 18/03/2020 3
#3 1 18/04/2020 31
#4 1 20/04/2020 2
#5 2 13/01/2020 NA
#6 2 18/01/2020 5
#7 2 25/01/2020 7
#8 2 25/03/2020 60
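The length mismatch mentioned above can be seen in isolation with a minimal base R sketch. The names d, gaps and padded are made up for illustration, and as.Date with an explicit format is used here in place of lubridate's dmy, on the assumption that all dates are dd/mm/yyyy strings:

```r
# Hypothetical vector: the four dates of animal 1, parsed from dd/mm/yyyy
d <- as.Date(c("15/03/2020", "18/03/2020", "18/04/2020", "20/04/2020"),
             format = "%d/%m/%Y")

gaps <- diff(d)      # differences between consecutive dates (a difftime)
length(d)            # 4
length(gaps)         # 3, one shorter than the input

# Pad with a leading NA so the result lines up with the original rows
padded <- c(NA, as.numeric(gaps))
padded               # NA 3 31 2
```

Assigning gaps directly to a column of a 4-row data frame would fail because of the length difference, which is why the NA is prepended before the result is stored.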
data
df <- structure(list(animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), data = c("15/03/2020",
"18/03/2020", "18/04/2020", "20/04/2020", "13/01/2020", "18/01/2020",
"25/01/2020", "25/03/2020")), row.names = c(NA, -8L), class = "data.frame")
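A slightly different approach: store the previous date in its own column with lag, then subtract. If you only need the number of days rather than a difftime, wrap the subtraction in as.numeric, as akrun suggested.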
library(dplyr)
library(lubridate)
df1 <- df %>%
group_by(animal) %>%
mutate(data2 = lag(data)) %>%
mutate(new_var = dmy(data) - dmy(data2)) %>%
mutate(new_var1 = as.numeric(dmy(data) - dmy(data2))) # idea from akrun
> df1
# A tibble: 8 x 5
# Groups: animal [2]
animal data data2 new_var new_var1
<int> <chr> <chr> <drtn> <dbl>
1 1 15/03/2020 NA NA days NA
2 1 18/03/2020 15/03/2020 3 days 3
3 1 18/04/2020 18/03/2020 31 days 31
4 1 20/04/2020 18/04/2020 2 days 2
5 2 13/01/2020 NA NA days NA
6 2 18/01/2020 13/01/2020 5 days 5
7 2 25/01/2020 18/01/2020 7 days 7
8 2 25/03/2020 25/01/2020 60 days 60
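If adding dplyr and lubridate is not an option, roughly the same result can probably be obtained in base R. The sketch below is an alternative, not the method used in the answers above: it relies on ave to apply the padded diff within each animal, and it assumes the rows are already sorted by animal and date and that every date is a dd/mm/yyyy string. The helper vector days is a name introduced only for this example.

```r
# Same example data as above, rebuilt so the snippet runs on its own
df <- data.frame(
  animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  data = c("15/03/2020", "18/03/2020", "18/04/2020", "20/04/2020",
           "13/01/2020", "18/01/2020", "25/01/2020", "25/03/2020")
)

# Parse the strings to Date, then to a plain day count
days <- as.numeric(as.Date(df$data, format = "%d/%m/%Y"))

# Within each animal: difference to the previous row, NA for the first row
df$new_var <- ave(days, df$animal, FUN = function(x) c(NA, diff(x)))

df$new_var   # NA 3 31 2 NA 5 7 60
```

Converting to numeric before calling ave keeps the result a plain numeric column, which matches new_var1 in the output above.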