group_by 滞后 returns 不滞后变量
group_by lag returns unlagged variable
试图创建一个滞后变量以计算两次观察之间行驶的英里数时,我完全感到困惑和困惑。
> fuel_sheet %>% dplyr::select(vehicle, date, mileage)
+
# A tibble: 64 x 3
# Groups: vehicle [2]
vehicle date mileage
<fct> <date> <dbl>
1 VW T4 Campervan 2015-09-30 150631
2 VW T4 Campervan 2015-10-16 150866
3 VW T4 Campervan 2015-10-18 151395
4 VW T4 Campervan 2015-12-06 151731
5 VW T4 Campervan 2016-01-22 151968
6 VW T4 Campervan 2016-01-24 152360
7 VW T4 Campervan 2016-01-25 152731
8 VW T4 Campervan 2016-02-21 153066
9 VW T4 Campervan 2016-03-29 153500
10 VW T4 Campervan 2016-04-16 153751
# … with 54 more rows
> fuel_sheet <- fuel_sheet %>%
group_by(vehicle) %>%
mutate(miles_lag = lag(mileage, n=1, order_by=date)) %>%
mutate(miles = mileage - lag(mileage, n=1, order_by=date))
> fuel_sheet %>% dplyr::select(vehicle, date, mileage, miles_lag, miles)
# A tibble: 64 x 5
# Groups: vehicle [2]
vehicle date mileage miles_lag miles
<fct> <date> <dbl> <dbl> <dbl>
1 VW T4 Campervan 2015-09-30 150631 150631 0
2 VW T4 Campervan 2015-10-16 150866 150866 0
3 VW T4 Campervan 2015-10-18 151395 151395 0
4 VW T4 Campervan 2015-12-06 151731 151731 0
5 VW T4 Campervan 2016-01-22 151968 151968 0
6 VW T4 Campervan 2016-01-24 152360 152360 0
7 VW T4 Campervan 2016-01-25 152731 152731 0
8 VW T4 Campervan 2016-02-21 153066 153066 0
9 VW T4 Campervan 2016-03-29 153500 153500 0
10 VW T4 Campervan 2016-04-16 153751 153751 0
# … with 54 more rows
如果我尝试使用 diff()
代替滞后来直接计算差值,我会得到一个少了一个元素的数组。
在 SO 上发现了很多关于此类事情的话题,一个例子是 ,但看不出为什么我的代码没有按预期运行。
与 plyr
没有冲突,因为它没有加载,我看不到任何其他会影响这里的冲突。
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8 LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] reprex_0.3.0 xtable_1.8-4 orgutils_0.4-1 lubridate_1.7.9 googlesheets4_0.2.0
[6] chron_2.3-56 ascii_2.3 spelling_2.1 RSQLite_2.2.0 rmsfact_0.0.3
[11] rmarkdown_2.3 reticulate_1.16 magrittr_1.5 kableExtra_1.2.1 knitr_1.29
[16] Hmisc_4.4-1 Formula_1.2-3 survival_3.2-3 lattice_0.20-41 googlesheets_0.3.0
[21] foreign_0.8-80 fcuk_0.1.21 devtools_2.3.1 usethis_1.6.1 forcats_0.5.0
[26] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[31] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] googledrive_1.0.1 colorspace_1.4-1 ellipsis_0.3.1 rprojroot_1.3-2 htmlTable_2.0.1
[6] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.11 remotes_2.2.0 bit64_4.0.5
[11] fansi_0.4.1 xml2_1.3.2 splines_4.0.2 pkgload_1.1.0 jsonlite_1.7.1
[16] broom_0.7.0 cluster_2.1.0 dbplyr_1.4.4 png_0.1-7 clipr_0.7.0
[21] compiler_4.0.2 httr_1.4.2 backports_1.1.9 assertthat_0.2.1 Matrix_1.2-18
[26] gargle_0.5.0 cli_2.0.2 later_1.1.0.1 htmltools_0.5.0 prettyunits_1.1.1
[31] tools_4.0.2 gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5 cellranger_1.1.0
[36] vctrs_0.3.4 xfun_0.17 ps_1.3.4 testthat_2.3.2 rvest_0.3.6
[41] lifecycle_0.2.0 textutils_0.2-0 stringdist_0.9.6 scales_1.1.1 promises_1.1.1
[46] hms_0.5.3 parallel_4.0.2 RColorBrewer_1.1-2 curl_4.3 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.5.3 desc_1.2.0
[56] checkmate_2.0.0 pkgbuild_1.1.0 rlang_0.4.7 pkgconfig_2.0.3 evaluate_0.14
[61] htmlwidgets_1.5.1 bit_4.0.4 processx_3.4.4 tidyselect_1.1.0 R6_2.4.1
[66] generics_0.0.2 DBI_1.1.0 whisker_0.4 pillar_1.4.6 haven_2.3.1
[71] withr_2.2.0 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 utf8_1.1.4
[76] jpeg_0.1-8.1 grid_4.0.2 readxl_1.3.1 data.table_1.13.0 blob_1.2.1
[81] callr_3.4.4 digest_0.6.25 webshot_0.5.2 httpuv_1.5.4 openssl_1.4.2
[86] munsell_0.5.0 viridisLite_0.3.0 askpass_1.1 sessioninfo_1.1.1
> tidyverse_conflicts()
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date() masks base::date()
✖ lubridate::days() masks chron::days()
✖ ascii::expand() masks tidyr::expand()
✖ magrittr::extract() masks tidyr::extract()
✖ stats::filter() masks dplyr::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ lubridate::hours() masks chron::hours()
✖ lubridate::intersect() masks base::intersect()
✖ stats::lag() masks dplyr::lag()
✖ lubridate::minutes() masks chron::minutes()
✖ lubridate::seconds() masks chron::seconds()
✖ magrittr::set_names() masks purrr::set_names()
✖ lubridate::setdiff() masks base::setdiff()
✖ Hmisc::src() masks dplyr::src()
✖ Hmisc::summarize() masks dplyr::summarize()
✖ lubridate::union() masks base::union()
✖ lubridate::years() masks chron::years()
如有任何建议或指点,我们将不胜感激,如果我需要提供更多信息,请告诉我。
提前致谢。
编辑:根据要求 dput()
的数据,我知道有一种简单的方法可以将数据转储出来供其他人使用,但我一辈子都记不起它是什么,曾经查看 reprex
包以帮助但没有成功...
> dput(fuel_sheet %>% dplyr::select(vehicle, date, mileage))
structure(list(vehicle = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Ford Focus",
"Honda Jazz", "VW T4 Campervan"), class = "factor"), date = structure(c(16708,
16724, 16726, 16775, 16822, 16824, 16825, 16852, 16889, 16907,
16914, 16915, 16915, 16973, 17009, 17011, 17015, 17025, 17047,
17092, 17117, 17117, 17144, 17146, 17173, 17237, 17284, 17295,
17319, 17333, 17361, 17423, 17434, 17453, 17466, 17558, 17579,
17634, 17635, 17641, 17642, 17710, 17743, 17807, 17807, 17929,
17995, 17999, 18004, 18042, 18074, 18077, 18106, 18129, 18139,
18223, 18227, 18337, 18469, 18470, 18493, 18501, 18505, 18509
), class = "Date"), mileage = c(150631, 150866, 151395, 151731,
151968, 152360, 152731, 153066, 153500, 153751, 154363, 154619,
154866, 155473, 156008, 156199, 156637, 156997, 157157, 61476,
61774, 157458, 62175, 62409, 63209, 157769, 158325, 158803, 158956,
159314, 159706, 160126, 160550, 160986, 161363, 162000, 162449,
162879, 163215, 163636, 164005, 164613, 164802, 165352, 165651,
165946, 166488, 166987, 167131, 167749, 168013, 168361, 168853,
169371, 169996, 170636, 170640, 170954, 171573, 171763, 172286,
172616, 173105, 173266)), row.names = c(NA, -64L), groups = structure(list(
vehicle = structure(c(1L, 3L), .Label = c("Ford Focus", "Honda Jazz",
"VW T4 Campervan"), class = "factor"), .rows = structure(list(
c(20L, 21L, 23L, 24L, 25L), c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 22L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L,
57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
编辑 2:省略 order_by
没有帮助...
> fuel_sheet %>% group_by(vehicle) %>% mutate(miles = mileage - lag(mileage, n=1))
# A tibble: 64 x 10
# Groups: vehicle [2]
vehicle mileage litre gbp station date month year miles_lag miles
<fct> <dbl> <dbl> <dbl> <fct> <date> <dbl> <dbl> <dbl> <dbl>
1 VW T4 Campervan 150631 78.0 81.9 Tescos 2015-09-30 9 2015 150631 0
2 VW T4 Campervan 150866 33.6 36.9 Nisa 2015-10-16 10 2015 150866 0
3 VW T4 Campervan 151395 66 72.5 Tescos 2015-10-18 10 2015 151395 0
4 VW T4 Campervan 151731 44 44.4 Morrisons 2015-12-06 12 2015 151731 0
5 VW T4 Campervan 151968 34.1 34 Other 2016-01-22 1 2016 151968 0
6 VW T4 Campervan 152360 49.1 48.1 Morrisons 2016-01-24 1 2016 152360 0
7 VW T4 Campervan 152731 53.2 52.0 Tescos 2016-01-25 1 2016 152731 0
8 VW T4 Campervan 153066 44.0 43 Tescos 2016-02-21 2 2016 153066 0
9 VW T4 Campervan 153500 56.9 58 Tescos 2016-03-29 3 2016 153500 0
10 VW T4 Campervan 153751 27.6 29 Tescos 2016-04-16 4 2016 153751 0
# … with 54 more rows
这可能是一个非常幼稚的解决方案,但它确实有效
df[,'mileage_lag']=c(0,diff(df$mileage))
编辑
library(data.table)
dt=data.table(df)
dt[,'mileage_lag':=c(0,diff(mileage)),by=.(vehicles)]
在你的冲突中
stats::lag() masks dplyr::lag()
如果您在两个 mutate
中指定 dplyr::lag()
,它应该可以工作。 (我检查了指定 stats::lag()
我得到的输出与你的相同。)
试图创建一个滞后变量以计算两次观察之间行驶的英里数时,我完全感到困惑和困惑。
> fuel_sheet %>% dplyr::select(vehicle, date, mileage)
+
# A tibble: 64 x 3
# Groups: vehicle [2]
vehicle date mileage
<fct> <date> <dbl>
1 VW T4 Campervan 2015-09-30 150631
2 VW T4 Campervan 2015-10-16 150866
3 VW T4 Campervan 2015-10-18 151395
4 VW T4 Campervan 2015-12-06 151731
5 VW T4 Campervan 2016-01-22 151968
6 VW T4 Campervan 2016-01-24 152360
7 VW T4 Campervan 2016-01-25 152731
8 VW T4 Campervan 2016-02-21 153066
9 VW T4 Campervan 2016-03-29 153500
10 VW T4 Campervan 2016-04-16 153751
# … with 54 more rows
> fuel_sheet <- fuel_sheet %>%
group_by(vehicle) %>%
mutate(miles_lag = lag(mileage, n=1, order_by=date)) %>%
mutate(miles = mileage - lag(mileage, n=1, order_by=date))
> fuel_sheet %>% dplyr::select(vehicle, date, mileage, miles_lag, miles)
# A tibble: 64 x 5
# Groups: vehicle [2]
vehicle date mileage miles_lag miles
<fct> <date> <dbl> <dbl> <dbl>
1 VW T4 Campervan 2015-09-30 150631 150631 0
2 VW T4 Campervan 2015-10-16 150866 150866 0
3 VW T4 Campervan 2015-10-18 151395 151395 0
4 VW T4 Campervan 2015-12-06 151731 151731 0
5 VW T4 Campervan 2016-01-22 151968 151968 0
6 VW T4 Campervan 2016-01-24 152360 152360 0
7 VW T4 Campervan 2016-01-25 152731 152731 0
8 VW T4 Campervan 2016-02-21 153066 153066 0
9 VW T4 Campervan 2016-03-29 153500 153500 0
10 VW T4 Campervan 2016-04-16 153751 153751 0
# … with 54 more rows
如果我尝试使用 diff()
代替滞后来直接计算差值,我会得到一个少了一个元素的数组。
在 SO 上发现了很多关于此类事情的话题,一个例子是
与 plyr
没有冲突,因为它没有加载,我看不到任何其他会影响这里的冲突。
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8 LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] reprex_0.3.0 xtable_1.8-4 orgutils_0.4-1 lubridate_1.7.9 googlesheets4_0.2.0
[6] chron_2.3-56 ascii_2.3 spelling_2.1 RSQLite_2.2.0 rmsfact_0.0.3
[11] rmarkdown_2.3 reticulate_1.16 magrittr_1.5 kableExtra_1.2.1 knitr_1.29
[16] Hmisc_4.4-1 Formula_1.2-3 survival_3.2-3 lattice_0.20-41 googlesheets_0.3.0
[21] foreign_0.8-80 fcuk_0.1.21 devtools_2.3.1 usethis_1.6.1 forcats_0.5.0
[26] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[31] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] googledrive_1.0.1 colorspace_1.4-1 ellipsis_0.3.1 rprojroot_1.3-2 htmlTable_2.0.1
[6] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.11 remotes_2.2.0 bit64_4.0.5
[11] fansi_0.4.1 xml2_1.3.2 splines_4.0.2 pkgload_1.1.0 jsonlite_1.7.1
[16] broom_0.7.0 cluster_2.1.0 dbplyr_1.4.4 png_0.1-7 clipr_0.7.0
[21] compiler_4.0.2 httr_1.4.2 backports_1.1.9 assertthat_0.2.1 Matrix_1.2-18
[26] gargle_0.5.0 cli_2.0.2 later_1.1.0.1 htmltools_0.5.0 prettyunits_1.1.1
[31] tools_4.0.2 gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5 cellranger_1.1.0
[36] vctrs_0.3.4 xfun_0.17 ps_1.3.4 testthat_2.3.2 rvest_0.3.6
[41] lifecycle_0.2.0 textutils_0.2-0 stringdist_0.9.6 scales_1.1.1 promises_1.1.1
[46] hms_0.5.3 parallel_4.0.2 RColorBrewer_1.1-2 curl_4.3 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.5.3 desc_1.2.0
[56] checkmate_2.0.0 pkgbuild_1.1.0 rlang_0.4.7 pkgconfig_2.0.3 evaluate_0.14
[61] htmlwidgets_1.5.1 bit_4.0.4 processx_3.4.4 tidyselect_1.1.0 R6_2.4.1
[66] generics_0.0.2 DBI_1.1.0 whisker_0.4 pillar_1.4.6 haven_2.3.1
[71] withr_2.2.0 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 utf8_1.1.4
[76] jpeg_0.1-8.1 grid_4.0.2 readxl_1.3.1 data.table_1.13.0 blob_1.2.1
[81] callr_3.4.4 digest_0.6.25 webshot_0.5.2 httpuv_1.5.4 openssl_1.4.2
[86] munsell_0.5.0 viridisLite_0.3.0 askpass_1.1 sessioninfo_1.1.1
> tidyverse_conflicts()
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date() masks base::date()
✖ lubridate::days() masks chron::days()
✖ ascii::expand() masks tidyr::expand()
✖ magrittr::extract() masks tidyr::extract()
✖ stats::filter() masks dplyr::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ lubridate::hours() masks chron::hours()
✖ lubridate::intersect() masks base::intersect()
✖ stats::lag() masks dplyr::lag()
✖ lubridate::minutes() masks chron::minutes()
✖ lubridate::seconds() masks chron::seconds()
✖ magrittr::set_names() masks purrr::set_names()
✖ lubridate::setdiff() masks base::setdiff()
✖ Hmisc::src() masks dplyr::src()
✖ Hmisc::summarize() masks dplyr::summarize()
✖ lubridate::union() masks base::union()
✖ lubridate::years() masks chron::years()
如有任何建议或指点,我们将不胜感激,如果我需要提供更多信息,请告诉我。
提前致谢。
编辑:根据要求 dput()
的数据,我知道有一种简单的方法可以将数据转储出来供其他人使用,但我一辈子都记不起它是什么,曾经查看 reprex
包以帮助但没有成功...
> dput(fuel_sheet %>% dplyr::select(vehicle, date, mileage))
structure(list(vehicle = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Ford Focus",
"Honda Jazz", "VW T4 Campervan"), class = "factor"), date = structure(c(16708,
16724, 16726, 16775, 16822, 16824, 16825, 16852, 16889, 16907,
16914, 16915, 16915, 16973, 17009, 17011, 17015, 17025, 17047,
17092, 17117, 17117, 17144, 17146, 17173, 17237, 17284, 17295,
17319, 17333, 17361, 17423, 17434, 17453, 17466, 17558, 17579,
17634, 17635, 17641, 17642, 17710, 17743, 17807, 17807, 17929,
17995, 17999, 18004, 18042, 18074, 18077, 18106, 18129, 18139,
18223, 18227, 18337, 18469, 18470, 18493, 18501, 18505, 18509
), class = "Date"), mileage = c(150631, 150866, 151395, 151731,
151968, 152360, 152731, 153066, 153500, 153751, 154363, 154619,
154866, 155473, 156008, 156199, 156637, 156997, 157157, 61476,
61774, 157458, 62175, 62409, 63209, 157769, 158325, 158803, 158956,
159314, 159706, 160126, 160550, 160986, 161363, 162000, 162449,
162879, 163215, 163636, 164005, 164613, 164802, 165352, 165651,
165946, 166488, 166987, 167131, 167749, 168013, 168361, 168853,
169371, 169996, 170636, 170640, 170954, 171573, 171763, 172286,
172616, 173105, 173266)), row.names = c(NA, -64L), groups = structure(list(
vehicle = structure(c(1L, 3L), .Label = c("Ford Focus", "Honda Jazz",
"VW T4 Campervan"), class = "factor"), .rows = structure(list(
c(20L, 21L, 23L, 24L, 25L), c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 22L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L,
57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
编辑 2:省略 order_by
没有帮助...
> fuel_sheet %>% group_by(vehicle) %>% mutate(miles = mileage - lag(mileage, n=1))
# A tibble: 64 x 10
# Groups: vehicle [2]
vehicle mileage litre gbp station date month year miles_lag miles
<fct> <dbl> <dbl> <dbl> <fct> <date> <dbl> <dbl> <dbl> <dbl>
1 VW T4 Campervan 150631 78.0 81.9 Tescos 2015-09-30 9 2015 150631 0
2 VW T4 Campervan 150866 33.6 36.9 Nisa 2015-10-16 10 2015 150866 0
3 VW T4 Campervan 151395 66 72.5 Tescos 2015-10-18 10 2015 151395 0
4 VW T4 Campervan 151731 44 44.4 Morrisons 2015-12-06 12 2015 151731 0
5 VW T4 Campervan 151968 34.1 34 Other 2016-01-22 1 2016 151968 0
6 VW T4 Campervan 152360 49.1 48.1 Morrisons 2016-01-24 1 2016 152360 0
7 VW T4 Campervan 152731 53.2 52.0 Tescos 2016-01-25 1 2016 152731 0
8 VW T4 Campervan 153066 44.0 43 Tescos 2016-02-21 2 2016 153066 0
9 VW T4 Campervan 153500 56.9 58 Tescos 2016-03-29 3 2016 153500 0
10 VW T4 Campervan 153751 27.6 29 Tescos 2016-04-16 4 2016 153751 0
# … with 54 more rows
这可能是一个非常幼稚的解决方案,但它确实有效
df[,'mileage_lag']=c(0,diff(df$mileage))
编辑
library(data.table)
dt=data.table(df)
dt[,'mileage_lag':=c(0,diff(mileage)),by=.(vehicles)]
在你的冲突中
stats::lag() masks dplyr::lag()
如果您在两个 mutate
中指定 dplyr::lag()
,它应该可以工作。 (我检查了指定 stats::lag()
我得到的输出与你的相同。)