How to calculate difference between data in different rows?
I have monthly data in this format:
PrecipMM Date
122.7 2004-01-01
54.2 2005-01-01
31.9 2006-01-01
100.5 2007-01-01
144.9 2008-01-01
96.4 2009-01-01
75.3 2010-01-01
94.8 2011-01-01
67.6 2012-01-01
93.0 2013-01-01
184.6 2014-01-01
101.0 2015-01-01
149.3 2016-01-01
50.2 2004-02-01
46.2 2005-02-01
57.7 2006-02-01
I want to calculate the differences in PrecipMM between the same month in consecutive years.
My desired output looks like this:
PrecipMM Date PrecipMM_diff
122.7 2004-01-01 NA
54.2 2005-01-01 -68.5
31.9 2006-01-01 -22.3
100.5 2007-01-01 68.6
144.9 2008-01-01 44.4
96.4 2009-01-01 -48.5
75.3 2010-01-01 -21.2
94.8 2011-01-01 19.5
67.6 2012-01-01 -27.2
93.0 2013-01-01 25.4
184.6 2014-01-01 91.6
101.0 2015-01-01 -83.6
149.3 2016-01-01 48.3
50.2 2004-02-01 NA
46.2 2005-02-01 -4.0
57.7 2006-02-01 11.5
I think diff() can do this, but I have no idea how.
I think you can use lag together with group_by from dplyr. Here's how:
library(dplyr)
library(lubridate) # makes dealing with dates easier
# Load your example data
df <- structure(list(PrecipMM = c(4.4, 66.7, 48.2, 60.9, 108.1, 109.2,
101.7, 38.1, 53.8, 71.9, 75.4, 67.1, 92.7, 115.3, 68.9, 38.9),
Date = structure(5:20, .Label = c("101.7", "108.1", "109.2",
"115.3", "1766-01-01", "1766-02-01", "1766-03-01", "1766-04-01",
"1766-05-01", "1766-06-01", "1766-07-01", "1766-08-01", "1766-09-01",
"1766-10-01", "1766-11-01", "1766-12-01", "1767-01-01", "1767-02-01",
"1767-03-01", "1767-04-01", "38.1", "38.9", "4.4", "48.2",
"53.8", "60.9", "66.7", "67.1", "68.9", "71.9", "75.4", "92.7"
), class = "factor")), class = "data.frame", row.names = c(NA,
-16L), .Names = c("PrecipMM", "Date"))
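results <- df %>%
  # Extract the year and month so rows can be grouped by calendar month
  mutate(years = year(Date), months = month(Date)) %>%
  # Sort explicitly by month, then year; recent dplyr versions ignore
  # grouping in arrange() unless .by_group = TRUE is set
  arrange(months, years) %>%
  group_by(months) %>%
  # Within each month, lag() returns the previous year's value (NA for the first year)
  mutate(lagged.rain = lag(PrecipMM), rain.diff = PrecipMM - lagged.rain)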
results
# Source: local data frame [16 x 6]
# Groups: months [12]
#
# PrecipMM Date years months lagged.rain rain.diff
# (dbl) (fctr) (dbl) (dbl) (dbl) (dbl)
# 1 4.4 1766-01-01 1766 1 NA NA
# 2 92.7 1767-01-01 1767 1 4.4 88.3
# 3 66.7 1766-02-01 1766 2 NA NA
# 4 115.3 1767-02-01 1767 2 66.7 48.6
# 5 48.2 1766-03-01 1766 3 NA NA
# 6 68.9 1767-03-01 1767 3 48.2 20.7
# 7 60.9 1766-04-01 1766 4 NA NA
# 8 38.9 1767-04-01 1767 4 60.9 -22.0
# 9 108.1 1766-05-01 1766 5 NA NA
# 10 109.2 1766-06-01 1766 6 NA NA
# 11 101.7 1766-07-01 1766 7 NA NA
# 12 38.1 1766-08-01 1766 8 NA NA
# 13 53.8 1766-09-01 1766 9 NA NA
# 14 71.9 1766-10-01 1766 10 NA NA
# 15 75.4 1766-11-01 1766 11 NA NA
# 16 67.1 1766-12-01 1766 12 NA NA
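Since the question mentions diff(), here is a minimal base R sketch of the same idea, under a few assumptions: the data frame is rebuilt by hand from six of the sample rows in the question, Date is stored as a proper Date column, and the name PrecipMM_diff is taken from the desired output. It uses ave() to apply diff() within each calendar month and pads the first year of each month with NA. Treat it as one possible alternative, not a replacement for the dplyr approach above.

```r
# Six sample rows copied from the question (January and February, 2004-2006)
df <- data.frame(
  PrecipMM = c(122.7, 54.2, 31.9, 50.2, 46.2, 57.7),
  Date = as.Date(c("2004-01-01", "2005-01-01", "2006-01-01",
                   "2004-02-01", "2005-02-01", "2006-02-01"))
)

# Sort by month, then by date, so each month's years sit on consecutive rows
df <- df[order(format(df$Date, "%m"), df$Date), ]

# Within each month, diff() gives the change from the previous year.
# diff() returns one element fewer than its input, so prepend NA for the
# first year, which has no predecessor.
df$PrecipMM_diff <- ave(
  df$PrecipMM,
  format(df$Date, "%m"),
  FUN = function(x) c(NA, diff(x))
)

df
```

On these six rows the new column should match the desired output in the question: NA, -68.5, -22.3 for January and NA, -4.0, 11.5 for February.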