Calculate balance value between year and year before in R
I am trying to get a balance value from a df that looks like this:
df1
Name Year Ch1 Origin
A 1995 x1 a
A 1996 x2 b
A 1997 x3 a
A 2000 x4 a
B 1997 y1 c
B 1998 y2 c
where Ch1 is numeric. I want to add an extra column that holds this value:
Name Year Ch1 Bil
A 1995 x1
A 1996 x2 %
A 1997 x3 %
A 2000 x4 %
B 1997 y1
B 1998 y2 %
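I want "Bil" to be
Bil_i = X_i / X_(i-1)   if X_i >= X_(i-1)
Bil_i = -X_(i-1) / X_i  if X_i < X_(i-1)
where i is a year and i-1 is the previous year of the same Name, so X_i / X_(i-1) is the current year's value divided by the previous year's value.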
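As a quick check of the rule, here is a minimal sketch for a single Name whose values are already sorted from oldest to most recent year. The helper name `bil` is hypothetical and not part of any package.

```r
# Hypothetical helper: apply the Bil rule to one Name's values,
# assuming x is already sorted from oldest to most recent year.
bil <- function(x) {
  prev <- c(NA, head(x, -1))  # previous year's value; NA for the first year
  ifelse(x >= prev, x / prev, -prev / x)
}

bil(c(100, 200, 100))
# [1] NA  2 -2
```

An increase from 100 to 200 gives 2, and the drop from 200 back to 100 gives -2, so the sign shows the direction and the magnitude is always at least 1.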
I know I could do it with a loop like this:
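df1$Bil <- NA_real_                       # the first year of each Name has no previous value
for (i in 2:nrow(df1)) {                  # start at row 2 so that row i - 1 exists
  if (df1$Name[i] == df1$Name[i - 1]) {   # only compare rows of the same Name
    if (df1$Ch1[i] >= df1$Ch1[i - 1]) {
      df1$Bil[i] <- df1$Ch1[i] / df1$Ch1[i - 1]
    } else {
      df1$Bil[i] <- -df1$Ch1[i - 1] / df1$Ch1[i]
    }
  }
}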
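Is there a more elegant or faster way to compute this? With the loop I really need to make sure the dataset is sorted correctly (from oldest to most recent year).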
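On the ordering concern: a minimal base R sketch (no packages) that sorts explicitly by Name and Year before taking the previous value within each Name, so the input row order does not matter. The data frame `df` and its numbers are made up for illustration.

```r
# Hypothetical example data: numeric Ch1, rows deliberately out of order.
df <- data.frame(
  Name = c("B", "A", "A", "B", "A", "A"),
  Year = c(1998, 2000, 1995, 1997, 1997, 1996),
  Ch1  = c(150, 300, 100, 100, 400, 200)
)

# Sort by Name, then Year, so "previous row" means "previous year of the same Name".
df <- df[order(df$Name, df$Year), ]

# Previous year's Ch1 within each Name (NA for each Name's first year).
df$prev <- ave(df$Ch1, df$Name, FUN = function(x) c(NA, head(x, -1)))

df$Bil <- ifelse(df$Ch1 >= df$prev, df$Ch1 / df$prev, -df$prev / df$Ch1)
df$prev <- NULL

# Bil for A 1995, 1996, 1997, 2000, B 1997, 1998: NA, 2, 2, -1.333333, NA, 1.5
df
```

`ave()` applies `FUN` to each Name's slice and puts the results back in place, which plays the same role as a grouped `lag()`.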
We can use lag from dplyr.
library(dplyr)
df1 %>%
  arrange(Year) %>%
  group_by(Name) %>%
  # lag() is NA for the first year of each Name, so neither condition holds and Bil is NA there
  mutate(Bil = case_when(Ch1 >= lag(Ch1) ~ Ch1 / lag(Ch1),
                         Ch1 < lag(Ch1) ~ -lag(Ch1) / Ch1))
Data
df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), Year = c(1995L, 1996L, 1997L, 2000L,
1997L, 1998L), Ch1 = structure(1:6, .Label = c("x1", "x2", "x3",
"x4", "y1", "y2"), class = "factor"), Origin = structure(c(1L,
2L, 1L, 1L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
df1 <- df1 %>% mutate(Ch1 = round(runif(n = 6, 100, 1000), 2))  # replace the placeholder labels with random numeric values
Here is a data.table approach using shift.
library(data.table)
dat <- as.data.table(df1)
dat[, value := rnorm(.N, 20, 1)]  # add a numeric column
dat1 <- dat[order(Year)][, Bil := {
  prev <- shift(value, n = 1, type = "lag")  # previous year's value of the same Name
  ifelse(value >= prev, value / prev, -prev / value)
}, by = Name]  # by = Name keeps rows of A and B from being compared with each other
> dat
Name Year Ch1 Origin value
1: A 1995 x1 a 19.23394
2: A 1996 x2 b 21.16079
3: A 1997 x3 a 20.87078
4: A 2000 x4 a 20.50770
5: B 1997 y1 c 20.39450
6: B 1998 y2 c 20.53281
With by = Name, Bil in dat1 is NA for the first year of each Name (A 1995 and B 1997), positive where value increased (A 1996, B 1998) and negative where it decreased (A 1997, A 2000).
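lag() and shift() take the previous available row, so A 2000 is compared with A 1997. If "previous year" should instead mean exactly Year - 1 (an assumption about the intended meaning), one option is to also check the year gap. A base R sketch with made-up numbers; `lag_within` is a hypothetical helper:

```r
# Hypothetical data in the same shape as the question (numeric Ch1).
df <- data.frame(
  Name = c("A", "A", "A", "A", "B", "B"),
  Year = c(1995, 1996, 1997, 2000, 1997, 1998),
  Ch1  = c(100, 200, 400, 300, 100, 150)
)
df <- df[order(df$Name, df$Year), ]

# Hypothetical helper: previous element within each group (NA for the group's first row).
lag_within <- function(v, g) ave(v, g, FUN = function(x) c(NA, head(x, -1)))

prev_val  <- lag_within(df$Ch1, df$Name)
prev_year <- lag_within(df$Year, df$Name)

# Only keep the previous value when it is exactly one year earlier.
prev_val[which(df$Year - prev_year != 1)] <- NA

df$Bil <- ifelse(df$Ch1 >= prev_val, df$Ch1 / prev_val, -prev_val / df$Ch1)
# Bil: NA, 2, 2, NA, NA, 1.5 (A 2000 is NA because there is no A 1999 row)
```

`which()` drops the NA gaps of each Name's first row, so only real gaps larger than one year are blanked out.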