R - calculate using the next non-NA value in a data frame column
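I have some data in a data frame and I want to calculate the percentage change between monthly values. The problem is that some entries are NA, which throws off the calculation.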
irm code price pct.change
1 201807 511130F075A04 4.6600 2.192982
2 201806 511130F075A04 4.5600 1.333333
3 201805 511130F075A04 4.5000 -13.461538
4 201804 511130F075A04 5.2000 NA
5 201803 511130F075A04 NA NA
6 201802 511130F075A04 4.9100 1.867220
7 201801 511130F075A04 4.8200 -5.304519
8 201712 511130F075A04 5.0900 2.414487
9 201711 511130F075A04 4.9700 -3.307393
10 201710 511130F075A04 5.1400 NA
11 201709 511130F075A04 NA NA
12 201708 511130F075A04 5.2900 2.918288
13 201707 511130F075A04 5.1400 66.553255
14 201706 511130F075A04 3.0861 -10.664351
15 201705 511130F075A04 3.4545 -7.241824
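The problem is rows 4 and 10 of the pct.change column. They are NA, but I want them calculated using the most recent available price (the next non-NA value in the column) instead of the NA. The desired output would be (see rows 4 and 10):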
irm code price pct.change
1 201807 511130F075A04 4.6600 2.192982
2 201806 511130F075A04 4.5600 1.333333
3 201805 511130F075A04 4.5000 -13.461538
**4 201804 511130F075A04 5.2000 5.906314**
5 201803 511130F075A04 NA NA
6 201802 511130F075A04 4.9100 1.867220
7 201801 511130F075A04 4.8200 -5.304519
8 201712 511130F075A04 5.0900 2.414487
9 201711 511130F075A04 4.9700 -3.307393
**10 201710 511130F075A04 5.1400 -2.835539**
11 201709 511130F075A04 NA NA
12 201708 511130F075A04 5.2900 2.918288
13 201707 511130F075A04 5.1400 66.553255
14 201706 511130F075A04 3.0861 -10.664351
15 201705 511130F075A04 3.4545 -7.241824
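I have tried the standard (x/lead(x) - 1)*100 and several variations using (x/lag(which(!is.na(lead(x)), but I seem to be missing something. Is there a straightforward way to do this in base R or even dplyr? I don't want to replace the NAs; I want to keep them.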
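To make the problem reproducible, here is a minimal sketch that rebuilds only the `price` column from the table above (the `irm` and `code` columns are not needed for the calculation) and applies the naive formula. The helper `lead_base` is hypothetical, a base-R stand-in for `dplyr::lead(x)` with its default `n = 1`, so the snippet runs without any packages. It shows where the NAs come from: any row whose own price or following price is NA gets an NA change.

```r
# price column from the example, newest month first (rows 1..15)
price <- c(4.66, 4.56, 4.50, 5.20, NA, 4.91, 4.82, 5.09,
           4.97, 5.14, NA, 5.29, 5.14, 3.0861, 3.4545)

# Hypothetical helper: shift values up by one row, padding the end with NA
lead_base <- function(x) c(x[-1], NA)

# Naive percentage change against the immediately following row
naive <- (price / lead_base(price) - 1) * 100

which(is.na(naive))  # rows 4, 5, 10, 11 and 15
```

Row 15 is NA here only because this excerpt has no later row to compare against; rows 4 and 10 are the ones the question is about.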
@LAP's comment is probably the best approach. The data.table syntax is slightly nicer:
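library(data.table)
setDT(df)  # convert df to a data.table by reference
# Restrict to rows that have a price, so shift(type = 'lead') only sees non-NA
# prices and compares each one with the next available price; rows with an NA
# price are not touched and keep their existing pct.change
df[!is.na(price), pct.change := 100*(price/shift(price, type = 'lead') - 1)]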
# irm code price pct.change
# 1: 201807 511130F075A04 4.6600 2.192982
# 2: 201806 511130F075A04 4.5600 1.333333
# 3: 201805 511130F075A04 4.5000 -13.461538
# 4: 201804 511130F075A04 5.2000 5.906314
# 5: 201803 511130F075A04 NA NA
# 6: 201802 511130F075A04 4.9100 1.867220
# 7: 201801 511130F075A04 4.8200 -5.304519
# 8: 201712 511130F075A04 5.0900 2.414487
# 9: 201711 511130F075A04 4.9700 -3.307393
# 10: 201710 511130F075A04 5.1400 -2.835539
# 11: 201709 511130F075A04 NA NA
# 12: 201708 511130F075A04 5.2900 2.918288
# 13: 201707 511130F075A04 5.1400 66.553255
# 14: 201706 511130F075A04 3.0861 -10.664351
# 15: 201705 511130F075A04 3.4545 NA
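The same idea, computing the change only over the rows that have a price, also works without data.table. Here is a minimal base-R sketch; the function name `pct_change_next_non_na` is hypothetical, and the vector is the example's `price` column.

```r
# Percentage change against the next non-NA value, computed over the
# non-NA subset only (the same idea as the data.table answer above)
pct_change_next_non_na <- function(price) {
  out <- rep(NA_real_, length(price))      # rows with an NA price stay NA
  ok <- which(!is.na(price))               # positions that have a price
  p <- price[ok]
  out[ok] <- 100 * (p / c(p[-1], NA) - 1)  # lead within the non-NA subset
  out
}

price <- c(4.66, 4.56, 4.50, 5.20, NA, 4.91, 4.82, 5.09,
           4.97, 5.14, NA, 5.29, 5.14, 3.0861, 3.4545)
res <- pct_change_next_non_na(price)
round(res[c(4, 10)], 6)
```

On the question's data this would be `df$pct.change <- pct_change_next_non_na(df$price)`. Like the data.table version, it also sets the last row with a price to NA, because there is no later price to compare with.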
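Note that row 15 becomes NA in this version because there is no later price in the data to compare against. In base R you can instead replace only the affected values:
# Rows directly above each NA price (here rows 4 and 10);
# assumes each NA is isolated, i.e. never two missing months in a row
a = which(is.na(df$price))-1
# Recompute only those rows, comparing with the price two rows further down
transform(df,pct.change=replace(pct.change,a,100*(price[a]/price[a+2]-1)))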
irm code price pct.change
1 201807 511130F075A04 4.6600 2.192982
2 201806 511130F075A04 4.5600 1.333333
3 201805 511130F075A04 4.5000 -13.461538
4 201804 511130F075A04 5.2000 5.906314
5 201803 511130F075A04 NA NA
6 201802 511130F075A04 4.9100 1.867220
7 201801 511130F075A04 4.8200 -5.304519
8 201712 511130F075A04 5.0900 2.414487
9 201711 511130F075A04 4.9700 -3.307393
10 201710 511130F075A04 5.1400 -2.835539
11 201709 511130F075A04 NA NA
12 201708 511130F075A04 5.2900 2.918288
13 201707 511130F075A04 5.1400 66.553255
14 201706 511130F075A04 3.0861 -10.664351
15 201705 511130F075A04 3.4545 -7.241824
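The `a + 2` offset above only works when each NA is isolated. A sketch of one way to generalize it, assuming missing months may come in runs, is to look up, for every row, the index of the next non-NA price. The helper `next_non_na_index` and the short `price` vector are hypothetical, made up for illustration.

```r
# For each position, the index of the first non-NA element after it (NA if none).
# Scans from the end, remembering the most recent non-NA index seen so far.
next_non_na_index <- function(x) {
  nxt <- rep(NA_integer_, length(x))
  last <- NA_integer_
  for (i in rev(seq_along(x))) {
    nxt[i] <- last
    if (!is.na(x[i])) last <- i
  }
  nxt
}

# Hypothetical data with two consecutive missing months
price <- c(5.00, 4.00, NA, NA, 5.00)
idx <- next_non_na_index(price)        # 2, 5, 5, 5, NA
pct <- 100 * (price / price[idx] - 1)  # rows with an NA price stay NA
```

On the question's `df`, the same "fill only the gaps" idea becomes `a <- which(!is.na(df$price) & is.na(df$pct.change))` followed by `df$pct.change[a] <- 100 * (df$price[a] / df$price[next_non_na_index(df$price)[a]] - 1)`, which fills rows 4 and 10 and leaves row 15 untouched.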