How to apply an equation to one column, taking other columns of a data frame into account, in R?
My data looks like this:
tibble [1,702,551 x 4] (S3: tbl_df/tbl/data.frame)
$ date : Date[1:1702551], format: "2011-04-12" "2011-04-12" ...
$ wlength: num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
$ ID : chr [1:1702551] "c01" "c01" "c01" "c01" ...
$ R : num [1:1702551] 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 ...
head(fdata)
A tibble: 6 x 4
date wlength ID R
<date> <dbl> <chr> <dbl>
1 2011-04-12 350 c01 0.009
2 2011-04-12 351 c01 0.009
3 2011-04-12 352 c01 0.009
4 2011-04-12 353 c01 0.009
5 2011-04-12 354 c01 0.009
6 2011-04-12 355 c01 0.009
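A quick explanation of the data:
Over 9 years, reflectance data were collected by date (date) at each wavelength (wlength) for different kinds of vegetation (ID), e.g. "c01", "h07", ...; the associated value is R.
I want to apply the formula for the Normalized Difference Vegetation Index (NDVI):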
(R800-R670)/(R800+R670)
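The number after R is the wavelength (wlength). Basically, for each "date" and each "ID", I want to extract the value of R where the wavelength equals 800 and where it equals 670, and apply the equation.
How can I handle all of these variables in order to apply this equation to my data?
Any help would be much appreciated. Thank you.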
Here is one possibility using the tidyverse:
library(tidyverse)
fdata <-
tribble(
~date , ~wlength , ~ID , ~R,
"2011-04-12", 354 , "c01" , 0.022 ,
"2011-04-12", 800 , "c01" , 0.014,
"2011-04-12", 670 , "c01" , 0.009,
"2011-04-15", 355 , "h07" , 0.012,
"2011-04-15", 800 , "h07" , 0.003,
"2011-04-15", 670 , "h07" , 0.077
)
est_ndvi <-
fdata %>%
group_by(date, ID) %>%
filter(wlength %in% c(670, 800)) %>%
pivot_wider(names_from = wlength, names_prefix = "R", values_from = R) %>%
mutate(ndvi = (R800 - R670)/(R800 + R670))
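For comparison, the same "one column per wavelength" idea can be sketched in base R with two subsets and a join. This is only a minimal sketch, assuming there is at most one row per date/ID/wavelength combination; the toy data frame and the `_800`/`_670` suffixes are illustrative names, not part of the original data.

```r
# Toy data in the same shape as fdata (values are illustrative only)
toy <- data.frame(
  date    = c("2011-04-12", "2011-04-12", "2011-04-12",
              "2011-04-15", "2011-04-15", "2011-04-15"),
  wlength = c(354, 800, 670, 355, 800, 670),
  ID      = c("c01", "c01", "c01", "h07", "h07", "h07"),
  R       = c(0.022, 0.014, 0.009, 0.012, 0.003, 0.077)
)

# Keep one table per wavelength, then join them on date and ID
nir <- toy[toy$wlength == 800, c("date", "ID", "R")]
red <- toy[toy$wlength == 670, c("date", "ID", "R")]
both <- merge(nir, red, by = c("date", "ID"), suffixes = c("_800", "_670"))

# NDVI = (R800 - R670) / (R800 + R670)
both$ndvi <- (both$R_800 - both$R_670) / (both$R_800 + both$R_670)
both
```

Because `merge()` performs an inner join by default, any date/ID group that lacks one of the two wavelengths simply drops out of the result.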
Not very pretty, but it should work:
library(dplyr)
data <- tibble(
date = c("2020-01-01", "2020-01-01", "2020-01-02"),
wlength = c(800, 670, 800),
ID = c('c01', 'c01', 'c01'),
R = c(1, 2, 3))
data
reduced <- data %>%
filter(wlength %in% c(800, 670)) %>%
mutate(
R800 = ifelse(wlength == 800, R, NA),
R670 = ifelse(wlength == 670, R, NA)) %>%
group_by(date, ID) %>%
summarise(
R800 = max(R800, na.rm=TRUE),
R670 = max(R670, na.rm=TRUE),
NDVI = ((max(R800) - max(R670)) / (max(R800) + max(R670))))
reduced
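Up front, see the note below about floating-point equality. While it might not affect you with this data, one problem with filtering on floating-point equality is that you may not know it is happening, and your calculations will then be incorrect.
Two alternative solutions:
tidyverse, take 1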
library(dplyr)
fdata %>%
arrange(-wlength) %>%
filter(wlength %in% c(352L, 350L)) %>%
group_by(date, ID) %>%
filter(n() == 2L) %>%
summarize(
quux = diff(R) / sum(R),
.groups = "drop"
)
# # A tibble: 4 x 3
# date ID quux
# <chr> <chr> <dbl>
# 1 2011-04-12 c01 -0.223
# 2 2011-04-12 c02 -0.152
# 3 2011-04-13 c01 -0.120
# 4 2011-04-13 c02 0.745
tidyverse, take 2
func <- function(wl, r, wavelengths = c(800, 670)) {
inds <- sapply(wavelengths, function(w) {
diffs <- abs(wl - w)
which(diffs < 1)[1]
})
diff(r[inds]) / sum(r[inds])
}
fdata %>%
group_by(date, ID) %>%
summarize(
quux = func(wlength, R, c(352, 350)),
.groups = "drop"
)
# # A tibble: 4 x 3
# date ID quux
# <chr> <chr> <dbl>
# 1 2011-04-12 c01 -0.223
# 2 2011-04-12 c02 -0.152
# 3 2011-04-13 c01 -0.120
# 4 2011-04-13 c02 0.745
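One thing to keep in mind with the `diff()`-based versions: `diff(x)` returns the second element minus the first, so the sign of the result depends on the order in which the two wavelengths arrive. If the sign matters, as it does for NDVI, it may be clearer to look up each band by name. The following is a rough sketch under that assumption; `ndvi_from_spectrum`, its `nir`/`red`/`tol` arguments, and the toy data are hypothetical names introduced here for illustration.

```r
library(dplyr)

# Hypothetical helper: find R at the near-infrared and red wavelengths
# (within a tolerance) and return (R_nir - R_red) / (R_nir + R_red).
ndvi_from_spectrum <- function(wl, r, nir = 800, red = 670, tol = 0.5) {
  pick <- function(target) {
    i <- which(abs(wl - target) < tol)
    # NA if the wavelength is missing for this group
    if (length(i) == 0L) NA_real_ else r[i[1]]
  }
  r_nir <- pick(nir)
  r_red <- pick(red)
  (r_nir - r_red) / (r_nir + r_red)
}

# Toy data (illustrative values only)
toy <- data.frame(
  date    = rep("2011-04-12", 3),
  wlength = c(354, 800, 670),
  ID      = rep("c01", 3),
  R       = c(0.022, 0.014, 0.009)
)

res <- toy %>%
  group_by(date, ID) %>%
  summarize(ndvi = ndvi_from_spectrum(wlength, R), .groups = "drop")
res
```

Groups that lack either wavelength get `NA` rather than an error, which makes them easy to spot afterwards.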
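Floating-point equality
Your wlength is a numeric field, and testing strict equality on floating-point numbers does carry an occasional risk. Computers have limitations when it comes to floating-point numbers (aka double, numeric, float). This is a fundamental limitation of how computers deal with non-integer numbers, and it is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most mainstream languages (this is relative/subjective, I admit) do not use them by default. Refs: Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
Strict equality is not a problem for integer values, and in my sample data they are integers. You have a few options for handling this, generally by injecting/replacing components of the %>% pipe.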
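Convert to integer,
mutate(wlength = as.integer(wlength))
Filter with a specific tolerance, perhaps
filter(abs(wlength - 800) < 0.1 | abs(wlength - 670) < 0.1)
Convert ad hoc,
filter(sprintf("%0.0f", wlength) %in% c("800", "670"))
(not the most efficient, but it works and can support non-integer wavelengths).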
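To make the risk concrete, here is a small base R illustration of why strict equality on doubles can silently miss rows, and how a tolerance avoids that. It is only a sketch: the wavelength values and the 0.1 tolerance are made up for the example.

```r
# Two values that print the same but are not bit-for-bit equal
x <- 0.1 + 0.2
x == 0.3                      # FALSE on IEEE 754 doubles
isTRUE(all.equal(x, 0.3))     # TRUE: comparison within a small tolerance
abs(x - 0.3) < 1e-8           # TRUE: explicit tolerance

# The same idea applied to wavelengths (illustrative values)
wl <- c(669.9999999, 670, 800.0000001, 801)
targets <- c(670, 800)

# Strict matching keeps only the exact 670
strict <- wl %in% targets

# Tolerance matching keeps every value within 0.1 of a target
keep <- vapply(wl, function(w) any(abs(w - targets) < 0.1), logical(1))
wl[keep]
```

Here the strict match finds one value while the tolerance match finds three, and nothing in the strict version warns you about the two rows it skipped.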
Data
fdata <- read.table(header = TRUE, text = "
date wlength ID
2011-04-12 350 c01
2011-04-12 351 c01
2011-04-12 352 c01
2011-04-12 353 c01
2011-04-12 354 c01
2011-04-12 355 c01
2011-04-13 350 c01
2011-04-13 351 c01
2011-04-13 352 c01
2011-04-13 353 c01
2011-04-13 354 c01
2011-04-13 355 c01
2011-04-12 350 c02
2011-04-12 351 c02
2011-04-12 352 c02
2011-04-12 353 c02
2011-04-12 354 c02
2011-04-12 355 c02
2011-04-13 350 c02
2011-04-13 351 c02
2011-04-13 352 c02
2011-04-13 353 c02
2011-04-13 354 c02
2011-04-13 355 c02
")
set.seed(2021)
fdata$R <- round(runif(nrow(fdata)), 3)