How do I index into the features() function to iterate over a list of data frames using map() in R?
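Plotting my soil compaction data gives a curve that is convex upward. I need to determine the maximum y value and the x value that produces that maximum.
The 'features' package fits a smoothing spline to the data and returns the features of the spline, including the y maximum and the critical x value. I am having trouble iterating the features() function over multiple samples, which are held in a tidy list.
It seems the features package has trouble indexing the data. The code works fine when I use the data for only one sample, but when I try to use the dot placeholder and square brackets, it loses track of the data.
The code below shows how this process works correctly for one sample but not for the iteration.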
#load packages
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'forcats' was built under R version 3.6.3
library(features)
#> Warning: package 'features' was built under R version 3.6.3
#> Loading required package: lokern
#> Warning: package 'lokern' was built under R version 3.6.3
# generate example data
df <- tibble(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97)
)
df
#> # A tibble: 12 x 3
#> sample w d
#> <chr> <dbl> <dbl>
#> 1 A 0.08 1.86
#> 2 A 0.0925 1.88
#> 3 A 0.105 1.88
#> 4 A 0.118 1.87
#> 5 B 0.09 1.9
#> 6 B 0.102 1.92
#> 7 B 0.115 1.92
#> 8 B 0.128 1.91
#> 9 C 0.1 1.96
#> 10 C 0.112 1.98
#> 11 C 0.125 1.98
#> 12 C 0.138 1.97
# use the 'features' package to fit a smooth spline and extract the spline features,
# including local y-maximum and critical point along x-axis.
# This works fine for one sample at a time:
sample1_data <- df %>% filter(sample == 'A')
sample1_features <- features(x = sample1_data$w,
                             y = sample1_data$d,
                             smoother = "smooth.spline")
sample1_features
#> $f
#> fmean fmin fmax fsd noise
#> 1.880000e+00 1.860000e+00 1.880000e+00 1.000000e-02 0.000000e+00
#> snr d1min d1max fwiggle ncpts
#> 2.707108e+11 -9.100000e-01 1.970000e+00 9.349000e+01 1.000000e+00
#>
#> $cpts
#> [1] 0.1
#>
#> $curvature
#> [1] -121.03
#>
#> $outliers
#> [1] NA
#>
#> attr(,"fits")
#> attr(,"fits")$x
#> [1] 0.0800 0.0925 0.1050 0.1175
#>
#> attr(,"fits")$y
#> [1] 1.86 1.88 1.88 1.87
#>
#> attr(,"fits")$fn
#> [1] 1.86 1.88 1.88 1.87
#>
#> attr(,"fits")$d1
#> [1] 1.9732965 0.8533784 -0.5868100 -0.9061384
#>
#> attr(,"fits")$d2
#> [1] 4.588832e-03 -1.791915e+02 -5.123866e+01 1.461069e-01
#>
#> attr(,"class")
#> [1] "features"
# But when attempting to use the pipe and the map() function
# to iterate over a list containing data for multiple samples,
# using the typical map() placeholder dot will not index to the
# list element/columns that are being passed to .f
df_split <- split(df, f= df[['sample']])
df_split
#> $A
#> # A tibble: 4 x 3
#> sample w d
#> <chr> <dbl> <dbl>
#> 1 A 0.08 1.86
#> 2 A 0.0925 1.88
#> 3 A 0.105 1.88
#> 4 A 0.118 1.87
#>
#> $B
#> # A tibble: 4 x 3
#> sample w d
#> <chr> <dbl> <dbl>
#> 1 B 0.09 1.9
#> 2 B 0.102 1.92
#> 3 B 0.115 1.92
#> 4 B 0.128 1.91
#>
#> $C
#> # A tibble: 4 x 3
#> sample w d
#> <chr> <dbl> <dbl>
#> 1 C 0.1 1.96
#> 2 C 0.112 1.98
#> 3 C 0.125 1.98
#> 4 C 0.138 1.97
df_split %>% map(.f = features, x = .[['w']], y= .[['d']], smoother = "smooth.spline")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Error in seq.default(min(x), max(x), length = max(npts, length(x))): 'from' must be a finite number
Created on 2020-04-04 by the reprex package (v0.3.0)
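The warnings from min() and max() suggest that features() received an empty x. A likely reason is that, inside a magrittr pipe, the dot stands for the whole left-hand side (the list df_split), not for the element map() is currently visiting. The list has elements named A, B and C but none named w, so .[['w']] evaluates to NULL. The sketch below illustrates this with a small stand-in list; x_at_max() is a hypothetical helper used only so the example runs without the features package, and it is not part of any library.

```r
library(purrr)
library(magrittr)

# Small stand-in for the split data (same column names as in the question)
df_split <- list(
  A = data.frame(w = c(0.08, 0.0925, 0.105), d = c(1.86, 1.88, 1.87)),
  B = data.frame(w = c(0.09, 0.1025, 0.115), d = c(1.90, 1.92, 1.91))
)

# This is what `.[['w']]` resolves to in the piped call: the list itself
# has no element named "w", so the lookup returns NULL.
dot_lookup <- df_split[["w"]]

# Hypothetical stand-in for features(): the x value at the largest y
x_at_max <- function(x, y) x[which.max(y)]

# Inside a purrr lambda, `.x` is the current list element (one data frame),
# so the columns are found as intended.
res <- df_split %>% map(~ x_at_max(x = .x[["w"]], y = .x[["d"]]))
```

Wrapping the call in a formula (~) is what moves the lookup from the list to each element, because .x is bound by map() rather than by the pipe.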
You can use group_split to split the data based on sample, and use map to apply the features function to each subset of the data.
library(features)
library(dplyr)
library(purrr)
list_model <- df %>%
  group_split(sample) %>%
  map(~features(x = .x$w, y = .x$d, smoother = "smooth.spline"))
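To get from the fitted objects to the values the question asks for, one option is to pull fmax and cpts out of each result. This is a sketch based on the printed structure of sample1_features in the question (a named vector $f containing fmax, and a $cpts element), not on the package documentation. It uses split() instead of group_split() only so that the list keeps the sample names; the names summary_tbl, d_max and w_crit are made up for illustration.

```r
library(features)
library(dplyr)
library(purrr)

# Example data from the question
df <- tibble(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97)
)

# split() returns a list named by sample; group_split() returns an unnamed list
list_model <- df %>%
  split(.$sample) %>%
  map(~ features(x = .x$w, y = .x$d, smoother = "smooth.spline"))

# Assumption: $f is a named vector with an "fmax" entry and $cpts holds the
# critical x value(s), as shown in the printed output in the question.
summary_tbl <- tibble(
  sample = names(list_model),
  d_max  = map_dbl(list_model, ~ .x$f[["fmax"]]),
  w_crit = map(list_model, ~ .x$cpts)  # list-column: a fit may have several critical points
)
```

w_crit is kept as a list-column because a spline can have more than one critical point, or none, in which case map_dbl() would fail.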