How to index the features() function to iterate over a list of data frames using map() function in R?

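Plotting my soil compaction data gives an upward-convex (concave-down) curve. I need to determine the maximum y value and the x value that produces that maximum.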

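The 'features' package fits a smoothing spline to the data and returns features of the spline, including the maximum y value and the critical x value. I'm having trouble iterating the features() function over multiple samples that are stored in a tidy list.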
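As a rough base-R illustration of the "fit a smoothing spline, then read off its maximum" idea, the sketch below fits stats::smooth.spline() to sample A's w and d values (taken from the example data further down) and evaluates the fit on a fine grid. This is a stand-in technique, not how the features package computes its fmax or cpts values, so its numbers are not expected to match features() output; with only four points per sample, the curve depends heavily on the automatically chosen smoothing parameter.

```r
# Base-R sketch (not the features package): smooth.spline() + grid search.
# w and d are sample A's values from the example data in the question.
w <- c(0.08, 0.0925, 0.105, 0.1175)
d <- c(1.86, 1.88, 1.88, 1.87)

# Fit a smoothing spline; with the default cv = FALSE the smoothing
# parameter is chosen by generalized cross-validation (GCV)
fit <- smooth.spline(w, d)

# Evaluate the fitted spline on a fine grid spanning the observed w range
grid <- seq(min(w), max(w), length.out = 1000)
pred <- predict(fit, grid)

# The grid point with the largest fitted value approximates the peak
i_max    <- which.max(pred$y)
w_at_max <- pred$x[i_max]
d_max    <- pred$y[i_max]
c(w_at_max = w_at_max, d_max = d_max)
```

A grid search only locates the peak to within the grid spacing, which is why a dense grid is used here.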

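The features package seems to have trouble indexing the data: the code works fine when I use the data for a single sample, but when I try to use the dot placeholder with square brackets, it loses track of the data.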

The code below shows that this process works correctly for one sample but not when iterating.

#load packages
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'forcats' was built under R version 3.6.3
library(features)
#> Warning: package 'features' was built under R version 3.6.3
#> Loading required package: lokern
#> Warning: package 'lokern' was built under R version 3.6.3

# generate example data 
df <- tibble(
  sample = (rep(LETTERS[1:3], each=4)),
  w =      c(seq(0.08, 0.12, by=0.0125), 
             seq(0.09, 0.13, by=0.0125), 
             seq(0.10, 0.14, by=0.0125)),
  d=      c(1.86, 1.88, 1.88, 1.87, 
            1.90, 1.92, 1.92, 1.91, 
            1.96, 1.98, 1.98, 1.97) )
df
#> # A tibble: 12 x 3
#>    sample      w     d
#>    <chr>   <dbl> <dbl>
#>  1 A      0.08    1.86
#>  2 A      0.0925  1.88
#>  3 A      0.105   1.88
#>  4 A      0.118   1.87
#>  5 B      0.09    1.9 
#>  6 B      0.102   1.92
#>  7 B      0.115   1.92
#>  8 B      0.128   1.91
#>  9 C      0.1     1.96
#> 10 C      0.112   1.98
#> 11 C      0.125   1.98
#> 12 C      0.138   1.97

# use the 'features' package to fit a smooth spline and extract the spline features, 
# including local y-maximum and critical point along x-axis.
# This works fine for one sample at a time:

sample1_data <- df %>% filter(sample == 'A')
sample1_features <- features(x= sample1_data$w, 
                             y= sample1_data$d, 
                             smoother = "smooth.spline")
sample1_features
#> $f
#>         fmean          fmin          fmax           fsd         noise 
#>  1.880000e+00  1.860000e+00  1.880000e+00  1.000000e-02  0.000000e+00 
#>           snr         d1min         d1max       fwiggle         ncpts 
#>  2.707108e+11 -9.100000e-01  1.970000e+00  9.349000e+01  1.000000e+00 
#> 
#> $cpts
#> [1] 0.1
#> 
#> $curvature
#> [1] -121.03
#> 
#> $outliers
#> [1] NA
#> 
#> attr(,"fits")
#> attr(,"fits")$x
#> [1] 0.0800 0.0925 0.1050 0.1175
#> 
#> attr(,"fits")$y
#> [1] 1.86 1.88 1.88 1.87
#> 
#> attr(,"fits")$fn
#> [1] 1.86 1.88 1.88 1.87
#> 
#> attr(,"fits")$d1
#> [1]  1.9732965  0.8533784 -0.5868100 -0.9061384
#> 
#> attr(,"fits")$d2
#> [1]  4.588832e-03 -1.791915e+02 -5.123866e+01  1.461069e-01
#> 
#> attr(,"class")
#> [1] "features"

# But when piping a list that holds data for multiple samples into map(),
# the usual map() placeholder dot does not index into the list element
# (and its columns) that is being passed to .f:

df_split <- split(df, f= df[['sample']])
df_split
#> $A
#> # A tibble: 4 x 3
#>   sample      w     d
#>   <chr>   <dbl> <dbl>
#> 1 A      0.08    1.86
#> 2 A      0.0925  1.88
#> 3 A      0.105   1.88
#> 4 A      0.118   1.87
#> 
#> $B
#> # A tibble: 4 x 3
#>   sample     w     d
#>   <chr>  <dbl> <dbl>
#> 1 B      0.09   1.9 
#> 2 B      0.102  1.92
#> 3 B      0.115  1.92
#> 4 B      0.128  1.91
#> 
#> $C
#> # A tibble: 4 x 3
#>   sample     w     d
#>   <chr>  <dbl> <dbl>
#> 1 C      0.1    1.96
#> 2 C      0.112  1.98
#> 3 C      0.125  1.98
#> 4 C      0.138  1.97

df_split %>% map(.f = features, x = .[['w']], y= .[['d']], smoother = "smooth.spline")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Error in seq.default(min(x), max(x), length = max(npts, length(x))): 'from' must be a finite number
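The error can be reproduced without the features package. In the piped call, `.` is the magrittr placeholder for the left-hand side, the whole list df_split, rather than the element map() is currently processing, and extra arguments such as x = .[['w']] are passed unchanged to every call. The sketch below rebuilds the example data as a base-R data.frame (an assumption made only to avoid the tidyverse dependency; split() returns the same list-of-data-frames structure) and shows what .[['w']] evaluates to.

```r
# Rebuild the question's data with base R (data.frame instead of tibble)
df <- data.frame(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97),
  stringsAsFactors = FALSE
)
df_split <- split(df, f = df[["sample"]])

# `.` in the piped call is the whole list, whose element names are the
# sample IDs, so indexing it by a column name finds nothing
names(df_split)           # "A" "B" "C"
x_arg <- df_split[["w"]]
is.null(x_arg)            # TRUE

# Every call made by map() therefore receives x = NULL; min(NULL) warns
# and returns Inf, consistent with the warnings and error shown above
suppressWarnings(min(x_arg))
```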

Created on 2020-04-04 by the reprex package (v0.3.0)

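You can use group_split() to split the data by sample, and then use map() to apply the features() function to each subset of the data. Inside the ~ formula, .x is the current subset, so .x$w and .x$d refer to that subset's columns rather than to the whole list.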

library(features)
library(dplyr)
library(purrr)

# One tibble per sample, then one features() fit per tibble
list_model <- df %>%
  group_split(sample) %>%
  map(~ features(x = .x$w, y = .x$d, smoother = "smooth.spline"))
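The same per-element pattern can be written in base R with lapply() and an anonymous function. In the sketch below, features() is deliberately swapped for which.max() on the observed points so the code runs without the features package: it picks the observed w at the largest observed d (the first one, if tied), which is much coarser than a spline's critical point. In practice, the features() call would go inside the function instead.

```r
df <- data.frame(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97),
  stringsAsFactors = FALSE
)

# One data.frame per sample, named by sample ID
df_split <- split(df, f = df[["sample"]])

# `s` is bound to each sample's data.frame in turn, so s$w and s$d are
# that sample's columns (the role .x plays in the purrr lambda)
peaks <- lapply(df_split, function(s) {
  i <- which.max(s$d)  # first row with the largest observed d
  data.frame(sample = s$sample[1], w_at_max = s$w[i], d_max = s$d[i],
             stringsAsFactors = FALSE)
})

# Stack the one-row results into a single table
peak_table <- do.call(rbind, peaks)
peak_table
```

Returning a one-row data frame per sample keeps the sample ID next to its result, so the pieces can be stacked into one table afterwards.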