How do I vectorize the ecdf function in R?

I have a data frame like the following:

set.seed(42)
data <- runif(1000)    
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
x <- data.frame(data,utility,stage)
head(x)
       data utility stage
1 0.9148060     def   xyz
2 0.9370754     abc   wxy
3 0.2861395     def   xyz
4 0.8304476     cde   xyz
5 0.6417455     bcd   xyz
6 0.5190959     abc   xyz

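I want to generate a cumulative distribution function for each unique combination of utility and stage. My real application will end up with about 100 CDFs, but this random data has 12 (4x3) unique combinations. I will be using these CDFs thousands of times, so I don't want to recompute them on the fly each time. The ecdf() function does exactly what I want, except that I need to vectorize it. The following code does not work, but it is the gist of what I am trying to do: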

ecdf_multiple <- function(x)
{
    i=0
    utilities <- levels(x$utilities)
    stages <- levels(x$stages)
    for(utility in utilities)
    {
        for(stage in stages)
        {
            i <- i + 1
            y <- ecdf(x[x$utilities == utility & x$stage == stage,1])
            # calculate ecdf for the unique util/stage combo
            z[i] <- list(y,utility,stage)
            # then assign it to a data element (list, data frame, json, whatever) note-this doesn't actually work
        }
    }
    z # return value
}
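
For comparison, here is a minimal working sketch of the same loop idea. It assumes the column names `utility` and `stage` from the data frame above, stores each ECDF in a named list, and joins the two labels with "." to form the key (the key format is my choice, not something the question requires). It uses unique() rather than levels(), because levels() returns NULL when the columns are character vectors instead of factors, which is the data.frame() default from R 4.0.0 onward.

```r
set.seed(42)
x <- data.frame(
  data    = runif(1000),
  utility = sample(c("abc", "bcd", "cde", "def"), 1000, replace = TRUE),
  stage   = sample(c("vwx", "wxy", "xyz"), 1000, replace = TRUE)
)

ecdf_multiple <- function(x) {
  # unique() works for both character and factor columns
  utilities <- as.character(unique(x$utility))
  stages    <- as.character(unique(x$stage))
  z <- list()
  for (u in utilities) {
    for (s in stages) {
      values <- x$data[x$utility == u & x$stage == s]
      if (length(values) == 0) next  # ecdf() needs at least one value
      # store the ECDF function under a "utility.stage" key
      z[[paste(u, s, sep = ".")]] <- ecdf(values)
    }
  }
  z
}

cdfs <- ecdf_multiple(x)
cdfs[["abc.xyz"]](0.25)  # evaluate the stored CDF at 0.25
```

Each list element is an ordinary function, so the distributions are built once and can then be evaluated as often as needed.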

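So after running ecdf_multiple and assigning the result to a variable, I would somehow reference that variable by passing a value (the point at which I want the CDF), the utility, and the stage.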

Is there a way to vectorize the ecdf function (or use/build another one) so that I can evaluate it many times without generating the distributions over and over?

--------Added in response to @Pascal's excellent suggestion.--------

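How can this be extended to the more general case of "n" categorical dimensions? Here is my attempt, based on Pascal's two-dimensional case. Note how I try to assign "y":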

set.seed(42)
data <- runif(1000)    
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
openclose <- sample(c("open","close"),1000,replace=TRUE)
x <- data.frame(data,utility,stage,openclose)
numlabels <- length(names(x))-1
y <- split(x, list(x[,2:(numlabels+1)]))
l <- lapply(y,function(x) ecdf(x[,"data"]))

#execute
utility <- "abc"
stage <- "xyz"
openclose <- "close"
comb <- paste(utility, stage, openclose, sep = ".")
# call the function
l[[comb]](.25)

During the assignment of "y" above, I get this error message:

"Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?"
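
The likely cause is that list() wraps the whole data frame as a single list element, so split() receives one data frame as a grouping factor instead of one vector per column. A small sketch of the difference, using a made-up toy data frame `df` (not part of the original data):

```r
df <- data.frame(a = c("p", "q", "p"), b = c("u", "u", "v"))

wrapped <- list(df)     # a list with ONE element: the entire data frame
columns <- as.list(df)  # a list with one element per column

length(wrapped)  # 1
length(columns)  # 2

# split() accepts the second form: each element is a grouping vector
groups <- split(df, columns)
names(groups)
```

Note that split() keeps empty combinations by default, so `groups` has one entry for every possible pair of labels, including pairs that never occur in the data.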

The following may help:

# we create a list of criteria by excluding 
# the first column of the data.frame
y <- split(x, as.list(x[,-1]))
l <- lapply(y, function(x) ecdf(x[,"data"]))

utility <- "abc"
stage <- "xyz"
comb <- paste(utility, stage, sep = ".")    

l[[comb]](0.25)
# [1] 0.2613636
plot(l[[comb]])
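
To look up many values at once, one option is a small wrapper around the list of ECDFs. The sketch below is only an illustration: `cdf_lookup` is a hypothetical helper name, and it assumes the labels are passed in the same order as the grouping columns so that the "."-joined key matches the names produced by split(). It also passes drop = TRUE so that combinations with no rows are skipped, since ecdf() fails on an empty vector.

```r
set.seed(42)
x <- data.frame(
  data      = runif(1000),
  utility   = sample(c("abc", "bcd", "cde", "def"), 1000, replace = TRUE),
  stage     = sample(c("vwx", "wxy", "xyz"), 1000, replace = TRUE),
  openclose = sample(c("open", "close"), 1000, replace = TRUE)
)

# Build every ECDF once; drop = TRUE removes empty label combinations
groups <- split(x$data, as.list(x[, -1]), drop = TRUE)
cdfs   <- lapply(groups, ecdf)

# Hypothetical helper: evaluate the matching ECDF for each (value, labels) row
cdf_lookup <- function(cdfs, value, ...) {
  keys <- paste(..., sep = ".")  # e.g. "abc.xyz.close"
  unname(mapply(function(k, v) cdfs[[k]](v), keys, value))
}

# Single lookup
cdf_lookup(cdfs, 0.25, "abc", "xyz", "close")

# Vectorized lookup: one result per position
cdf_lookup(cdfs, c(0.25, 0.75),
           c("abc", "def"), c("xyz", "vwx"), c("close", "open"))
```

A key that does not exist in `cdfs` makes the lookup fail with an error, so it may be worth validating the labels first if they come from user input.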