Loop or apply to generate percentile value in new column for each existing column in df

Question

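I want to generate a "percentile in the distribution" column for each existing column.

However, I'm not sure how to generate this percentile column even for a single series.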

# generate data
df <- data.frame(rnorm(100, 3, 1.2),
                 rnorm(100, 2, 0.5),
                 rnorm(100, 4, 1.5),
                 rnorm(100, 5, 0.2),
                 rnorm(100, 6, 0.7))
colnames(df) <- c('a', 'b', 'c', 'd', 'e')

#failed attempt to generate new column
df$a_pct <- sapply(df$a, function(x) ecdf(x))

Answer 1

Do you have to use ecdf? You can simply do:

sapply(df, function(x) rowMeans(outer(x, x, `>`)))
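One thing worth checking is how this compares with ecdf. The `>` comparison counts only strictly smaller values, whereas ecdf counts values less than or equal to each point, so with no ties the two should differ by exactly 1/n. The sketch below illustrates this on a small vector; the seed and the names `strict` and `incl` are assumptions added for illustration. Note also that outer builds an n-by-n matrix, which may become costly for long columns.

```r
set.seed(1)                            # assumption: seed only for reproducibility
x <- rnorm(10)

# share of values strictly smaller than each element
strict <- rowMeans(outer(x, x, `>`))

# share of values less than or equal to each element
incl <- ecdf(x)(x)

# with no ties, each element counts itself once in the ecdf version
all.equal(incl - strict, rep(1 / length(x), length(x)))
```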

Answer 2

ecdf returns a function.

str(ecdf(df$a))
#function (v)  
#- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
#- attr(*, "call")= language ecdf(df$a)

To get the percentiles, apply that function to the values, i.e.

ecdf(df$a)(df$a)
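As a minimal sketch of how this fixes the failed attempt from the question, the result can be assigned straight to a new column. The seed, the one-column data frame, and the intermediate name `Fa` are assumptions made only to keep the example short.

```r
set.seed(1)                            # assumption: seed only for reproducibility
df <- data.frame(a = rnorm(100, 3, 1.2))

Fa <- ecdf(df$a)                       # step 1: build the ECDF function from the column
df$a_pct <- Fa(df$a)                   # step 2: evaluate it at every value of the column

range(df$a_pct)                        # values lie in (0, 1]; the largest value of a gets 1
```

The original attempt failed because sapply called ecdf once per element, producing a list of functions rather than a vector of numbers.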

For multiple columns, loop over the columns with lapply/sapply:

res1 <-  sapply(df, function(x) ecdf(x)(x))
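The sapply call returns a separate matrix. If the goal is to append the percentiles to df as new columns, one possible approach is sketched below. The `_pct` suffix follows the naming in the question; the seed and the two-column data frame are assumptions made for brevity.

```r
set.seed(42)                           # assumption: seed only for reproducibility
df <- data.frame(a = rnorm(100, 3, 1.2),
                 b = rnorm(100, 2, 0.5))

# lapply returns one percentile vector per original column;
# assigning the list to new names adds them as columns a_pct, b_pct
df[paste0(names(df), "_pct")] <- lapply(df, function(x) ecdf(x)(x))

head(df)
```

lapply is used here instead of sapply so that the result stays a list of columns, which assigns cleanly into the data frame.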

