R - Find percentiles of all the features for 1 of the observations from a dataset (Boston Housing Dataset)

Question

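I am working with the Boston Housing dataset. I filtered the observations (towns) with the lowest 'medv', transposed them, and saved the result as a new data frame. I want to add columns to this new data frame that give, for each feature value of the filtered observations, its percentile rank within the original data. Here is the R code: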

# load the library containing the dataset
library(MASS)

# save the data with custom name
boston = Boston

# suburb with lowest medv
low.medv = data.frame(t(boston[boston$medv == min(boston$medv),]))
low.medv

# The values I want populated in new columns:

# Finding percentile rank for crim
ecdf(boston$crim)(38.3518)
# >>> 0.9881423
ecdf(boston$crim)(67.9208)
# >>> 0.9960474

# percentile rank for lstat
ecdf(boston$lstat)(30.59)
# >>> 0.9782609
ecdf(boston$lstat)(22.98)
# >>> 0.8992095

Expected output:

Is there a way to use the ecdf function inside sapply?

Answer 1

I think this is easier if you don't transpose the data beforehand:

low.medv <- boston[boston$medv == min(boston$medv),]
res <- mapply(function(x, y) ecdf(x)(y), boston, low.medv)
res
#       crim     zn  indus   chas    nox      rm age     dis rad
#[1,] 0.9881 0.7352 0.8874 0.9308 0.8577 0.07708   1 0.05731   1
#[2,] 0.9960 0.7352 0.8874 0.9308 0.8577 0.13636   1 0.04150   1
#        tax ptratio  black  lstat     medv
#[1,] 0.9901  0.8893 1.0000 0.9783 0.003953
#[2,] 0.9901  0.8893 0.3498 0.8992 0.003953
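
As a sanity check on what these numbers mean: ecdf(x)(y) is the share of values in x that are less than or equal to y. The sketch below recomputes the same matrix by hand with mean(x <= y) and compares it with the mapply result. The names manual and via.ecdf are only illustrative and not part of the original answer.

```r
library(MASS)  # provides the Boston data frame

boston <- Boston
low.medv <- boston[boston$medv == min(boston$medv), ]

# Percentile rank by hand: share of a column's values <= the queried value
manual <- sapply(names(boston), function(col) {
  sapply(low.medv[[col]], function(v) mean(boston[[col]] <= v))
})

# Same computation through ecdf(), paired column by column with mapply
via.ecdf <- mapply(function(x, y) ecdf(x)(y), boston, low.medv)

# Compare the two matrices (one row per filtered town, one column per feature)
all.equal(unname(manual), unname(via.ecdf))
```

If the two approaches agree, all.equal returns TRUE. Note that this definition counts ties as "less than or equal", which is why a feature sitting at its maximum gets a rank of exactly 1.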

Now, if you want the result in 4 columns as shown, you can do:

cbind(t(low.medv), t(res))
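
To answer the sapply part of the question directly, one option is to loop over the column names, so that each ecdf is built from the same column whose values it evaluates. This is a minimal sketch rather than the only way to do it; the value. and pctile. column prefixes are made up here for readability.

```r
library(MASS)  # provides the Boston data frame

boston <- Boston
low.medv <- boston[boston$medv == min(boston$medv), ]

# Iterate over column names; ecdf() is vectorised over the values passed to it,
# so each call returns one percentile rank per filtered town
res <- sapply(names(boston), function(col) ecdf(boston[[col]])(low.medv[[col]]))

# One row per feature: raw values first, then their percentile ranks
out <- cbind(t(low.medv), t(res))
colnames(out) <- c(paste0("value.", rownames(low.medv)),
                   paste0("pctile.", rownames(low.medv)))
out
```

Looping over names rather than over the data frame itself is what lets sapply pair the two data frames; mapply does that pairing for you, which is why the answer above uses it.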
