R - Find percentiles of all the features for selected observations from a dataset (Boston Housing Dataset)
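I am working with the Boston Housing dataset. I filtered the observations (towns) with the lowest 'medv' and saved them, transposed, as a new data frame. I want to add columns to this new data frame that hold the percentile rank of each of these observations' feature values within the original data.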
Here is the R code:
# load the library containing the dataset
library(MASS)
# save the data with custom name
boston = Boston
# suburb with lowest medv
low.medv = data.frame(t(boston[boston$medv == min(boston$medv),]))
low.medv
# The values I want populated in new columns:
# Finding percentile rank for crim
ecdf(boston$crim)(38.3518)
# >>> 0.9881423
ecdf(boston$crim)(67.9208)
# >>> 0.9960474
# percentile rank for lstat
ecdf(boston$lstat)(30.59)
# >>> 0.9782609
ecdf(boston$lstat)(22.98)
# >>> 0.8992095
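The calls above rely on ecdf(x)(v) returning the share of values in x that are less than or equal to v. A small check of that reading, reusing the crim value from the question (a sketch, assuming the MASS package is installed):

```r
library(MASS)  # ships the Boston data frame

boston <- Boston

# ecdf() returns a step function; evaluating it at a value v gives the
# proportion of observations that are less than or equal to v
p_ecdf <- ecdf(boston$crim)(38.3518)

# the same proportion computed directly from a logical vector
p_manual <- mean(boston$crim <= 38.3518)

print(c(ecdf = p_ecdf, manual = p_manual))
```

So a "percentile rank" here means the fraction of towns at or below the given value, not a strict "below" count.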
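Desired output: the transposed low.medv data frame with added columns holding those percentile ranks.
Is there a way to use the ecdf function inside sapply?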
I think it is easier if you don't transpose the data first:
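# keep the rows with the lowest medv as a regular (untransposed) data frame
low.medv <- boston[boston$medv == min(boston$medv),]
# for each column: fit the ECDF on the full column, evaluate it at the filtered values
res <- mapply(function(x, y) ecdf(x)(y), boston, low.medv)
res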
# crim zn indus chas nox rm age dis rad
#[1,] 0.9881 0.7352 0.8874 0.9308 0.8577 0.07708 1 0.05731 1
#[2,] 0.9960 0.7352 0.8874 0.9308 0.8577 0.13636 1 0.04150 1
# tax ptratio black lstat medv
#[1,] 0.9901 0.8893 1.0000 0.9783 0.003953
#[2,] 0.9901 0.8893 0.3498 0.8992 0.003953
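Since the question asks about sapply specifically: a data frame is a list of columns, so sapply can do the same job by looping over the column names and indexing both data frames by name. A sketch of that variant, assuming the same untransposed low.medv subset:

```r
library(MASS)  # ships the Boston data frame

boston <- Boston
# untransposed subset, so its columns line up with boston's columns
low.medv <- boston[boston$medv == min(boston$medv), ]

# for each column name: fit the ECDF on the full column and evaluate it at
# the filtered rows' values; sapply binds the length-2 results into a matrix
# with one column per feature
pct <- sapply(names(boston), function(col) ecdf(boston[[col]])(low.medv[[col]]))
pct[, c("crim", "lstat")]
```

Either way, ecdf counts ties as "less than or equal", which is why both rows get a medv percentile of 2/506 (about 0.003953): the two towns share the minimum value.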
Now, if you want the result with 4 columns as shown, you can do:
cbind(t(low.medv), t(res))
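If you would rather have a data frame with labelled columns than a bare matrix, one option is to build it with data.frame() and rename the columns. The value_ and pct_ prefixes below are hypothetical labels chosen for illustration, not something the answer prescribes:

```r
library(MASS)  # ships the Boston data frame

boston <- Boston
low.medv <- boston[boston$medv == min(boston$medv), ]
res <- mapply(function(x, y) ecdf(x)(y), boston, low.medv)

# one row per feature: the two observations' values, then their percentile ranks
out <- data.frame(t(low.medv), t(res))
# hypothetical column labels built from the original row names ("399", "406")
names(out) <- c(paste0("value_", rownames(low.medv)),
                paste0("pct_", rownames(low.medv)))
round(out[c("crim", "lstat"), ], 4)
```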