In R, how can I create a column showing which percentile of the data each value in another column falls into?
My data looks like this, with roughly 6000 rows:
pidp avgy06
1 68160489 20182.36849
2 68575973 13845.49024
3 69180553 35.61806
4 69786365 13117.26465
5 69815605 15791.40283
6 69833973 10327.94531
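I would like to know whether there is a way to apply the quantile() function so that I can add another column telling me which percentile of the data each value of avgy06 falls into. For example, these are the 100 percentiles of avgy06: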
0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10%
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 189.0078
11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21%
790.2671 1505.4875 2364.4903 2900.0230 3441.0689 3680.2787 4246.6805 4595.0131 4704.8372 4904.6381 5217.9201
22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32%
5421.2263 5621.4581 6166.7022 6673.1660 6851.0085 7261.1324 7588.7569 7947.6250 8292.3789 8606.2774 8938.2232
33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43%
9286.9695 9665.7901 9885.2171 10035.7984 10280.0676 10423.1376 10633.2589 10886.2913 11205.7540 11411.0259 11581.6681
44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54%
11763.5549 11926.4006 12210.2935 12434.3433 12581.4526 12781.9956 13135.6904 13305.6350 13666.1352 13814.4657 14046.4000
55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65%
14258.2219 14431.6258 14631.6608 14940.7309 15168.2559 15385.1055 15583.7370 15757.0793 15906.4169 16094.3642 16448.5898
66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76%
16683.5195 16817.0613 17049.2498 17361.5975 17663.5911 18004.6763 18309.8879 18614.3184 18871.4102 19220.2478 19529.0051
77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87%
19962.8668 20249.0984 20526.2794 20690.6686 20896.2913 21135.7998 21396.8414 21763.6818 22070.5915 22494.2696 23000.0000
88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98%
23486.0340 24206.6486 25106.3743 26261.0410 26593.7715 27402.7684 28079.6456 28910.4655 30315.5573 32447.8075 39225.6094
99% 100%
41759.9540 57456.0758
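I want to add an extra column to my data frame that reads each value of avgy06 and returns the percentile of the data it corresponds to (e.g. 75th, 63rd, ...). If there is another way to do this that does not use the quantile() function, please let me know.
Thank you very much!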
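I will use a different dataset for illustration. What you are looking for is the empirical cumulative distribution function, or ecdf. Calling ecdf() on a vector returns a function that gives, for any value, the proportion of observations less than or equal to it.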
data(iris) #data for illustration
data_ecdf <- ecdf(iris[, 'Sepal.Length'])
iris[, 'Sepal.Length.Percentile'] <- data_ecdf(iris[, 'Sepal.Length'])
head(iris)
# Output:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.Percentile
# 1 5.1 3.5 1.4 0.2 setosa 0.27333333
# 2 4.9 3.0 1.4 0.2 setosa 0.14666667
# 3 4.7 3.2 1.3 0.2 setosa 0.07333333
# 4 4.6 3.1 1.5 0.2 setosa 0.06000000
# 5 5.0 3.6 1.4 0.2 setosa 0.21333333
# 6 5.4 3.9 1.7 0.4 setosa 0.34666667
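To connect this back to the question, here is a minimal sketch of the same idea applied to a data frame shaped like the asker's. It assumes a data frame called df built from only the six rows shown in the question, so the numbers will not match the real 6000-row data. The column names avgy06_pct and avgy06_pct_rank are made up for illustration. Since ecdf gives a proportion between 0 and 1, the result is multiplied by 100 to read as a percentile.

```r
# Toy data frame built from the six rows shown in the question
# (the real data has roughly 6000 rows).
df <- data.frame(
  pidp   = c(68160489, 68575973, 69180553, 69786365, 69815605, 69833973),
  avgy06 = c(20182.36849, 13845.49024, 35.61806,
             13117.26465, 15791.40283, 10327.94531)
)

# ecdf() returns a function; calling it on a value gives the share of
# observations less than or equal to that value (between 0 and 1).
avgy06_ecdf <- ecdf(df$avgy06)

# Multiply by 100 to express the result as a percentile.
# "avgy06_pct" is a hypothetical column name.
df$avgy06_pct <- 100 * avgy06_ecdf(df$avgy06)

# Rank-based form that gives the same numbers when there are no missing
# values: ties.method = "max" counts every observation <= the value.
# "avgy06_pct_rank" is also a hypothetical column name.
df$avgy06_pct_rank <- 100 * rank(df$avgy06, ties.method = "max") / nrow(df)

print(df)
```

Note that tied values all receive the same percentile. In the question's data the 0% through 9% quantiles are all 0, so every row with avgy06 equal to 0 would get the share of rows that are 0 (somewhere around 9 to 10 percent), not a value near 0.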