(R): Calculate quantile by unique row value unification

Question

I have a df like this:

> df <- data.frame(Client.code = c(100451, 100451, 100523, 100523, 100523, 100525),
+                  dayref = c(24, 30, 15, 13, 17, 5))
> df
    Client.code dayref
1      100451     24
2      100451     30
3      100523     15
4      100523     13
5      100523     17
6      100525      5

The payment term is one year from the date of issue.

Using the data above, and given a df2 like this:

  Client.Code Days
1      100451   16
2      100523   16
3      100460   35

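Since I have enough data to compute reasonable quantile probabilities, I would like to know how to build a loop that assigns a quantile to the Days value in each row of df2, based on the first df.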

Answer 1

We can use data.table:

library(data.table)
setDT(df)[, .(Quantile = quantile(dayref)), Client.code]
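The call above returns the quantile values without saying which probability each row belongs to. A minimal sketch that keeps the probabilities alongside the values is shown here; the `probs` vector and the `Prob` column name are illustrative choices, not part of the original answer.

```r
library(data.table)

df <- data.frame(Client.code = c(100451, 100451, 100523, 100523, 100523, 100525),
                 dayref = c(24, 30, 15, 13, 17, 5))

# Illustrative choice of probabilities (assumption, not from the answer)
probs <- c(0.25, 0.5, 0.75)

# Copy into a data.table so the original data.frame is left untouched
dt <- as.data.table(df)

# One row per client and probability, with the matching quantile value
res <- dt[, .(Prob = probs, Quantile = quantile(dayref, probs = probs)),
          by = Client.code]
print(res)
```

Using `as.data.table()` instead of `setDT()` avoids converting `df` by reference, which matters if later code still expects a plain data.frame.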

Or with tidyverse:

library(dplyr)
library(tidyr)
df %>% 
   group_by(Client.code) %>%
   summarise(Quantile = list(quantile(dayref))) %>%
   unnest(cols = Quantile)

Answer 2

tapply(df$dayref, df$Client.code, quantile)
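Because `quantile` returns several values per group, the `tapply` call above yields a list with one element per client. If a table is easier to work with, a base R sketch along these lines gives one row per client; `split` plus `sapply` is used here as an equivalent route to the same numbers, and the name `q_mat` is just illustrative.

```r
df <- data.frame(Client.code = c(100451, 100451, 100523, 100523, 100523, 100525),
                 dayref = c(24, 30, 15, 13, 17, 5))

# Split dayref by client, compute the default quantiles for each group,
# then transpose so that clients are rows and quantiles are columns
q_mat <- t(sapply(split(df$dayref, df$Client.code), quantile))
print(q_mat)

# Look up a single value, e.g. the median for client 100523
q_mat["100523", "50%"]
```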

You can request specific percentiles by adding a vector of them:

tapply(df$dayref, df$Client.code, quantile, 1:19/20)

You may need to name the argument explicitly:

tapply(df$dayref, df$Client.code, quantile, probs = 1:19/20)

If you might have NAs, you can add na.rm = TRUE as another argument.
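The snippets so far compute quantiles per client, while the question also asks where each Days value in df2 falls within that client's distribution. One possible way to do this, swapping the explicit loop for a per-client empirical CDF via `ecdf`, is sketched below. The `Quantile` column name and the choice to return NA for clients missing from df are assumptions, not something stated in the thread.

```r
df <- data.frame(Client.code = c(100451, 100451, 100523, 100523, 100523, 100525),
                 dayref = c(24, 30, 15, 13, 17, 5))
df2 <- data.frame(Client.Code = c(100451, 100523, 100460),
                  Days = c(16, 16, 35))

# Build one empirical CDF per client from the historical data
ecdfs <- lapply(split(df$dayref, df$Client.code), ecdf)

# For each row of df2, evaluate that client's ECDF at Days.
# Clients with no history in df get NA (assumption).
df2$Quantile <- mapply(function(code, days) {
  f <- ecdfs[[as.character(code)]]
  if (is.null(f)) NA_real_ else f(days)
}, df2$Client.Code, df2$Days)

print(df2)
```

The value returned is the share of that client's dayref observations that are less than or equal to Days, so it lies between 0 and 1. With only two or three observations per client, as in this toy data, the estimates are very coarse.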
