R function for grouped ranking of stock performance for time-series

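I have monthly returns in long format. For each month, I want to split the companies into 10 equal-sized groups (deciles) based on their performance.

A sample of my data is shown below (sorted here in ascending order of value, i.e. the monthly return):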

print(R4, n=10)
# A tibble: 1,125 x 5
# Groups:   year [3]
   company                       year month value mktvalue
   <chr>                        <dbl> <dbl> <dbl>    <dbl>
 1 STIFEL FINANCIAL              1997     7 0.821     50.3
 2 RAYMOND JAMES FINL.           1997    10 0.833   1070. 
 3 CHARLES SCHWAB                1997     3 0.853   6250. 
 4 STATE STREET                  1997     3 0.863   6178. 
 5 STIFEL FINANCIAL              1996     7 0.871     31.6
 6 FRANKLIN RESOURCES            1997     3 0.872   7459. 
 7 BERKSHIRE HATHAWAY 'A'        1997     8 0.879  53857. 
 8 ALLIANCEBERNSTEIN HLDG. UNT.  1997     3 0.879   2257. 
 9 STATE STREET                  1997     8 0.890   8504. 
10 MORGAN STANLEY                1996     7 0.891   8764. 
# ... with 1,115 more rows

Because I want to assign a rank to the companies in each month, I first filtered a single month:

R5 <- R4 %>%
  filter(year == 1997, month == 12, !is.na(value)) %>%
  arrange(value) %>%
  mutate(rank = rank(value))
            
print(R5)
# A tibble: 13 x 6
# Groups:   year [1]
   company                       year month value mktvalue  rank
   <chr>                        <dbl> <dbl> <dbl>    <dbl> <dbl>
 1 FRANKLIN RESOURCES            1997    12 0.967  11505.      1
 2 STATE STREET                  1997    12 0.978   9250.      2
 3 JEFFERIES FINANCIAL GROUP     1997    12 0.998   2175.      3
 4 BERKSHIRE HATHAWAY 'A'        1997    12 1.02   55184.      4
 5 BANK OF NEW YORK MELLON       1997    12 1.08   21152.      5
 6 CHARLES SCHWAB                1997    12 1.09   10854.      6
 7 MORGAN STANLEY                1997    12 1.09   32712.      7
 8 RAYMOND JAMES FINL.           1997    12 1.11    1207.      8
 9 MGIC INVESTMENT               1997    12 1.14    7030.      9
10 ALLIANCEBERNSTEIN HLDG. UNT.  1997    12 1.15    3107.     10
11 AFFILIATED MANAGERS           1997    12 1.16     373.     11
12 RADIAN GP.                    1997    12 1.16    1284.     12
13 STIFEL FINANCIAL              1997    12 1.17      81.4    13

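I tried the answer to this question to split the companies into groups: Grouped ranking in R

Please let me know if there is a smarter way to create equal-sized groups.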

percent.rank <- function(x) trunc(rank(x)/length(x)*100)
R6 <- within(R5, pr <- percent.rank(rank))
R6$decile <- cut(R6$pr, breaks = c(0,10,20,30,40,50,60,70,80,90,100), labels = c(1:10))
    
print(R6)
# A tibble: 13 x 8
# Groups:   year [1]
   company                       year month value mktvalue  rank    pr decile
   <chr>                        <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <fct> 
 1 FRANKLIN RESOURCES            1997    12 0.967  11505.      1     7 1     
 2 STATE STREET                  1997    12 0.978   9250.      2    15 2     
 3 JEFFERIES FINANCIAL GROUP     1997    12 0.998   2175.      3    23 3     
 4 BERKSHIRE HATHAWAY 'A'        1997    12 1.02   55184.      4    30 3     
 5 BANK OF NEW YORK MELLON       1997    12 1.08   21152.      5    38 4     
 6 CHARLES SCHWAB                1997    12 1.09   10854.      6    46 5     
 7 MORGAN STANLEY                1997    12 1.09   32712.      7    53 6     
 8 RAYMOND JAMES FINL.           1997    12 1.11    1207.      8    61 7     
 9 MGIC INVESTMENT               1997    12 1.14    7030.      9    69 7     
10 ALLIANCEBERNSTEIN HLDG. UNT.  1997    12 1.15    3107.     10    76 8     
11 AFFILIATED MANAGERS           1997    12 1.16     373.     11    84 9     
12 RADIAN GP.                    1997    12 1.16    1284.     12    92 10    
13 STIFEL FINANCIAL              1997    12 1.17      81.4    13   100 10 
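
A quick check of the single-month result, offered as a sketch only: with 13 companies, ten groups cannot all be the same size, so the best achievable is group sizes that differ by at most one. The snippet below rebuilds the deciles from ranks 1 to 13 and tabulates them. It also illustrates a possible edge case, which is an assumption about larger months and not something seen in this sample: if a month had more than 100 companies, percent.rank would return 0 for the lowest rank, and cut() with its default intervals (open on the left, so 0 is excluded) would turn that into NA. The names rnk, sizes, and big are made up for illustration.

```r
# Same helper as above
percent.rank <- function(x) trunc(rank(x) / length(x) * 100)

# Ranks 1..13, as in the single-month example
rnk <- 1:13
decile <- cut(percent.rank(rnk), breaks = seq(0, 100, by = 10), labels = 1:10)

# Group sizes per decile: some deciles hold one company, some hold two
sizes <- table(decile)
print(sizes)

# Hypothetical month with 150 companies: the lowest rank maps to pr = 0,
# which falls outside (0,10] and therefore becomes NA
big <- cut(percent.rank(1:150), breaks = seq(0, 100, by = 10), labels = 1:10)
print(sum(is.na(big)))
```

Passing include.lowest = TRUE to cut() would keep a value of 0 in the first group.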

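So far I have done this manually for one month. How can I apply it to every month and year of my data frame without filtering manually? My full data runs from January 1996 to December 2020. The groups/deciles must be of equal size and ranked by monthly performance (value). Is there a way to do this with a loop, or some other smart approach?

To reproduce my data: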

R4 <- data.frame(company = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", 
                              "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"), 
                  year = c(1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,1996, 1996, 1996, 
                           1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997), 
                  month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), 
                  value = c(1.15999568000864, 0.983783783783784, 1.07692307692308, 1, 0.989795918367347, 
                            0.989690721649484, 1, 1.04166666666667, 0.99, 1.05050505050505, 1.07211538461538, 
                            1.01345291479821, 0.942477876106195, 1.06572769953052, 0.986784140969163, 
                            0.879464285714286, 1.11167512690355, 0.922374429223744, 1.15841584158416, 
                            1.18803418803419, 0.973018705035971, 1.10906058132519, 0.92,  1.00724637681159))

I'm not sure I follow exactly what you are trying to do, but with your data I was able to create a ranking by value within each month/year combination with this:

library(dplyr)
R4 %>% 
  group_by(year, month) %>% 
  mutate(rank = rank(value)) %>% 
  arrange(month, year, rank)

This yields:

# A tibble: 60 × 6
# Groups:   year, month [24]
   company                       year month value mktvalue  rank
   <fct>                        <dbl> <dbl> <dbl>    <dbl> <dbl>
 1 ALLIANCEBERNSTEIN HLDG. UNT.  1996     1 0.984    1802.     1
 2 BERKSHIRE HATHAWAY 'A'        1996     1 0.994   37088.     2
 3 BANK OF NEW YORK MELLON       1996     1 1.05     9517.     3
 4 ALLIANCEBERNSTEIN HLDG. UNT.  1997     1 1.07     2307.     1
 5 BANK OF NEW YORK MELLON       1997     1 1.08    13652.     2
 6 BANK OF NEW YORK MELLON       1996     2 1.01    10488.     1
 7 ALLIANCEBERNSTEIN HLDG. UNT.  1996     2 1.08     1931.     2
 8 BERKSHIRE HATHAWAY 'A'        1996     2 1.11    39584.     3
 9 ALLIANCEBERNSTEIN HLDG. UNT.  1997     2 0.987    2441.     1
10 BANK OF NEW YORK MELLON       1997     2 1.06    15457.     2
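
The ranking above stops short of the deciles the question asks for. One possible way to get them without filtering month by month, sketched here under the assumption that dplyr's ntile() is acceptable for the bucketing, is to compute the bucket inside the same group_by(). The data is the reproducible R4 from the question; the names R_dec and sizes are made up for illustration.

```r
library(dplyr)

# Reproducible data from the question (12 companies, two year/month pairs)
R4 <- data.frame(
  company = rep(LETTERS[1:12], times = 2),
  year    = rep(c(1996, 1997), each = 12),
  month   = rep(c(1, 3), each = 12),
  value   = c(1.15999568000864, 0.983783783783784, 1.07692307692308, 1,
              0.989795918367347, 0.989690721649484, 1, 1.04166666666667,
              0.99, 1.05050505050505, 1.07211538461538, 1.01345291479821,
              0.942477876106195, 1.06572769953052, 0.986784140969163,
              0.879464285714286, 1.11167512690355, 0.922374429223744,
              1.15841584158416, 1.18803418803419, 0.973018705035971,
              1.10906058132519, 0.92, 1.00724637681159)
)

R_dec <- R4 %>%
  filter(!is.na(value)) %>%            # drop missing returns before ranking
  group_by(year, month) %>%            # one ranking per year/month
  mutate(rank   = rank(value),         # rank within the month (ties averaged)
         decile = ntile(value, 10)) %>% # 10 buckets, 1 = lowest return
  ungroup() %>%
  arrange(year, month, decile)

# Bucket sizes per year/month, to check how even the split is
sizes <- count(R_dec, year, month, decile)
print(sizes, n = 20)
```

When the number of companies in a month is not a multiple of 10, exactly equal groups are impossible; ntile() keeps the bucket sizes within one of each other, which is probably the closest available approximation.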