R: add a column to a data frame with a quarterly ranking by a characteristic
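I want to add a new column to my data frame that holds a rank within each date (quarters in this case, so grouping by month would work). Companies should be ranked by assets within each quarter/month.
The number of companies (id) differs from quarter to quarter: new ones may enter and old ones may disappear.
I would like to get from this: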
# id assets date
# 1 X1 50 1994-03-31
# 2 X2 120 1994-03-31
# 3 X3 530 1994-03-31
# 4 X4 24 1994-03-31
# 6 X3 57 1994-06-30
# 7 X1 445 1994-06-30
# 8 X10 525 1994-06-30
to this:
# id assets date rank
# 1 X1 50 1994-03-31 3
# 2 X2 120 1994-03-31 2
# 3 X3 530 1994-03-31 1
# 4 X4 24 1994-03-31 4
# 6 X3 57 1994-06-30 3
# 7 X1 445 1994-06-30 2
# 8 X10 525 1994-06-30 1
I tried:
temp_asset_rank <- temp_asset_rank %>%
  mutate(yearx = year(date)) %>%
  mutate(month = month(date)) %>%
  group_by(yearx, month) %>%
  mutate(ranking = rank(temp_asset_rank$assets, na.last = NA, ties.method = c("average"))) %>%
  ungroup()
but this returns:
Error: Column `ranking` must be length 11788 (the group size) or one, not 1188563
As you can see, my actual data set is much larger and contains more columns.
Changing
group_by(yearx, month)
to
group_by(yearx) %>%
  group_by(month)
does not work either.
Can you help me?
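Base R solution:
# Sort by assets (largest first), then number rows within each date; the result stays in that sorted order.
# seq_along() is used instead of seq.int, since seq.int(x) on a one-row group returns 1:x.
within(df[order(df$assets, decreasing = TRUE), ],
       {rank <- ave(assets, date, FUN = seq_along)})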
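If the original row order should be kept and ties need explicit handling, a variant of the same idea is to call `rank()` inside `ave()`. This is only a sketch on the sample data from the question; `ties.method = "min"` is an assumption, since the question does not say how ties should be resolved.

```r
# Sample data from the question (dates kept as character strings).
df <- data.frame(
  id     = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
  assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
  date   = c(rep("1994-03-31", 4), rep("1994-06-30", 3)),
  stringsAsFactors = FALSE
)

# Rank within each date; negating assets makes the largest value rank 1.
# ties.method = "min" is an assumption: tied companies share the better rank.
df$rank <- ave(-df$assets, df$date,
               FUN = function(x) rank(x, ties.method = "min"))
df
```

Because nothing is sorted, the rows come back in their original order, which matches the desired output shown in the question.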
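Tidyverse solution:
library(tidyverse)
df %>%
  # Remember the original row order so it can be restored at the end.
  mutate(idx = row_number()) %>%
  # Largest assets first, so the first row in each group gets rank 1.
  arrange(desc(assets)) %>%
  group_by(date) %>%
  mutate(rank = row_number()) %>%
  ungroup() %>%
  # Restore the original order and drop the helper column.
  arrange(idx) %>%
  select(-idx)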
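The error in the question comes from `temp_asset_rank$assets`, which hands `rank()` the whole column (1188563 values) rather than the current group's slice. A minimal sketch of the asker's own approach with the bare column name follows. It assumes dplyr is installed, that each quarter is identified by a single `date` value as in the sample, and that `min_rank()` tie handling is acceptable; none of these are confirmed by the question.

```r
library(dplyr)

# Sample data from the question (dates kept as character strings).
df <- data.frame(
  id     = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
  assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
  date   = c(rep("1994-03-31", 4), rep("1994-06-30", 3)),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  group_by(date) %>%
  # Bare `assets` refers to the current group's values only.
  # desc() makes the largest value rank 1; missing values stay NA.
  mutate(rank = min_rank(desc(assets))) %>%
  ungroup()
ranked
```

Note that `na.last = NA` in the original attempt drops missing values from the result of `rank()`, which would shorten the vector and could trigger a similar length error in any group containing an `NA`.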
Data:
df <- structure(list(id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
date = c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31", "1994-06-30",
"1994-06-30", "1994-06-30")), class = "data.frame", row.names = c(NA, -7L))