Choose one month of a year to rank, then give the resulting ranks to the rest of that year

Question

Sample data:

df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=c(2014,2014,2015,2015),
                        month=c(1,2),
                        new.employee=c(4,6,2,6,23,2,5,34))

  id year month new.employee
1  A 2014     1            4
2  A 2014     2            6
3  A 2015     1            2
4  A 2015     2            6
5  B 2014     1           23
6  B 2014     2            2
7  B 2015     1            5
8  B 2015     2           34

Desired result:

desired_df <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=c(2014,2014,2015,2015),
                        month=c(1,2),
                        new.employee=c(4,6,2,6,23,2,5,34),
                        new.employee.rank=c(1,1,2,2,2,2,1,1))

  id year month new.employee new.employee.rank
1  A 2014     1            4                 1
2  A 2014     2            6                 1
3  A 2015     1            2                 2
4  A 2015     2            6                 2
5  B 2014     1           23                 2
6  B 2014     2            2                 2
7  B 2015     1            5                 1
8  B 2015     2           34                 1

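The ranking rule is: for each year, I pick month 2 and rank A and B against each other by their number of new employees, with the larger count ranked 1. Then I need to assign those ranks to month 1 as well, i.e. the rank for month 1 of each year must equal the rank for month 2 of the same year.

I tried this code, which gives a separate rank for every month of every year: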

library(data.table)
df1 <- data.table(df1)
df1[,rank:=rank(new.employee), by=c("year","month")]

If anyone can show how to roll the rank values within the column, so that the month 1 rank is replaced by the month 2 rank, that might be a solution.

Answer 1

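Here is a dplyr-based solution. The idea is to reduce the data to the slice you want to compare, do the comparison, and then join the result back to the original data set, expanding it to fill all of the relevant rows. Note the edits to the code used to create the sample data.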

df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=rep(c(2014,2014,2015,2015), 2),
                        month=rep(c(1,2), 4),
                        new.employee=c(4,6,2,6,23,2,5,34))

library(dplyr)

df1 %>%
  # Reduce the data to the slices (months) you want to compare
  filter(month==2) %>%
  # Group the data by year, so the comparisons are within and not across years
  group_by(year) %>%
  # Create a variable that indicates the rankings within years in descending order
  mutate(rank = rank(-new.employee)) %>%
  # To prepare for merging, reduce the new data to just that ranking var plus id and year
  select(id, year, rank) %>%
  # Use left_join to merge the new data (.) with the original df, expanding the
  # new data to fill all rows with id-year matches
  left_join(df1, .) %>%
  # Order the data by id, year, and month to make it easier to review
  arrange(id, year, month)

Output:

Joining by: c("id", "year")
  id year month new.employee rank
1  A 2014     1            4    1
2  A 2014     2            6    1
3  A 2015     1            2    2
4  A 2015     2            6    2
5  B 2014     1           23    2
6  B 2014     2            2    2
7  B 2015     1            5    1
8  B 2015     2           34    1
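
For readers without dplyr, the same filter, rank, and join-back idea can be sketched in base R. This is only an illustration under the question's assumptions (one row per id, year, and month; month 2 is the reference month); the helper name `ref` is made up for this example.

```r
# Sample data from the question
df1 <- data.frame(id = rep(c("A", "B"), each = 4),
                  year = rep(c(2014, 2014, 2015, 2015), 2),
                  month = rep(c(1, 2), 4),
                  new.employee = c(4, 6, 2, 6, 23, 2, 5, 34))

# Keep only the reference month (month 2)
ref <- df1[df1$month == 2, c("id", "year", "new.employee")]

# Rank within each year; negating the values ranks the largest count first
ref$new.employee.rank <- ave(-ref$new.employee, ref$year, FUN = rank)

# Join the ranks back onto every row with a matching id and year
res <- merge(df1, ref[, c("id", "year", "new.employee.rank")],
             by = c("id", "year"))

# Order by id, year, and month to make the result easier to review
res <- res[order(res$id, res$year, res$month), ]
res
```

Because `merge()` matches on id and year only, both months of a year receive the month 2 rank.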

Answer 2

You have already tried a data.table solution, so here is how I would solve this with data.table

library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1

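It looks somewhat similar to the dplyr solution above. It basically ranks the ids within each year (using month 2 only) and joins the ranks back onto the original data set. I'm using data.table V1.9.6+ here.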
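
The question's own idea, computing a rank for every month and then copying the month 2 rank over the other months, might also work without a join. The following is a rough sketch rather than part of either answer; it assumes exactly one month 2 row per id and year, and `dt` and `monthly.rank` are names invented for this example.

```r
library(data.table)

# Sample data from the question
dt <- data.table(id = rep(c("A", "B"), each = 4),
                 year = rep(c(2014, 2014, 2015, 2015), 2),
                 month = rep(c(1, 2), 4),
                 new.employee = c(4, 6, 2, 6, 23, 2, 5, 34))

# Rank the ids within every year-month cell, largest count first
dt[, monthly.rank := frank(-new.employee), by = .(year, month)]

# Within each id-year group, give every row the month 2 rank
# (assumption: exactly one month 2 row per id and year)
dt[, new.employee.rank := monthly.rank[month == 2], by = .(id, year)]

# Drop the helper column
dt[, monthly.rank := NULL]
dt
```

Note that the question's attempt used `rank(new.employee)`, which ranks the smallest count first; the desired output ranks the largest count first, hence the negation here.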

Tags: r, rank, conditional-statements, data.table