选择一年中的一个月进行排名,然后将结果排名赋予其余年份
Choose a month of a year to rank then give resulting ranks to the rest years
示例数据:
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34))
id year month new.employee
1 A 2014 1 4
2 A 2014 2 6
3 A 2015 1 2
4 A 2015 2 6
5 B 2014 1 23
6 B 2014 2 2
7 B 2015 1 5
8 B 2015 2 34
期望的结果:
desired_df <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34),
new.employee.rank=c(1,1,2,2,2,2,1,1))
id year month new.employee new.employee.rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
排名规则是:我选择每年的第2个月对A和B之间的新员工数量进行排名。然后我需要将这些排名给第1个月。即每年第1个月的排名必须相等到同年第2个月的排名。
我试过这些代码来获得每个月和每年的排名,
library(data.table)
df1 <- data.table(df1)
df1[,rank:=rank(new.employee), by=c("year","month")]
如果(任何人都可以滚动列中的排名值以用第 2 个月的排名替换第 1 个月的排名),这可能是一个解决方案。
这是一个基于dplyr
的解决方案。这个想法是将数据减少到你想要比较的部分,进行比较,然后将结果加入到原始数据集中,扩展它以填充所有相关槽。请注意对用于创建示例数据的代码所做的编辑。
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=rep(c(2014,2014,2015,2015), 2),
month=rep(c(1,2), 4),
new.employee=c(4,6,2,6,23,2,5,34))
library(dplyr)
df1 %>%
# Reduce the data to the slices (months) you want to compare
filter(month==2) %>%
# Group the data by year, so the comparisons are within and not across years
group_by(year) %>%
# Create a variable that indicates the rankings within years in descending order
mutate(rank = rank(-new.employee)) %>%
# To prepare for merging, reduce the new data to just that ranking var plus id and year
select(id, year, rank) %>%
# Use left_join to merge the new data (.) with the original df, expanding the
# new data to fill all rows with id-year matches
left_join(df1, .) %>%
# Order the data by id, year, and month to make it easier to review
arrange(id, year, month)
输出:
Joining by: c("id", "year")
id year month new.employee rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
您已经尝试了 data.table
解决方案,下面是我如何使用 data.table
来解决这个问题
library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
# id year month new.employee new.employee.rank
# 1: A 2014 1 4 1
# 2: A 2014 2 6 1
# 3: A 2015 1 2 2
# 4: A 2015 2 6 2
# 5: B 2014 1 23 2
# 6: B 2014 2 2 2
# 7: B 2015 1 5 1
# 8: B 2015 2 34 1
它看起来有点类似于上面的 dplyr
解决方案。这基本上是每年对 id
进行排名,并将它们加入原始数据集。我在这里使用 data.table
V1.9.6+。
示例数据:
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34))
id year month new.employee
1 A 2014 1 4
2 A 2014 2 6
3 A 2015 1 2
4 A 2015 2 6
5 B 2014 1 23
6 B 2014 2 2
7 B 2015 1 5
8 B 2015 2 34
期望的结果:
desired_df <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34),
new.employee.rank=c(1,1,2,2,2,2,1,1))
id year month new.employee new.employee.rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
排名规则是:我选择每年的第2个月对A和B之间的新员工数量进行排名。然后我需要将这些排名给第1个月。即每年第1个月的排名必须相等到同年第2个月的排名。
我试过这些代码来获得每个月和每年的排名,
library(data.table)
df1 <- data.table(df1)
df1[,rank:=rank(new.employee), by=c("year","month")]
如果(任何人都可以滚动列中的排名值以用第 2 个月的排名替换第 1 个月的排名),这可能是一个解决方案。
这是一个基于dplyr
的解决方案。这个想法是将数据减少到你想要比较的部分,进行比较,然后将结果加入到原始数据集中,扩展它以填充所有相关槽。请注意对用于创建示例数据的代码所做的编辑。
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=rep(c(2014,2014,2015,2015), 2),
month=rep(c(1,2), 4),
new.employee=c(4,6,2,6,23,2,5,34))
library(dplyr)
df1 %>%
# Reduce the data to the slices (months) you want to compare
filter(month==2) %>%
# Group the data by year, so the comparisons are within and not across years
group_by(year) %>%
# Create a variable that indicates the rankings within years in descending order
mutate(rank = rank(-new.employee)) %>%
# To prepare for merging, reduce the new data to just that ranking var plus id and year
select(id, year, rank) %>%
# Use left_join to merge the new data (.) with the original df, expanding the
# new data to fill all rows with id-year matches
left_join(df1, .) %>%
# Order the data by id, year, and month to make it easier to review
arrange(id, year, month)
输出:
Joining by: c("id", "year")
id year month new.employee rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
您已经尝试了 data.table
解决方案,下面是我如何使用 data.table
library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
# id year month new.employee new.employee.rank
# 1: A 2014 1 4 1
# 2: A 2014 2 6 1
# 3: A 2015 1 2 2
# 4: A 2015 2 6 2
# 5: B 2014 1 23 2
# 6: B 2014 2 2 2
# 7: B 2015 1 5 1
# 8: B 2015 2 34 1
它看起来有点类似于上面的 dplyr
解决方案。这基本上是每年对 id
进行排名,并将它们加入原始数据集。我在这里使用 data.table
V1.9.6+。