Make dates into ordinal variables

Question

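I am working with dates in R, and I want to convert each date into a number that represents which attempt it was for a participant taking a test. Some participants attempted the test several times, while others attempted it only once. In addition, some people took the test years earlier than others, so I don't care about the date itself, only whether it was the first attempt, the second attempt, and so on.

Here is a mock dataset: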

library(dplyr)
library(lubridate)
problem <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")))

This is what I would like it to look like in the end:

solution <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")),
                  order = c(3, 4, 1, 2, 1, 3, 2, 1))

solution

Thanks!

Answer 1

We can convert 'date' to a factor and coerce it to integer:

library(dplyr)
problem %>% 
    group_by(name) %>% 
    mutate(n = as.integer(factor(date)))
# A tibble: 8 x 4
# Groups:   name [3]
#  name      score date                    n
#  <chr>     <dbl> <dttm>              <int>
#1 Britney       1 2019-02-26 00:18:09     3
#2 Christina     2 2019-04-26 00:18:09     4
#3 Justin        3 2019-02-20 00:18:09     1
#4 Britney       3 2018-02-26 00:18:09     2
#5 Britney       3 2017-02-26 00:18:09     1
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     4 2015-02-26 00:18:09     2
#8 Christina     2 2010-02-26 00:18:09     1

Or, after grouping by 'name', apply dense_rank on 'date':

problem %>% 
    group_by(name) %>%
    mutate(n = dense_rank(date))
# A tibble: 8 x 4
# Groups:   name [3]
#  name      score date                    n
#  <chr>     <dbl> <dttm>              <int>
#1 Britney       1 2019-02-26 00:18:09     3
#2 Christina     2 2019-04-26 00:18:09     4
#3 Justin        3 2019-02-20 00:18:09     1
#4 Britney       3 2018-02-26 00:18:09     2
#5 Britney       3 2017-02-26 00:18:09     1
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     4 2015-02-26 00:18:09     2
#8 Christina     2 2010-02-26 00:18:09     1

Note: both solutions rely only on the 'date' variable. No other assumptions are made.
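One case the example data does not cover is two attempts with the same timestamp. The sketch below uses a small hypothetical tibble (the `attempts` data is invented for illustration, not taken from the question) to show how `dense_rank(date)` and `row_number(date)` differ in that situation: the first gives tied dates the same number, the second breaks ties by row order.

```r
library(dplyr)

# Hypothetical data: Britney has two attempts with an identical timestamp.
attempts <- tibble(
  name = c("Britney", "Britney", "Britney", "Justin"),
  date = as.POSIXct(
    c("2019-02-26 00:18:09", "2019-02-26 00:18:09",
      "2017-02-26 00:18:09", "2019-02-20 00:18:09"),
    tz = "UTC"
  )
)

ranked <- attempts %>%
  group_by(name) %>%
  mutate(
    n_dense = dense_rank(date),  # tied dates share the same attempt number
    n_first = row_number(date)   # ties are broken by order of appearance
  ) %>%
  ungroup()

ranked
```

Which behaviour is appropriate depends on whether identical timestamps should count as one attempt or as separate attempts.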

Answer 2

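You can group by name and take the reversed sequence within each group (this relies on the rows for each name already being sorted from newest to oldest, as they are in the example), i.e.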

library(dplyr)

problem %>% 
 group_by(name) %>% 
 mutate(order = rev(seq(n())))

which gives,

# A tibble: 8 x 4
# Groups:   name [3]
  name      score date                order
  <chr>     <dbl> <dttm>              <int>
1 Britney       1 2019-02-26 00:18:09     3
2 Christina     2 2019-04-26 00:18:09     4
3 Justin        3 2019-02-20 00:18:09     1
4 Britney       3 2018-02-26 00:18:09     2
5 Britney       3 2017-02-26 00:18:09     1
6 Christina     2 2016-02-26 00:18:09     3
7 Christina     4 2015-02-26 00:18:09     2
8 Christina     2 2010-02-26 00:18:09     1
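Because the reverse-sequence approach numbers rows by position rather than by date, it may be worth checking how it behaves when the rows arrive in a different order. The following is a minimal sketch of such a check; the helper names `by_position` and `by_date` are made up for this illustration, and the data is rebuilt with base `as.POSIXct` (assuming UTC) so the block runs on its own.

```r
library(dplyr)

problem <- tibble(
  name = c("Britney", "Christina", "Justin", "Britney",
           "Britney", "Christina", "Christina", "Christina"),
  date = as.POSIXct(
    c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09",
      "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09",
      "2015-02-26 00:18:09", "2010-02-26 00:18:09"),
    tz = "UTC"
  )
)

# Position-based numbering: the last row of each group gets 1.
by_position <- function(df) {
  df %>% group_by(name) %>% mutate(order = rev(seq(n()))) %>% ungroup()
}

# Date-based numbering: the earliest date of each group gets 1.
by_date <- function(df) {
  df %>% group_by(name) %>% mutate(order = dense_rank(date)) %>% ungroup()
}

# Rows are newest-first within each name, so both approaches agree.
agree_original <- all(by_position(problem)$order == by_date(problem)$order)

# Reverse the row order (oldest-first within each name): they no longer agree.
reversed <- problem[rev(seq_len(nrow(problem))), ]
agree_reversed <- all(by_position(reversed)$order == by_date(reversed)$order)

c(original = agree_original, reversed = agree_reversed)
```

If the row order cannot be guaranteed, a date-based ranking is the safer choice.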

Answer 3

Or arrange the data by name and date, group_by name, and then assign row_number:

library(dplyr)

problem %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number())


# A tibble: 8 x 4
# Groups:   name [3]
#   name      score date                order
#   <chr>     <dbl> <dttm>              <int>
#1 Britney       3 2017-02-26 00:18:09     1
#2 Britney       3 2018-02-26 00:18:09     2
#3 Britney       1 2019-02-26 00:18:09     3
#4 Christina     2 2010-02-26 00:18:09     1
#5 Christina     4 2015-02-26 00:18:09     2
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     2 2019-04-26 00:18:09     4
#8 Justin        3 2019-02-20 00:18:09     1
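Note that `arrange` changes the row order relative to the desired output in the question. If the original order matters, one possible workaround (a sketch, not part of the original answer) is to store a temporary row index, here called `row_id` purely for illustration, and sort back on it afterwards.

```r
library(dplyr)

problem <- tibble(
  name = c("Britney", "Christina", "Justin", "Britney",
           "Britney", "Christina", "Christina", "Christina"),
  score = c(1, 2, 3, 3, 3, 2, 4, 2),
  date = as.POSIXct(
    c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09",
      "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09",
      "2015-02-26 00:18:09", "2010-02-26 00:18:09"),
    tz = "UTC"
  )
)

restored <- problem %>%
  mutate(row_id = row_number()) %>%   # remember the original row position
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number()) %>%    # attempt number within each name
  ungroup() %>%
  arrange(row_id) %>%                 # put rows back in their original order
  select(-row_id)

restored
```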

Answer 4

You can use rowid from data.table:

library(data.table)
setDT(problem)

problem[order(date), order := rowid(name)]

Or you can use frank to rank the dates within each name:

problem[, order := frank(date), name]

Output from either method:

problem
#         name score                date order
# 1:   Britney     1 2019-02-26 00:18:09     3
# 2: Christina     2 2019-04-26 00:18:09     4
# 3:    Justin     3 2019-02-20 00:18:09     1
# 4:   Britney     3 2018-02-26 00:18:09     2
# 5:   Britney     3 2017-02-26 00:18:09     1
# 6: Christina     2 2016-02-26 00:18:09     3
# 7: Christina     4 2015-02-26 00:18:09     2
# 8: Christina     2 2010-02-26 00:18:09     1
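With its default `ties.method = "average"`, `frank` returns fractional ranks when two dates in a group are identical. The sketch below, using a small hypothetical table (`dt` is invented for illustration), contrasts that default with `ties.method = "dense"`, which keeps the result as consecutive whole numbers.

```r
library(data.table)

# Hypothetical data: Britney has two attempts with an identical timestamp.
dt <- data.table(
  name = c("Britney", "Britney", "Britney", "Justin"),
  date = as.POSIXct(
    c("2019-02-26 00:18:09", "2019-02-26 00:18:09",
      "2017-02-26 00:18:09", "2019-02-20 00:18:09"),
    tz = "UTC"
  )
)

# Default ties.method = "average": tied dates get the mean of their ranks.
dt[, order_avg := frank(date), by = name]

# ties.method = "dense": tied dates share a rank and no ranks are skipped.
dt[, order_dense := frank(date, ties.method = "dense"), by = name]

dt
```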

Tags: r, date, type-conversion, ordinal, lubridate