Make dates into ordinal variables
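I'm working with dates in R and want to convert each date into a number that says which attempt it was for that participant (first, second, and so on). Some participants took the test several times, others only once. Some also took it years before others, so the calendar date itself doesn't matter, only the order of attempts within each participant.
Here is a mock dataset: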
library(dplyr)
library(lubridate)
problem <- tibble(
  name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
  score = c(1, 2, 3, 3, 3, 2, 4, 2),
  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09"))
)
This is what I'd like the result to look like:
solution <- tibble(
  name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
  score = c(1, 2, 3, 3, 3, 2, 4, 2),
  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")),
  order = c(3, 4, 1, 2, 1, 3, 2, 1)
)
solution
Thanks!
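We can convert 'date' to a factor within each 'name' group and coerce it to integer. The factor levels are the sorted unique dates, so the integer codes give the attempt order.
library(dplyr)
problem %>%
  group_by(name) %>%
  mutate(n = as.integer(factor(date)))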
# A tibble: 8 x 4
# Groups: name [3]
# name score date n
# <chr> <dbl> <dttm> <int>
#1 Britney 1 2019-02-26 00:18:09 3
#2 Christina 2 2019-04-26 00:18:09 4
#3 Justin 3 2019-02-20 00:18:09 1
#4 Britney 3 2018-02-26 00:18:09 2
#5 Britney 3 2017-02-26 00:18:09 1
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 4 2015-02-26 00:18:09 2
#8 Christina 2 2010-02-26 00:18:09 1
Or, after grouping by 'name', apply dense_rank to 'date'
problem %>%
  group_by(name) %>%
  mutate(n = dense_rank(date))
# A tibble: 8 x 4
# Groups: name [3]
# name score date n
# <chr> <dbl> <dttm> <int>
#1 Britney 1 2019-02-26 00:18:09 3
#2 Christina 2 2019-04-26 00:18:09 4
#3 Justin 3 2019-02-20 00:18:09 1
#4 Britney 3 2018-02-26 00:18:09 2
#5 Britney 3 2017-02-26 00:18:09 1
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 4 2015-02-26 00:18:09 2
#8 Christina 2 2010-02-26 00:18:09 1
Note: both solutions rely only on the values in 'date'. No other assumptions (for example, about row order) are made.
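Because these solutions look only at the date values, it is worth knowing what happens when two attempts share a timestamp. The sketch below uses a small hypothetical data set (the names "A" and "B" and the dates are made up for illustration) to compare dense_rank with row_number(date), which breaks ties by position instead.

```r
library(dplyr)

# Hypothetical data: participant "A" has two attempts with the same timestamp
ties <- tibble(
  name = c("A", "A", "A", "B"),
  date = as.POSIXct(c("2019-01-01 10:00:00", "2019-01-01 10:00:00",
                      "2020-01-01 10:00:00", "2018-05-05 09:00:00"),
                    tz = "UTC")
)

ranked <- ties %>%
  group_by(name) %>%
  mutate(
    dense  = dense_rank(date),   # tied dates share a rank, no gaps: 1, 1, 2 for "A"
    by_row = row_number(date)    # ties broken by row position:      1, 2, 3 for "A"
  ) %>%
  ungroup()

ranked
```

If every attempt should get its own number even when timestamps collide, row_number(date) is probably the closer fit; if identical timestamps should count as the same attempt, dense_rank keeps them together.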
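You can group by name and number the rows in reverse. This works here because each participant's rows already run from newest to oldest:
library(dplyr)
problem %>%
  group_by(name) %>%
  mutate(order = rev(seq(n())))
which gives,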
# A tibble: 8 x 4
# Groups: name [3]
name score date order
<chr> <dbl> <dttm> <int>
1 Britney 1 2019-02-26 00:18:09 3
2 Christina 2 2019-04-26 00:18:09 4
3 Justin 3 2019-02-20 00:18:09 1
4 Britney 3 2018-02-26 00:18:09 2
5 Britney 3 2017-02-26 00:18:09 1
6 Christina 2 2016-02-26 00:18:09 3
7 Christina 4 2015-02-26 00:18:09 2
8 Christina 2 2010-02-26 00:18:09 1
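The reverse-order approach depends on how the rows happen to be arranged, not on the dates themselves. As a minimal sketch of that dependence, the code below rebuilds the question's data (without the score column, and using as.POSIXct in place of ymd_hms, both simplifications of mine), sorts it oldest first, and compares the positional numbering with a date-based one. The column names by_position and by_date are made up for illustration.

```r
library(dplyr)

problem <- tibble(
  name = c("Britney", "Christina", "Justin", "Britney",
           "Britney", "Christina", "Christina", "Christina"),
  date = as.POSIXct(c("2019-02-26 00:18:09", "2019-04-26 00:18:09",
                      "2019-02-20 00:18:09", "2018-02-26 00:18:09",
                      "2017-02-26 00:18:09", "2016-02-26 00:18:09",
                      "2015-02-26 00:18:09", "2010-02-26 00:18:09"),
                    tz = "UTC")
)

compare <- problem %>%
  arrange(date) %>%                     # oldest first: the opposite of the original layout
  group_by(name) %>%
  mutate(
    by_position = rev(seq_len(n())),    # depends on the current row order
    by_date     = dense_rank(date)      # depends only on the dates
  ) %>%
  ungroup()

# Britney's 2017, 2018, 2019 attempts: by_position is 3, 2, 1 but by_date is 1, 2, 3
compare %>% filter(name == "Britney")
```

So the positional version is only safe when the data is known to be sorted newest to oldest within each name.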
Or arrange the data by name and date, then group_by name and assign row_number (the rows come out sorted by name and date):
library(dplyr)
problem %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number())
# A tibble: 8 x 4
# Groups: name [3]
# name score date order
# <chr> <dbl> <dttm> <int>
#1 Britney 3 2017-02-26 00:18:09 1
#2 Britney 3 2018-02-26 00:18:09 2
#3 Britney 1 2019-02-26 00:18:09 3
#4 Christina 2 2010-02-26 00:18:09 1
#5 Christina 4 2015-02-26 00:18:09 2
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 2 2019-04-26 00:18:09 4
#8 Justin 3 2019-02-20 00:18:09 1
You can use rowid from data.table
library(data.table)
setDT(problem)
problem[order(date), order := rowid(name)]
Or you can use frank to rank the dates within each name
problem[, order := frank(date), by = name]
Output of either method
problem
# name score date order
# 1: Britney 1 2019-02-26 00:18:09 3
# 2: Christina 2 2019-04-26 00:18:09 4
# 3: Justin 3 2019-02-20 00:18:09 1
# 4: Britney 3 2018-02-26 00:18:09 2
# 5: Britney 3 2017-02-26 00:18:09 1
# 6: Christina 2 2016-02-26 00:18:09 3
# 7: Christina 4 2015-02-26 00:18:09 2
# 8: Christina 2 2010-02-26 00:18:09 1
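One detail of frank that the example data does not exercise: its default ties.method is "average", so repeated timestamps within a name get fractional ranks. A small sketch on hypothetical data (the table dt and its values are made up for illustration) shows the default next to ties.method = "dense":

```r
library(data.table)

# Hypothetical data: participant "A" has two attempts with the same timestamp
dt <- data.table(
  name = c("A", "A", "A", "B"),
  date = as.POSIXct(c("2019-01-01 10:00:00", "2019-01-01 10:00:00",
                      "2020-01-01 10:00:00", "2018-05-05 09:00:00"),
                    tz = "UTC")
)

# Default ties.method = "average": the tied rows for "A" both get 1.5
dt[, avg := frank(date), by = name]

# ties.method = "dense": the tied rows share rank 1 and the next attempt is 2
dt[, dense := frank(date, ties.method = "dense"), by = name]

dt
```

With unique timestamps per participant, as in the question, both settings give the same numbers.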