In R, create a new column based on the order of a 2nd column grouped by a 3rd column
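This is very similar to some other questions, but I wasn't quite satisfied with the other answers.
I have data where one column comes from a Latin square study design: each participant went through three conditions, in one of six possible orders. I don't have a variable indicating the order in which each participant actually received the conditions, so I need to create one myself. Here is my current and desired output, using a made-up example of the first three participants: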
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
(current <- tibble(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice", "untimed", "practice", "timed", "timed", "untimed", "practice")
))
#> # A tibble: 9 × 2
#> participant block_code
#> <dbl> <chr>
#> 1 1 timed
#> 2 1 untimed
#> 3 1 practice
#> 4 2 untimed
#> 5 2 practice
#> 6 2 timed
#> 7 3 timed
#> 8 3 untimed
#> 9 3 practice
(desired <- current %>%
  mutate(order_code = c(rep("tup", 3), rep("upt", 3), rep("tup", 3))))
#> # A tibble: 9 × 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
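Created on 2022-02-28 by the reprex package (v2.0.1)
Participants 1 and 3 received the conditions in the same order, so they end up with the same code.
How can I tell R to create a new column based on the order of the block_code variable within each participant?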
You can group_by(participant) and then collapse the first letter of each block_code to create order_code:
library(tidyverse)
(current %>%
  group_by(participant) %>%
  mutate(order_code = str_c(str_sub(block_code, end = 1), collapse = "")) %>%
  ungroup())
#> # A tibble: 9 x 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
Created on 2022-02-28 by the reprex package (v2.0.1)
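The grouped mutate above quietly relies on two things: the rows within each participant are assumed to already be in presentation order, and the block names are assumed to start with distinct letters. Under those assumptions, here is a minimal sketch of how one might check the result and pull out a one-row-per-participant lookup. The names `result` and `orders` are only illustrative and are not part of the original answer.

```r
library(dplyr)
library(stringr)

# Same toy data as in the question
current <- tibble(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice",
                 "untimed", "practice", "timed",
                 "timed", "untimed", "practice")
)

# Collapse the first letter of each block, in row order, within each participant
result <- current %>%
  group_by(participant) %>%
  mutate(order_code = str_c(str_sub(block_code, end = 1), collapse = "")) %>%
  ungroup()

# One row per participant, handy for seeing which orders actually occur
orders <- distinct(result, participant, order_code)
orders
```

If the rows were not stored in presentation order, they would need to be sorted by some trial or timestamp variable first; no such variable exists in the example data.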
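Another, slightly different option is to use summarise, which lets you drop the grouping (via .groups = "drop") without a separate ungroup call. Here we group by participant and then collapse just the first letter of each block_code within each group.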
library(tidyverse)
current %>%
  group_by(participant) %>%
  summarise(
    block_code,
    order_code = paste(substr(block_code, 0, 1), collapse = ""),
    .groups = "drop"
  )
Output
participant block_code order_code
<dbl> <chr> <chr>
1 1 timed tup
2 1 untimed tup
3 1 practice tup
4 2 untimed upt
5 2 practice upt
6 2 timed upt
7 3 timed tup
8 3 untimed tup
9 3 practice tup
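One caveat for newer setups: since dplyr 1.1.0, returning more than one row per group from summarise is deprecated and triggers a warning that points to reframe instead. A sketch of the same idea with reframe, assuming dplyr 1.1.0 or later is installed (the `.by` argument used here also needs that version):

```r
library(dplyr)

# Same toy data as in the question
current <- tibble(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice",
                 "untimed", "practice", "timed",
                 "timed", "untimed", "practice")
)

# reframe() may return any number of rows per group and always
# returns an ungrouped result, so no .groups or ungroup() is needed
result <- current %>%
  reframe(
    block_code,
    order_code = paste(substr(block_code, 1, 1), collapse = ""),
    .by = participant
  )
result
```

The length-one order_code is recycled to match the three block_code rows in each group, so the result should have the same nine rows as the summarise version.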
Or with data.table:
library("data.table")
dt <- as.data.table(current)
dt[, order_code := paste(substr(block_code, 0, 1), collapse = ""), by = participant]
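Note that := adds the column by reference and prints nothing, so the line above produces no visible output. A small self-contained sketch of how one might inspect the result afterwards; the `orders` name is only illustrative:

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table
dt <- data.table(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice",
                 "untimed", "practice", "timed",
                 "timed", "untimed", "practice")
)

# Add order_code in place, computed once per participant
dt[, order_code := paste(substr(block_code, 1, 1), collapse = ""), by = participant]

# Print explicitly, since := returns its result invisibly
print(dt)

# One row per participant
orders <- unique(dt[, .(participant, order_code)])
orders
```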
Or in base R:
merge(current, setNames(
  aggregate(
    block_code ~ participant,
    data = current,
    FUN = \(x) paste(substr(x, 0, 1), collapse = "")
  ),
  c("participant", "order_code")
), by = "participant")
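A possibly simpler base R variant, offered as a sketch rather than as part of the original answer, is ave(), which applies a function within groups and writes the result back in the original row positions. That sidesteps the merge step, which by default sorts its result on the by column. The data frame name `df` is just for illustration.

```r
# Same toy data as in the question, as a plain data.frame
df <- data.frame(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice",
                 "untimed", "practice", "timed",
                 "timed", "untimed", "practice")
)

# ave() splits block_code by participant, applies FUN to each piece,
# and recycles the single collapsed string across that participant's rows
df$order_code <- ave(
  df$block_code,
  df$participant,
  FUN = function(x) paste(substr(x, 1, 1), collapse = "")
)
df
```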