Create categorical variable/data subset with case matching
I have a dataset like this:
structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019),
venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y",
"X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I need to create a categorical variable that returns "yes" or "no", like this:
structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019),
venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y",
"X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X"),
Cat = c("yes", "yes", "no", "no", "no", "no")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
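Note that the first two rows are the "yes" cases. They have the same "year" and "venue", and the same pair of values in "HO" and "AW", regardless of order.
Thanks.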
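You can first use rowwise() to work through the data frame row by row, then paste the HO and AW columns together in sorted order. This step identifies which rows have the same HO and AW values regardless of order. Then group_by year, venue, and that unique combination to classify the rows.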
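library(dplyr)
library(stringr)

# df is the data frame from the question
df %>%
  rowwise() %>%
  # Build an order-independent key: join HO and AW, split into single
  # characters, sort them, and collapse, so "XY" and "YX" both give "X,Y"
  mutate(paste_col = paste0(sort(str_split(paste0(HO, AW), "", simplify = TRUE)), collapse = ",")) %>%
  # Rows that share year, venue, and key fall into the same group
  group_by(year, venue, paste_col) %>%
  mutate(Cat = ifelse(n() > 1, "Yes", "No")) %>%
  ungroup() %>%
  select(-paste_col)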
# A tibble: 6 × 5
   year venue HO    AW    Cat
  <dbl> <chr> <chr> <chr> <chr>
1  2019 Z     X     Y     Yes
2  2019 Z     Y     X     Yes
3  2019 Z     X     W     No
4  2019 Z     Y     T     No
5  2019 O     W     T     No
6  2019 D     J     X     No
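The same order-independent matching can also be sketched without rowwise(). This is an alternative, not the approach in the answer above: it uses base R's pmin() and pmax() to put each HO/AW pair into a fixed order. Splitting the pasted string into single characters only works when every value is one character long (for example, "AB" with "C" would produce the same key as "A" with "BC"), so this variant may be safer for longer names. The helper columns key_lo and key_hi are hypothetical names introduced for illustration, and the labels are lowercase to match the desired output in the question.

```r
library(dplyr)

# Sample data from the question, rebuilt as a tibble
df <- tibble(
  year  = rep(2019, 6),
  venue = c("Z", "Z", "Z", "Z", "O", "D"),
  HO    = c("X", "Y", "X", "Y", "W", "J"),
  AW    = c("Y", "X", "W", "T", "T", "X")
)

result <- df %>%
  # pmin()/pmax() take the smaller/larger string of each pair element-wise,
  # so (X, Y) and (Y, X) end up with identical key columns
  mutate(key_lo = pmin(HO, AW), key_hi = pmax(HO, AW)) %>%
  group_by(year, venue, key_lo, key_hi) %>%
  # More than one row in a group means the same pairing occurs again
  mutate(Cat = if_else(n() > 1, "yes", "no")) %>%
  ungroup() %>%
  select(-key_lo, -key_hi)

result
```

Because pmin() and pmax() are vectorized, this avoids per-row evaluation, which may matter on larger data frames.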