R: transform a dataset from wide to long for column names with patterns
Suppose we have a dataset like the one below.
ID Gender ColA_1 ColB_1 ColC_1__1 ColC_1__2 ColA_2 ColB_2 ColC_2__1 ColC_2__2
1 Male No Yes Yes No No Yes No No
2 Female Yes No Yes No No Yes Yes No
What I would like to do is transform it into this:
ID Index Gender ColA ColB ColC_1 ColC_2
1 1 Male No Yes Yes No
1 2 Male No Yes No No
2 1 Female Yes No Yes No
2 2 Female No Yes Yes No
I'm not sure how to do this; any help is appreciated. Thanks in advance.
Here is an option with pivot_longer:
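library(dplyr)
library(tidyr)
library(stringr)

df1 %>%
  # ColC_<index>__<k> -> ColC<k>_<index>, so every measured name is "<value>_<index>"
  rename_with(~ str_replace(.x, "_(\\d+)__(\\d+)", "\\2_\\1"), starts_with("ColC")) %>%
  # split each name on "_": the first part becomes a column, the second goes to Index
  pivot_longer(cols = -c(ID, Gender), names_to = c(".value", "Index"),
               names_sep = "_")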
# A tibble: 4 x 7
# ID Gender Index ColA ColB ColC1 ColC2
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 Male 1 No Yes Yes No
#2 1 Male 2 No Yes No No
#3 2 Female 1 Yes No Yes No
#4 2 Female 2 No Yes Yes No
Data:
df1 <- structure(list(ID = 1:2, Gender = c("Male", "Female"), ColA_1 = c("No",
"Yes"), ColB_1 = c("Yes", "No"), ColC_1__1 = c("Yes", "Yes"),
ColC_1__2 = c("No", "No"), ColA_2 = c("No", "No"), ColB_2 = c("Yes",
"Yes"), ColC_2__1 = c("No", "Yes"), ColC_2__2 = c("No", "No"
)), class = "data.frame", row.names = c(NA, -2L))
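The key step in the `pivot_longer` answer is the rename: it moves the trailing `__k` suffix of each `ColC` column in front of the index, so that a single `_` separates the value name from the index, which is what `names_sep = "_"` relies on. A minimal check of that regex using base R's `sub()` (same pattern and replacement that `str_replace()` receives); the vectors `colc_names`, `renamed`, `value_part` and `index_part` are illustrative names, not part of the answer:

```r
# The four ColC column names from df1, copied so the snippet stands alone
colc_names <- c("ColC_1__1", "ColC_1__2", "ColC_2__1", "ColC_2__2")

# "_<index>__<k>" becomes "<k>_<index>", e.g. "ColC_1__2" -> "ColC2_1"
renamed <- sub("_(\\d+)__(\\d+)", "\\2_\\1", colc_names)
print(renamed)
# [1] "ColC1_1" "ColC2_1" "ColC1_2" "ColC2_2"

# Splitting on "_" now yields the .value part and the Index part
parts <- strsplit(renamed, "_", fixed = TRUE)
value_part <- vapply(parts, `[`, character(1), 1)  # becomes the column name
index_part <- vapply(parts, `[`, character(1), 2)  # becomes the Index value
```

This is why the output has columns `ColC1` and `ColC2` rather than `ColC_1` and `ColC_2`: the underscore had to be consumed as the separator.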
You can solve this with the data.table package:
library(data.table)
# setDT() converts df1 to a data.table by reference (df1 itself is modified)
melt(data = setDT(df1),
     measure.vars = patterns(ColA = "ColA", ColB = "ColB",
                             ColC_1 = "ColC_.+1$", ColC_2 = "ColC_.+2$"),
     variable.name = "index")
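In the `data.table` answer, each `patterns()` regex selects the group of columns that feeds one output column, and `melt()` pairs the groups up by position (the first match of every group goes to index 1, the second to index 2). A quick way to see what each regex picks up is base R's `grep()` on the column names; `col_names`, `pats` and `matched` are illustrative names, not part of the answer:

```r
# Column names of df1, copied so the snippet stands alone
col_names <- c("ID", "Gender", "ColA_1", "ColB_1", "ColC_1__1", "ColC_1__2",
               "ColA_2", "ColB_2", "ColC_2__1", "ColC_2__2")

# The same regexes passed to patterns() in the melt() call
pats <- c(ColA = "ColA", ColB = "ColB",
          ColC_1 = "ColC_.+1$", ColC_2 = "ColC_.+2$")

# For each output column, list the source columns it draws from, in column order
matched <- lapply(pats, grep, x = col_names, value = TRUE)
str(matched)
```

`"ColC_.+1$"` picks `ColC_1__1` and `ColC_2__1` (the `__1` columns for index 1 and 2), which is exactly what the desired `ColC_1` column needs; anchoring with `$` is what keeps it from also matching the `__2` columns.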