R, pivot wide to long while changing column names
I have data like this:
df<-structure(list(fname = c("Linda", "Bob"), employee_number = c("00000123456",
"654321"), job_role = c("Dept Research Admin", "Research Regulatory Assistant"
), ActiveAccount = c("Yes", "Yes"), CanAccess = c("No", "No"),
oncore_roles___1 = c(1, 0), oncore_roles___2 = c(1, 0), oncore_roles___3 = c(1,
0), oncore_roles___4 = c(0, 0), oncore_roles___5 = c(0, 1
), oncore_roles___6 = c(0, 0), oncore_roles___7 = c(0, 1),
oncore_roles___8 = c(0, 0), oncore_roles___9 = c(0, 0), oncore_roles___10 = c(0,
0), oncore_roles___11 = c(0, 0), oncore_roles___12 = c(0,
1), oncore_roles___13 = c(0, 0), oncore_roles___14 = c(0,
0), oncore_roles___15 = c(0, 0), oncore_roles___16 = c(0,
0), oncore_roles___17 = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
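The columns starting with "oncore_roles" all come from this multiple-choice survey question:
where oncore_roles___1 means "calendar build", oncore_roles___5 means "principal investigator", and so on.
I.e., if Bob has a "1" in oncore_roles___5, he is a principal investigator; if he has a zero in every other "oncore_roles" column, he holds none of those other roles.
I need to reshape my data to be longer, with a single "Oncore Roles" column that holds text naming the role the person has, one row per role. So if Bob has three roles, he gets three nearly identical rows: everything is the same except the oncore_roles variable.
I know this is probably some version of pivot_longer, but the trick (and the reason I'm asking) is that I need to drop all the zeros. I.e., for this particular data I'd be left with this:
Thanks!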
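For comparison with the tidyverse answers, the reshape described in the question can also be sketched in base R without pivot_longer(). This is not one of the posted answers: it rebuilds the example data as a plain data.frame, takes the twelve role labels from the first answer's key (assumed to be the survey's codes 1 to 12), and uses which(..., arr.ind = TRUE) to locate every 1 in the role columns.

```r
# Rebuild the question's example data as a plain data.frame
people <- data.frame(
  fname = c("Linda", "Bob"),
  employee_number = c("00000123456", "654321"),
  job_role = c("Dept Research Admin", "Research Regulatory Assistant"),
  ActiveAccount = c("Yes", "Yes"),
  CanAccess = c("No", "No"),
  stringsAsFactors = FALSE
)
flags <- matrix(0, nrow = 2, ncol = 17,
                dimnames = list(NULL, paste0("oncore_roles___", 1:17)))
flags[1, 1:3] <- 1          # Linda ticked codes 1, 2, 3
flags[2, c(5, 7, 12)] <- 1  # Bob ticked codes 5, 7, 12
df <- data.frame(people, flags, stringsAsFactors = FALSE)

# Labels for codes 1 to 12, taken from the first answer's key
role_labels <- c("Calendar Build", "Protocol Management", "Subject Management",
                 "Financials", "Principal Investigator",
                 "Protocol Management Finance", "Regulatory",
                 "Investigational Pharmacist", "Division Director",
                 "CTO Signoff", "Roles Administration", "Statistical Analysis")

role_cols <- grep("^oncore_roles___", names(df), value = TRUE)
codes <- as.integer(sub("^oncore_roles___", "", role_cols))

# Row/column positions of every 1 (zeros are never returned)
hits <- which(as.matrix(df[role_cols]) == 1, arr.ind = TRUE)
# which() walks column by column; reorder so each person's roles stay together
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

# Repeat each person's non-role columns once per ticked role, then add the label
id_cols <- setdiff(names(df), role_cols)
long <- df[hits[, "row"], id_cols]
long$Oncore_role <- role_labels[codes[hits[, "col"]]]  # NA for codes above 12
rownames(long) <- NULL
long
```

Because only positions holding a 1 are returned, the zeros never need a separate filtering step.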
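Here is one option: we create a key/value dataset from the multiple-choice question and the column names, then join the reshaped data to it to return the mapped column.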
library(dplyr)
library(tidyr)
library(stringr)
# key: original column name -> role label
keydat <- tibble(name = str_c("oncore_roles___", 1:12),
  Oncore_role = c("Calendar Build", "Protocol Management",
    "Subject Management", "Financials", "Principal Investigator",
    "Protocol Management Finance", "Regulatory",
    "Investigational Pharmacist", "Division Director", "CTO Signoff",
    "Roles Administration", "Statistical Analysis"))
df %>%
  pivot_longer(cols = starts_with('oncore_roles')) %>%  # one row per person x role column
  filter(value == 1) %>%                                # drop the zeros
  inner_join(keydat, by = "name") %>%                   # look up the label by column name
  select(-name)
-output
# A tibble: 6 × 7
fname employee_number job_role ActiveAccount CanAccess value Oncore_role
<chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 Linda 00000123456 Dept Research Admin Yes No 1 Calendar Build
2 Linda 00000123456 Dept Research Admin Yes No 1 Protocol Management
3 Linda 00000123456 Dept Research Admin Yes No 1 Subject Management
4 Bob 654321 Research Regulatory Assistant Yes No 1 Principal Investigator
5 Bob 654321 Research Regulatory Assistant Yes No 1 Regulatory
6 Bob 654321 Research Regulatory Assistant Yes No 1 Statistical Analysis
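One thing to watch: keydat labels only codes 1 to 12, while the data has 17 oncore_roles___ columns, and inner_join() silently drops any ticked role whose column name has no row in the key. A minimal sketch of a check, using a hypothetical one-person toy table (not the question's data) with an unlabeled code 13, is to run dplyr's anti_join() on the same key:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "Pat" ticked code 1 (labeled) and code 13 (unlabeled)
toy <- tibble(fname = "Pat", oncore_roles___1 = 1, oncore_roles___13 = 1)

# Same key as in the answer: codes 1 to 12 only
keydat <- tibble(
  name = paste0("oncore_roles___", 1:12),
  Oncore_role = c("Calendar Build", "Protocol Management", "Subject Management",
                  "Financials", "Principal Investigator",
                  "Protocol Management Finance", "Regulatory",
                  "Investigational Pharmacist", "Division Director",
                  "CTO Signoff", "Roles Administration", "Statistical Analysis")
)

long <- toy %>%
  pivot_longer(cols = starts_with("oncore_roles")) %>%
  filter(value == 1)

# Ticked roles with no label; inner_join() would drop these without a warning
unmatched <- anti_join(long, keydat, by = "name")
unmatched
```

If unmatched has rows, either extend the key or use left_join() so those roles show up as NA instead of disappearing.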
If you build a small lookup table of oncore roles, say roles (defined at the end of this answer), you can do the following:
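df %>%
  # one row per person x role column; strip the prefix so "id" holds "1", "2", ...
  pivot_longer(cols = -(fname:CanAccess), names_prefix = "oncore_roles___", names_to = "id") %>%
  filter(value == 1) %>%            # keep only the ticked roles
  mutate(id = as.numeric(id)) %>%   # make id numeric to match roles$id
  left_join(roles, by = "id") %>%
  select(-(id:value))               # drop the helper columns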
Output (note that my roles table only has the first 5 roles, but you can make it longer, and then you can use inner_join() instead of left_join()):
fname employee_number job_role ActiveAccount CanAccess Oncore_role
<chr> <chr> <chr> <chr> <chr> <chr>
1 Linda 00000123456 Dept Research Admin Yes No Calendar Build
2 Linda 00000123456 Dept Research Admin Yes No Protocol Management
3 Linda 00000123456 Dept Research Admin Yes No Subject Management
4 Bob 654321 Research Regulatory Assistant Yes No Principal Investigator
5 Bob 654321 Research Regulatory Assistant Yes No NA
6 Bob 654321 Research Regulatory Assistant Yes No NA
roles:
roles <- tibble(
  id = 1:5,
  Oncore_role = c(
    "Calendar Build",
    "Protocol Management",
    "Subject Management",
    "Financial",
    "Principal Investigator"
  ))
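Following the note about switching joins, here is a sketch of the same pipeline with inner_join(), and with the id converted to integer inside pivot_longer() through its names_transform argument instead of a separate mutate() step. It rebuilds the question's data compactly and reuses the five-row roles table, so Bob's codes 7 and 12 are dropped rather than kept as NA:

```r
library(dplyr)
library(tidyr)

# Compact rebuild of the question's df
people <- tibble(
  fname = c("Linda", "Bob"),
  employee_number = c("00000123456", "654321"),
  job_role = c("Dept Research Admin", "Research Regulatory Assistant"),
  ActiveAccount = c("Yes", "Yes"),
  CanAccess = c("No", "No")
)
flags <- matrix(0, nrow = 2, ncol = 17,
                dimnames = list(NULL, paste0("oncore_roles___", 1:17)))
flags[1, 1:3] <- 1          # Linda: codes 1, 2, 3
flags[2, c(5, 7, 12)] <- 1  # Bob: codes 5, 7, 12
df <- bind_cols(people, as_tibble(flags))

# Same five-row lookup as in the answer
roles <- tibble(
  id = 1:5,
  Oncore_role = c("Calendar Build", "Protocol Management", "Subject Management",
                  "Financial", "Principal Investigator")
)

result <- df %>%
  pivot_longer(
    cols = -(fname:CanAccess),
    names_to = "id",
    names_prefix = "oncore_roles___",
    names_transform = list(id = as.integer)  # "5" -> 5L during the pivot
  ) %>%
  filter(value == 1) %>%
  inner_join(roles, by = "id") %>%           # unmatched ids are dropped, not NA
  select(-id, -value)
result
```

With a complete roles table the two joins give the same rows; with a partial one, inner_join() hides the gap, so left_join() is the safer choice while the lookup is still being filled in.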