How to Reshape based on "checked" vs "unchecked"
I have a dataset containing data structured like this:
ID | Treatment=Induction Chemo | Treatment=Hypomethylating Chemo | Treatment=Consolidation Chemo
Patient1 | Checked | Unchecked | Unchecked
Patient2 | Unchecked | Checked | Unchecked
Patient3 | Unchecked | Unchecked | Checked
How would I go about formatting this data so that it looks more like this?
ID Treatment
Patient1 Induction Chemo
Patient2 Hypomethylating Chemo
Patient3 Consolidation Chemo
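I would like to automate this using R. Is that at all possible? I'm not sure whether the reshape package has these capabilities. If all else fails, I'm willing to edit the headers manually to remove "Treatment=" from each one, but I'd rather do it automatically. Thanks!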
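You can try this, but as a caveat, I'm assuming that no row has more than one "Checked" value (if a row had several, their treatment names would be pasted together with no separator). If that holds, this should work.
Assume df is your input data.frame.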
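df1 <- df
# For each column, keep its name (minus "Treatment=") where the cell is "Checked", else "";
# then paste the pieces of each row together into a single column.
df1$Final_col <- do.call("paste0", data.frame(sapply(names(df), function(x) ifelse(df[, x] == "Checked", gsub("Treatment=", "", x), "")), stringsAsFactors = FALSE))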
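Logic:
Use sapply over all the column names of df, with an ifelse inside it on the condition == "Checked". Wherever the ifelse condition is TRUE, gsub replaces "Treatment=" in the column name with nothing, so only the text after "Treatment=" is kept, and that text becomes the value; everywhere else the value is an empty string. Finally, do.call with paste0 pastes all the results together row by row, so you end up with a single column.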
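To make the logic above easier to follow, here is the same one-liner split into two steps. This is only a sketch: the intermediate names `pieces` and `final_col` are mine, not part of the original answer, and the input is rebuilt with data.frame() so the snippet runs on its own.

```r
# Example input with the same shape as the data in this answer.
# check.names = FALSE keeps the "Treatment=..." headers unmangled.
df <- data.frame(
  ID = c("Patient1", "Patient2", "Patient3"),
  `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked"),
  `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked"),
  `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: for every column, emit the cleaned column name where the cell is
# "Checked" and an empty string elsewhere. The result is a character matrix
# with one row per patient and one column per input column (ID included,
# which never matches and so contributes only empty strings).
pieces <- sapply(names(df), function(x) {
  ifelse(df[, x] == "Checked", gsub("Treatment=", "", x), "")
})

# Step 2: paste the pieces of each row together. The empty strings vanish,
# leaving only the name of the checked treatment.
final_col <- do.call("paste0", data.frame(pieces, stringsAsFactors = FALSE))

print(final_col)
```

Printing `pieces` before step 2 is a quick way to see which cell in each row carries the treatment name.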
Data:
df <- structure(list(ID = c("Patient1", "Patient2", "Patient3"), `Treatment=Induction Chemo` = c("Checked",
"Unchecked", "Unchecked"), `Treatment=Hypomethylating Chemo` = c("Unchecked",
"Checked", "Unchecked"), `Treatment=Consolidation Chemo` = c("Unchecked",
"Unchecked", "Checked")), .Names = c("ID", "Treatment=Induction Chemo",
"Treatment=Hypomethylating Chemo", "Treatment=Consolidation Chemo"
), class = "data.frame", row.names = c(NA, -3L))
Output:
You can see Final_col in the output below. You can drop the other columns; I kept them so that you can compare the input with the output.
> df1
ID Treatment=Induction Chemo Treatment=Hypomethylating Chemo
1 Patient1 Checked Unchecked
2 Patient2 Unchecked Checked
3 Patient3 Unchecked Unchecked
Treatment=Consolidation Chemo Final_col
1 Unchecked Induction Chemo
2 Unchecked Hypomethylating Chemo
3 Checked Consolidation Chemo
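If you only want the two columns from the question, something along these lines might do. It is a sketch under my own assumptions: `treatment_cols`, `n_checked` and `result` are illustrative names, and the rowSums check is an extra guard I added for the "one Checked per row" assumption, not part of the approach above.

```r
# Example input with the same shape as the data in this answer.
df <- data.frame(
  ID = c("Patient1", "Patient2", "Patient3"),
  `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked"),
  `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked"),
  `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Guard: count the "Checked" cells per row and stop if any row does not
# have exactly one, since pasting would then give "" or merged names.
treatment_cols <- setdiff(names(df), "ID")
n_checked <- rowSums(df[treatment_cols] == "Checked")
stopifnot(all(n_checked == 1))

# Same computation as in the answer.
df1 <- df
df1$Final_col <- do.call("paste0", data.frame(sapply(names(df), function(x) {
  ifelse(df[, x] == "Checked", gsub("Treatment=", "", x), "")
}), stringsAsFactors = FALSE))

# Keep only ID and the new column, renamed to match the desired layout.
result <- data.frame(ID = df1$ID, Treatment = df1$Final_col,
                     stringsAsFactors = FALSE)
print(result)
```

On real data you may prefer to inspect the rows where `n_checked != 1` rather than stopping outright.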