Reshaping a data frame without column names in R
I have a data frame like this:
structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA,
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension",
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1",
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1",
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1",
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1",
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2",
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2",
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2",
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2",
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA,
-5L))
And I want to transform it into this:
structure(list(name_var1 = c("obs1_var1", "obs1_var1", "obs1_var1",
"obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1"
), name_var2 = c("obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2",
"obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2"), name_var3 = c("obs1_var3",
"obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3",
"obs1_var3", "obs1_var3"), name_var4 = c("obs1_var4", "obs1_var4",
"obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4",
"obs1_var4"), Dimension = c("Dimension1", "Dimension1", "Dimension1",
"Dimension1", "Dimension2", "Dimension2", "Dimension2", "Dimension2"
), Subdimension = c("Subdimension1", "Subdimension1", "Subdimension2",
"Subdimension2", "Subdimension1", "Subdimension1", "Subdimension2",
"Subdimension2"), Question = c("question1.1.1", "question1.1.2",
"question1.2.1", "question1.2.2", "question2.1.1", "question2.1.2",
"question2.2.1", "question2.2.2"), Value = c(1, 3, 1, 4, 1, 3,
1, 4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L))
...and so on for all the observations in the original data frame.
Any idea how to do this?
Thanks in advance for your comments and help.
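If you have to do a lot of data transformations like this, I recommend the unpivotr package. There is a related package called tidyxl that reads Excel files in a raw, cell-by-cell form. This can be useful when the column headers are text but the cells below them are numbers, dates, logicals, and so on. You can even use the formatting information, which is sometimes necessary to extract the information correctly from such files.
The package author has written a free online book, Spreadsheet Munging Strategies, which covers many such cases. In your case, you can use: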
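library(tidyverse)
library(unpivotr)

start %>%
  as_cells() %>%                   # one row per cell, with its row/col position
  behead("up", Dimension) %>%      # header row 1
  behead("up", Subdimension) %>%   # header row 2
  behead("up", Question) %>%       # header row 3
  behead("left", name_var1) %>%    # id columns on the left
  behead("left", name_var2) %>%
  behead("left", name_var3) %>%
  behead("left", name_var4) %>%
  select(name_var1:name_var4, Dimension:Question, Value = chr) %>%
  # chr holds the cell text; convert it so Value is numeric as in the desired output
  mutate(Value = as.numeric(Value))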
where start is your starting data frame. In practice it is better to read the original Excel file with the tidyxl package, but this is not required.
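To see what each step does, here is a minimal sketch on a much smaller table. The data frame `toy` and its column names (`X1`, `X2`, `X3`, `id`) are made up for illustration and are not part of the question's data; the layout is the same idea, with header rows on top and an id column on the left.

```r
library(dplyr)
library(unpivotr)

# Hypothetical toy table: two header rows on top, one id column on the left.
toy <- data.frame(
  X1 = c(NA, "id", "a", "b"),
  X2 = c("Dim1", "q1", "1", "4"),
  X3 = c("Dim2", "q2", "3", "2"),
  stringsAsFactors = FALSE
)

# as_cells() returns one row per cell, with its row/col position and its
# value in a type-specific column (here `chr`, since every column is character).
cells <- as_cells(toy)

tidy <- cells %>%
  behead("up", Dimension) %>%   # strip the top row into a Dimension column
  behead("up", Question) %>%    # strip the next row into a Question column
  behead("left", id) %>%        # strip the leftmost column into an id column
  transmute(id, Dimension, Question, Value = as.numeric(chr)) %>%
  arrange(id, Dimension)

print(tidy)
```

Each `behead()` call removes one header row or column from the remaining cells and attaches its values to the data cells beneath or beside it, so what is left at the end is one row per data cell. The call to `as.numeric()` is only needed because the values arrive as text when starting from a character data frame.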