Reshaping a non-column-name data frame in R

I have a data frame like this:

structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA, 
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension", 
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1", 
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1", 
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1", 
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1", 
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2", 
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2", 
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2", 
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2", 
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA, 
-5L))

and I want to transform it into this:

structure(list(name_var1 = c("obs1_var1", "obs1_var1", "obs1_var1", 
"obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1"
), name_var2 = c("obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2", 
"obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2"), name_var3 = c("obs1_var3", 
"obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", 
"obs1_var3", "obs1_var3"), name_var4 = c("obs1_var4", "obs1_var4", 
"obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", 
"obs1_var4"), Dimension = c("Dimension1", "Dimension1", "Dimension1", 
"Dimension1", "Dimension2", "Dimension2", "Dimension2", "Dimension2"
), Subdimension = c("Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2", "Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2"), Question = c("question1.1.1", "question1.1.2", 
"question1.2.1", "question1.2.2", "question2.1.1", "question2.1.2", 
"question2.2.1", "question2.2.2"), Value = c(1, 3, 1, 4, 1, 3, 
1, 4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

...and so on for all the observations in the original data frame. Any idea how to do this?

Thanks in advance for your input and help.

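If you have to do a lot of these kinds of data transformations, then I recommend the unpivotr package. There is a related package called tidyxl that can be used to read Excel files in a raw, cell-by-cell way. This can be useful when the headers are text but the cells below them are numbers, dates, logicals, and so on. You can even use the formatting information, which is sometimes necessary to extract the information correctly from Excel files.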
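To make "cell-by-cell" concrete, here is a minimal sketch on a small hypothetical table (`toy`, `cells` and `tidy` are illustrative names, not from the question). It shows the long, one-row-per-cell form that unpivotr's `as_cells()` produces and what a single `behead("up", ...)` step does to it; the full answer below simply chains several of these steps.

```r
library(dplyr)
library(unpivotr)

# Hypothetical two-column table whose first row holds the headers,
# so the real column names (x, y) carry no information.
toy <- data.frame(
  x = c("Question", "q1", "q2"),
  y = c("Answer", "1", "4"),
  stringsAsFactors = FALSE
)

# as_cells() returns one row per cell, with its position (row, col),
# its type (data_type) and its value in a type-named column (`chr` here).
cells <- as_cells(toy)
print(cells)

# behead("up", header) removes the top row and copies each header value
# onto the cells beneath it, in a new column called `header`.
tidy <- cells %>%
  behead("up", header) %>%
  select(row, col, header, value = chr)
print(tidy)
```

Because every cell keeps its row and column, headers can be peeled off from any side, one row or column at a time, regardless of the original column names.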

The package author has written a free online book, Spreadsheet Munging Strategies, which covers many such cases. In your case, you could use

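library(tidyverse)
library(unpivotr)

start %>% 
    as_cells() %>%                     # one row per cell: row, col, data_type, chr
    behead("up", Dimension) %>%        # top row becomes the Dimension column
    behead("up", Subdimension) %>%     # next row becomes Subdimension
    behead("up", Question) %>%         # next row becomes Question
    behead("left", name_var1) %>%      # leftmost column becomes name_var1
    behead("left", name_var2) %>% 
    behead("left", name_var3) %>% 
    behead("left", name_var4) %>% 
    arrange(row, col) %>%              # order rows like the desired output
    select(name_var1:name_var4, Dimension:Question, Value = chr) %>%
    mutate(Value = as.numeric(Value))  # cells are stored as text; convert to numbers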

where start is your starting data frame. In practice, it would be better to read the original Excel file with the tidyxl package, but that is not required.
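If adding unpivotr is not an option, the same reshape can also be sketched with tidyr alone. This is a different technique from the unpivotr answer: it glues the three header rows into one temporary column name per question column and then splits that name apart again inside `pivot_longer()`. It assumes the exact layout in the question (three header rows on top, four identifier columns on the left), and the `|` separator plus the helper names `n_hdr`, `n_id`, `hdr`, `body`, `id_names`, `q_cols` and `q_names` are illustrative choices, not part of any package.

```r
library(dplyr)
library(tidyr)

# The data frame from the question (dput() output reproduced as-is).
start <- structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA, 
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension", 
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1", 
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1", 
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1", 
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1", 
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2", 
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2", 
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2", 
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2", 
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA, 
-5L))

# Assumed layout: rows 1-3 are headers, columns 1-4 are identifiers.
n_hdr <- 3
n_id  <- 4
hdr   <- start[seq_len(n_hdr), ]
body  <- start[-seq_len(n_hdr), ]

# Identifier names (name_var1 ... name_var4) sit in the last header row.
id_names <- unlist(hdr[n_hdr, seq_len(n_id)], use.names = FALSE)

# Glue Dimension, Subdimension and Question into one temporary column name,
# using "|" as a separator assumed not to appear in the headers.
q_cols  <- -seq_len(n_id)
q_names <- paste(unlist(hdr[1, q_cols]), unlist(hdr[2, q_cols]),
                 unlist(hdr[3, q_cols]), sep = "|")
names(body) <- c(id_names, q_names)

result <- body %>%
  as_tibble() %>%
  # One row per observation x question; split the glued name back apart.
  pivot_longer(-all_of(id_names),
               names_to = c("Dimension", "Subdimension", "Question"),
               names_sep = "\\|",
               values_to = "Value") %>%
  mutate(Value = as.numeric(Value))

print(result)
```

The first eight rows of `result` correspond to the desired output shown in the question (obs1), and the remaining eight rows cover obs2. Unlike the unpivotr version, this approach hard-codes the number of header rows and identifier columns, so it is less flexible when the spreadsheet layout changes.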