将多个 header table 转换为长格式
Convert multiple header table to long format
我正在阅读包含多行 header 的 Excel table,它通过 read.csv 创建一个像这样的 object R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
我现在要如何 'flatten' 这样我才能得到 R table 和 "Min age"、"Max Age" , "Min Duration", "Max Duration", "Area", "Level", "Price" 列。 "Area" 列显示 "EU" 或 "US","Level" 列显示 1、2 或 3,然后 "Price" 列显示相应的价格在 Excel table?
中找到
如果没有多个 header 行,我会使用 tidyr 的收集函数,但似乎无法处理这些数据,有什么想法吗?
输出总共应该有 36 行 + headers
如果您按照 akrun 的建议跳过第一行,您可能会得到如下所示的数据:(自动添加 "X"s 和“.1”/“.2”通过 R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
有了这些数据,您就可以使用 gather
将所有以 X 开头的 headers 收集到一列中,并将价格收集到另一列中。你可以separate
把headers变成"Level"和"Area"。最后,重新编码 Area 并从关卡中移除 "X"。
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
由 reprex package (v0.2.1)
创建于 2018-10-12
P.S。您可以在此处找到大量电子表格修改工作流程:https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
我正在阅读包含多行 header 的 Excel table,它通过 read.csv 创建一个像这样的 object R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
我现在要如何 'flatten' 这样我才能得到 R table 和 "Min age"、"Max Age" , "Min Duration", "Max Duration", "Area", "Level", "Price" 列。 "Area" 列显示 "EU" 或 "US","Level" 列显示 1、2 或 3,然后 "Price" 列显示相应的价格在 Excel table?
中找到如果没有多个 header 行,我会使用 tidyr 的收集函数,但似乎无法处理这些数据,有什么想法吗?
输出总共应该有 36 行 + headers
如果您按照 akrun 的建议跳过第一行,您可能会得到如下所示的数据:(自动添加 "X"s 和“.1”/“.2”通过 R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
有了这些数据,您就可以使用 gather
将所有以 X 开头的 headers 收集到一列中,并将价格收集到另一列中。你可以separate
把headers变成"Level"和"Area"。最后,重新编码 Area 并从关卡中移除 "X"。
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
由 reprex package (v0.2.1)
创建于 2018-10-12P.S。您可以在此处找到大量电子表格修改工作流程:https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html