Nested Row Labels to Column
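I have a CSV that appears to be the output of an Excel pivot table, with names nested as row labels above repeated groups of rows. I would like to clean the data so that each row label is repeated in a separate column, preferably using dplyr.
The data looks like this: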
dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000)
)
dd
variables              Year1   Year2
Abington
Number of Sales        16      8
YTD Number of Sales    50      13
Median Sale Price      415000  583000
YTD Median Sale Price  413500  575000
Acton
Number of Sales        23      9
YTD Number of Sales    60      39
Median Sale Price      799900  995000
YTD Median Sale Price  704000  800000
I would like it to look like this:
Town      variables              Year1   Year2
Abington  Number of Sales        16      8
Abington  YTD Number of Sales    50      13
Abington  Median Sale Price      415000  583000
Abington  YTD Median Sale Price  413500  575000
Acton     Number of Sales        23      9
Acton     YTD Number of Sales    60      39
Acton     Median Sale Price      799900  995000
Acton     YTD Median Sale Price  704000  800000
Thanks!
For this we can use the tidyverse (or just dplyr & tidyr):
library(tidyverse)
dd %>%
  mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
  fill(Town, .direction = "down") %>%
  filter(Town != variables) %>%
  relocate(Town)
This results in:
      Town             variables  Year1  Year2
1 Abington       Number of Sales     16      8
2 Abington   YTD Number of Sales     50     13
3 Abington     Median Sale Price 415000 583000
4 Abington YTD Median Sale Price 413500 575000
5    Acton       Number of Sales     23      9
6    Acton   YTD Number of Sales     60     39
7    Acton     Median Sale Price 799900 995000
8    Acton YTD Median Sale Price 704000  8e+05
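The `8e+05` in the last row comes from how `dd` is built: mixing `" "` with numbers inside `c()` coerces every value to character, and `800000` becomes the string `"8e+05"`. A minimal sketch, assuming the year columns should end up numeric, converts them once the town header rows have been dropped (the name `cleaned` is only for illustration):

```r
library(dplyr)
library(tidyr)

dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000),
  stringsAsFactors = FALSE
)

cleaned <- dd %>%
  mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
  fill(Town, .direction = "down") %>%
  filter(Town != variables) %>%
  relocate(Town) %>%
  # Year1/Year2 are character at this point; as.numeric() parses "8e+05" as 800000
  mutate(across(c(Year1, Year2), as.numeric))
```

The conversion is done last on purpose: the header test `Year1 == " "` relies on the columns still being character.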
It is important to note that the empty values in Year1 and Year2 are actually single spaces (" "), not empty strings or NA.
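If the real CSV is less consistent than the example, the exact comparison with `" "` may miss some header rows. The following is a sketch under the assumption that a header row can have `""`, whitespace, or `NA` in both year columns; `messy`, `is_blank`, `is_header`, and `result` are hypothetical names, and `messy` reuses a few rows of the question's data with the blanks varied on purpose:

```r
library(dplyr)
library(tidyr)

# Illustrative input: header rows are marked by "", " ", or NA
messy <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Acton", "Number of Sales", "YTD Number of Sales"),
  Year1 = c(" ", "16", "50", NA, "23", "60"),
  Year2 = c("", "8", "13", " ", "9", "39"),
  stringsAsFactors = FALSE
)

# TRUE for NA, empty strings, and whitespace-only strings
is_blank <- function(x) is.na(x) | trimws(x) == ""

result <- messy %>%
  mutate(
    is_header = is_blank(Year1) & is_blank(Year2),
    Town = if_else(is_header, variables, NA_character_)
  ) %>%
  fill(Town, .direction = "down") %>%
  filter(!is_header) %>%
  select(Town, variables, Year1, Year2)
```

Filtering on the `is_header` flag instead of `Town != variables` also avoids dropping a data row whose label happens to equal its town name.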
Here is another approach:
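# Header rows are the ones whose Year1 cannot be parsed as a number.
# each = 4 assumes every town has exactly four data rows.
bind_cols(
  tibble(Town = rep(filter(dd, is.na(as.numeric(Year1)))$variables, each = 4)),
  filter(dd, !is.na(as.numeric(Year1)))
)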
Output:
  Town     variables             Year1  Year2
  <chr>    <chr>                 <chr>  <chr>
1 Abington Number of Sales       16     8
2 Abington YTD Number of Sales   50     13
3 Abington Median Sale Price     415000 583000
4 Abington YTD Median Sale Price 413500 575000
5 Acton    Number of Sales       23     9
6 Acton    YTD Number of Sales   60     39
7 Acton    Median Sale Price     799900 995000
8 Acton    YTD Median Sale Price 704000 8e+05
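The `rep(..., each = 4)` version only works when every town has exactly four rows. A possible variant, sketched here on the assumption that the first row is always a town header, numbers the groups with `cumsum()` so that group sizes can differ (`is_header` and `result` are illustrative names):

```r
library(dplyr)

dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000),
  stringsAsFactors = FALSE
)

result <- dd %>%
  mutate(
    # Same header test as above: Year1 is not a number on header rows
    is_header = is.na(as.numeric(Year1)),
    # cumsum() counts headers seen so far, giving each row its group index;
    # this breaks if the first row is not a header (index 0)
    Town = variables[is_header][cumsum(is_header)]
  ) %>%
  filter(!is_header) %>%
  select(Town, variables, Year1, Year2)
```

Unlike the `bind_cols()` version, this returns a plain data frame rather than a tibble, and the year columns are still character.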