Nested Row Labels to Column

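I have a CSV that appears to be the output of an Excel pivot table, with names nested as row labels above repeating groups of rows. I'd like to clean the data so that each row label is repeated in a separate column, preferably using dplyr.

The data looks like this: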

dd <- data.frame(variables = c("Abington", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price", "Acton", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price"), Year1 = c(" ", 16, 50,415000,413500," ",23,60,799900,704000), Year2 = c(" ",8,13,583000,575000," ",9,39,995000,800000))

dd

variables              Year1   Year2
Abington              
Number of Sales        16      8
YTD Number of Sales    50      13
Median Sale Price      415000  583000
YTD Median Sale Price  413500  575000
Acton              
Number of Sales        23      9
YTD Number of Sales    60      39
Median Sale Price      799900  995000
YTD Median Sale Price  704000  800000

I'd like it to look like this:

Town          variables               Year1  Year2           
Abington      Number of Sales         16     8
Abington      YTD Number of Sales     50     13
Abington      Median Sale Price       415000 583000
Abington      YTD Median Sale Price   413500 575000          
Acton         Number of Sales         23      9
Acton         YTD Number of Sales     60     39
Acton         Median Sale Price       799900 995000
Acton         YTD Median Sale Price   704000 800000

Thanks!

We can use the tidyverse for this (or just dplyr and tidyr):

library(tidyverse)

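dd %>%
  # town rows are the ones where both Year columns hold a single space
  mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
  # carry each town name down onto the rows below it
  fill(Town, .direction = "down") %>%
  # drop the original town-only rows
  filter(Town != variables) %>%
  # move Town to the front
  relocate(Town)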

This gives:

      Town             variables  Year1  Year2
1 Abington       Number of Sales     16      8
2 Abington   YTD Number of Sales     50     13
3 Abington     Median Sale Price 415000 583000
4 Abington YTD Median Sale Price 413500 575000
5    Acton       Number of Sales     23      9
6    Acton   YTD Number of Sales     60     39
7    Acton     Median Sale Price 799900 995000
8    Acton YTD Median Sale Price 704000  8e+05

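Note that the empty values in Year1 and Year2 are actually a single space (" "), not empty strings or NA, which is why the comparison above tests for " ".

Here is another approach, which assumes every town has exactly four rows (each=4):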
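If the real CSV mixes single spaces, empty strings, and NA in the blank cells, the test for a town row can be made less strict. The following is only a sketch under that assumption: it treats a row as a town row when every column whose name starts with "Year" is blank after trimming. The helper column is_header and the final numeric conversion are my additions, not part of the answer above, and it assumes R >= 4.0 (so variables is a character column) and dplyr >= 1.0.4 (for if_all).

```r
library(dplyr)
library(tidyr)

dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000)
)

cleaned <- dd %>%
  # is_header (hypothetical helper): TRUE when every Year column is NA or whitespace only
  mutate(is_header = if_all(starts_with("Year"),
                            ~ is.na(.x) | trimws(.x) == "")) %>%
  # copy the town name from header rows, leave NA elsewhere
  mutate(Town = if_else(is_header, variables, NA_character_)) %>%
  # carry each town name down onto its detail rows
  fill(Town, .direction = "down") %>%
  # drop the header rows and the helper column
  filter(!is_header) %>%
  select(Town, variables, starts_with("Year")) %>%
  # the Year columns were read as text; convert them to numbers
  mutate(across(starts_with("Year"), as.numeric))
```

Converting the Year columns to numeric also means the last Year2 value is stored as the number 800000 rather than the string "8e+05".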

bind_cols(
  tibble(Town=rep(filter(dd,is.na(as.numeric(Year1)))$variables, each=4)),
  filter(dd,!is.na(as.numeric(Year1)))
)

Output:

  Town     variables             Year1  Year2 
  <chr>    <chr>                 <chr>  <chr> 
1 Abington Number of Sales       16     8     
2 Abington YTD Number of Sales   50     13    
3 Abington Median Sale Price     415000 583000
4 Abington YTD Median Sale Price 413500 575000
5 Acton    Number of Sales       23     9     
6 Acton    YTD Number of Sales   60     39    
7 Acton    Median Sale Price     799900 995000
8 Acton    YTD Median Sale Price 704000 8e+05
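The each=4 in this second approach hard-codes the group size. If towns can have different numbers of rows, one possible variation is to number the groups with cumsum() instead. This is a sketch rather than part of the original answer: is_header and result are names I introduced, and it assumes that town rows are exactly those with a blank Year1 and that the first row of the data is a town row.

```r
library(dplyr)

dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000)
)

# assumption: a town row is one whose Year1 is blank after trimming
is_header <- trimws(dd$Year1) == ""

result <- dd %>%
  # cumsum(is_header) numbers the groups 1, 2, ...; use it to index the town names
  mutate(Town = variables[is_header][cumsum(is_header)]) %>%
  # keep only the detail rows
  filter(!is_header) %>%
  relocate(Town)
```

Because the group index comes from the position of the town rows, this does not depend on how many detail rows follow each town.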