Transform duplicate rows to columns

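I am working with a dataset that contains hundreds of variables, but because it comes from JSON it is hard to organize. For example, instead of putting the information in columns, the file creates new rows. See the example below.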

# data_frame() is deprecated; tibble::tibble() is its replacement
df1 <- tibble::tibble(
  ID = c(111,111,111,111,111,111,222,222,333),
  NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA')
)

I would like to organize the file so that it is grouped by ID, as in the following example:

# data_frame() is deprecated; tibble::tibble() is its replacement
df2 <- tibble::tibble(
  ID = c(111,222,333),
  NAME1 = c('JOHN','WILL','MARK'),
  NAME2 = c('MARY','',''),
  NAME3 = c('JAMES','',''),
  ADRESS1 = c('NY','LONDON',''),
  ADRESS2 = c('NY','TOKYO',''),
  ADRESS3 = c('ROMA','',''),
  COLOR1 = c('GREEN','BLUE','ORANGE'),
  COLOR2 = c('RED','',''),
  COLOR3 = c('YELLOW','',''),
  CAR1 = c('','FORD','FERRARI'),
  CAR2 = c('BMW','',''),
  CAR3 = c('TRUCK','',''),
  COUNTRY = c('USA','USA','USA')
)

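Note, however, that the COUNTRY variable does not need several columns (COUNTRY1, COUNTRY2, COUNTRY3), because its value is the same in every row. My original file contains countless cases like this. How can I arrange the data uniformly, as in df2?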

Maybe we can use reshape

Try the following base R code
u <- reshape(
  transform(
    unique(df1),                                    # drop fully duplicated rows
    GRP = ave(seq_along(ID), ID, FUN = seq_along)   # running index within each ID
  ),
  direction = "wide",
  idvar = "ID",
  timevar = "GRP"
)

u[order(match(gsub("\\.\\d+", "", names(u)), names(df1)))]

which gives

> u[order(match(gsub("\\.\\d+", "", names(u)), names(df1)))]
   ID NAME.1 NAME.2 NAME.3 ADRESS.1 ADRESS.2 ADRESS.3 COLOR.1 COLOR.2 COLOR.3
1 111   JOHN   MARY  JAMES       NY       NY     ROMA   GREEN     RED  YELLOW
7 222   WILL   WILL   <NA>   LONDON    TOKYO     <NA>    BLUE    BLUE    <NA>
9 333   MARK   <NA>   <NA>              <NA>     <NA>  ORANGE    <NA>    <NA>
    CAR.1 CAR.2 CAR.3 COUNTRY.1 COUNTRY.2 COUNTRY.3
1           BMW TRUCK       USA       USA       USA
7    FORD  FORD  <NA>       USA       USA      <NA>
9 FERRARI  <NA>  <NA>       USA      <NA>      <NA>
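
The output above still carries COUNTRY.1 to COUNTRY.3 and leaves NA in the empty cells, whereas the question asks for a single COUNTRY column and empty strings. One possible way to close that gap, offered as a sketch rather than as part of the original answer, is to detect which columns hold a single value within every ID and pass only the remaining ones to reshape through v.names. The names d, value_cols, is_const, vary_cols and wide are illustrative, and df1 is rebuilt here as a plain data.frame so the block runs without extra packages.

```r
# Sample data from the question, rebuilt as a base data.frame (assumption:
# the real data has the same shape, with ID as the grouping key).
df1 <- data.frame(
  ID = c(111,111,111,111,111,111,222,222,333),
  NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA'),
  stringsAsFactors = FALSE
)

# Drop fully duplicated rows, then number the remaining rows within each ID.
d <- unique(df1)
d$GRP <- ave(seq_along(d$ID), d$ID, FUN = seq_along)

# A column counts as constant when it has one distinct value inside every ID.
value_cols <- setdiff(names(df1), "ID")
is_const <- vapply(
  value_cols,
  function(col) all(tapply(d[[col]], d$ID, function(x) length(unique(x)) == 1)),
  logical(1)
)
vary_cols <- value_cols[!is_const]

# Only the varying columns are spread; constant ones stay as single columns.
wide <- reshape(
  d,
  direction = "wide",
  idvar = "ID",
  timevar = "GRP",
  v.names = vary_cols,
  sep = ""
)

# Reorder columns to follow df1 (NAME1..3, ADRESS1..3, ..., COUNTRY last).
wide <- wide[order(match(sub("\\d+$", "", names(wide)), names(df1)))]

# Replace the NA cells created by the reshape with empty strings.
wide[is.na(wide)] <- ""
```

Because the constant columns are detected from the data, the same code should carry over to a file with many more variables, as long as no original column name already ends in a digit (the reordering step strips trailing digits).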

pivot_wider is also an option
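library(dplyr)
library(tidyr)
library(data.table)                # provides rowid()
distinct(df1) %>%                  # drop fully duplicated rows
  mutate(rn = rowid(ID)) %>%       # running index within each ID
  pivot_wider(names_from = rn, values_from = NAME:CAR,
    names_sep = "", values_fill = "") %>%
  select(-COUNTRY, COUNTRY)        # move COUNTRY to the last column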

-output

# A tibble: 3 × 14
     ID NAME1 NAME2  NAME3   ADRESS1  ADRESS2 ADRESS3 COLOR1 COLOR2 COLOR3   CAR1      CAR2   CAR3    COUNTRY
  <dbl> <chr> <chr>  <chr>   <chr>    <chr>   <chr>   <chr>  <chr>  <chr>    <chr>     <chr>  <chr>   <chr>  
1   111 JOHN  "MARY" "JAMES" "NY"     "NY"    "ROMA"  GREEN  "RED"  "YELLOW" ""        "BMW"  "TRUCK" USA    
2   222 WILL  "WILL" ""      "LONDON" "TOKYO" ""      BLUE   "BLUE" ""       "FORD"    "FORD" ""      USA    
3   333 MARK  ""     ""      ""       ""      ""      ORANGE ""     ""       "FERRARI" ""     ""      USA
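
The call above names the varying columns by hand (NAME:CAR) and treats COUNTRY specially, which may be impractical with the hundreds of variables mentioned in the question. The following is a minimal sketch, not part of the original answer, that detects the constant columns from the data instead. The names d, const_cols, vary_cols and wide are illustrative, and row_number() within group_by() is swapped in for data.table's rowid() only to avoid the extra dependency.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (assumption: ID is the grouping key).
df1 <- tibble(
  ID = c(111,111,111,111,111,111,222,222,333),
  NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA')
)

# Drop fully duplicated rows first.
d <- distinct(df1)

# Columns that have exactly one distinct value inside every ID group.
const_cols <- d %>%
  group_by(ID) %>%
  summarise(across(everything(), n_distinct), .groups = "drop") %>%
  select(-ID) %>%
  select(where(~ all(.x == 1))) %>%
  names()

# Everything else (apart from the key) needs one column per occurrence.
vary_cols <- setdiff(names(d), c("ID", const_cols))

wide <- d %>%
  group_by(ID) %>%
  mutate(rn = row_number()) %>%      # running index within each ID
  ungroup() %>%
  pivot_wider(
    names_from = rn,
    values_from = all_of(vary_cols),
    names_sep = "",
    values_fill = ""
  ) %>%
  relocate(all_of(const_cols), .after = last_col())  # constants go last
```

With the sample data, const_cols contains only COUNTRY, so the result has the same 14 columns as the output shown above. Note that values_fill = "" assumes every varying column is character; a file with numeric columns would need a different fill value for those.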