Reshaping when there is no variable value in R

Question

I have a data.frame with only two columns: one is barcodeid and the other is gene.

barcodeid gene
M001-M008-S137 IL12RB1
M001-M008-S137 IL7RA
M001-M008-S137 LMP1
M001-M012-S080 CRLF2
M001-M012-S080 ICOS
M001-M012-S080 IL7RA

I would like to end up with this table:

barcodeID geneSequence
M001-M008-S137 IL12RB1-IL7RA-LMP1
M001-M012-S080 CRLF2-ICOS-IL7RA

I have looked at reshape, dcast, spread, and gather in R, and as far as I can tell these functions won't let me do this. Any help is appreciated!

Answer 1

Assuming df is your data.frame, a combination of base R functions will help:

> x <- lapply(split(df$gene, df$barcodeid), paste0, collapse="-")
> data.frame(barcodeid=names(x), geneSequence=unlist(x), row.names = NULL)
       barcodeid       geneSequence
1 M001-M008-S137 IL12RB1-IL7RA-LMP1
2 M001-M012-S080   CRLF2-ICOS-IL7RA
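
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same base R idea. It rebuilds the example data from the question and uses tapply in place of split plus lapply; the object names df, x, and res and the choice of stringsAsFactors = FALSE are assumptions made for illustration, not part of the original answer.

```r
# Example data rebuilt from the question (character columns are an assumption)
df <- data.frame(
  barcodeid = rep(c("M001-M008-S137", "M001-M012-S080"), each = 3),
  gene = c("IL12RB1", "IL7RA", "LMP1", "CRLF2", "ICOS", "IL7RA"),
  stringsAsFactors = FALSE
)

# tapply() runs paste(..., collapse = "-") once per barcodeid group
# and returns a 1-d character array named by group
x <- tapply(df$gene, df$barcodeid, paste, collapse = "-")

# Turn the named array into a two-column data.frame
res <- data.frame(
  barcodeid = names(x),
  geneSequence = as.vector(x),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(res)
```

The genes are joined in the order they appear within each group, so sort df first if a particular order is needed.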

Answer 2

With dplyr you can do:

library(dplyr)

df %>%
  group_by(barcodeid) %>% 
  mutate(geneSequence = paste(gene, collapse = "-")) %>%
  select(-gene) %>% 
  slice(1)


# A tibble: 2 x 2
# Groups:   barcodeid [2]
   barcodeid       geneSequence
      <fctr>              <chr>
1 M001-M008-S137 IL12RB1-IL7RA-LMP1
2 M001-M012-S080   CRLF2-ICOS-IL7RA
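
As a possible variation on the approach above, summarise collapses each group to a single row directly, so the select and slice steps are not needed. This is only a sketch: the example data is rebuilt from the question, and the names df and res are assumptions.

```r
library(dplyr)

# Example data rebuilt from the question (character columns are an assumption)
df <- data.frame(
  barcodeid = rep(c("M001-M008-S137", "M001-M012-S080"), each = 3),
  gene = c("IL12RB1", "IL7RA", "LMP1", "CRLF2", "ICOS", "IL7RA"),
  stringsAsFactors = FALSE
)

# summarise() returns one row per group, so no slice(1) is required
res <- df %>%
  group_by(barcodeid) %>%
  summarise(geneSequence = paste(gene, collapse = "-"))

print(res)
```

Unlike the mutate version, the result here contains only the grouping column and the new geneSequence column.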

Answer 3

More options:

reshape2::dcast(DT, barcodeid ~ ., paste, collapse="-")

aggregate(. ~ barcodeid, DT, paste, collapse="-")

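aggregate has the benefit of automatically naming the column "gene" instead of ".". If a new name is wanted, though, I guess the two are interchangeable here, followed by...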

names(res)[2] <- "geneSequence"

To revert the change, one way is:

splitstackshape::cSplit(res, "geneSequence", "-", direction = "long")

For more options, see "Split comma-separated column into separate rows".
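
If adding a package is not desirable, the same reversal can be sketched in base R with strsplit. This assumes res holds the collapsed result with a character geneSequence column and that no gene name itself contains "-"; the names parts and long are hypothetical.

```r
# Collapsed result, rebuilt here so the example runs on its own
res <- data.frame(
  barcodeid = c("M001-M008-S137", "M001-M012-S080"),
  geneSequence = c("IL12RB1-IL7RA-LMP1", "CRLF2-ICOS-IL7RA"),
  stringsAsFactors = FALSE
)

# Split each sequence on the literal "-" separator (fixed = TRUE avoids regex)
parts <- strsplit(res$geneSequence, "-", fixed = TRUE)

# Repeat each barcodeid once per gene and flatten the list of genes
long <- data.frame(
  barcodeid = rep(res$barcodeid, lengths(parts)),
  gene = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

Note that the barcode IDs also contain "-", which is harmless here because only the geneSequence column is split.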
