Formatting a table in R

Question

I have data like this:

            147 147 231 231
2011_15_1   99  106 152 156
2011_15_2   99  102 150 156
2011_15_3   99  99  152 156
2011_15_7   99  106 152 156

I want to reformat it as:

            147     231
2011_15_1   99      152
            106     156
2011_15_2   99      150
            102     156
2011_15_3   99      152
            99      156
2011_15_7   99      152
            106     156

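I have tried reading the table into R and using the 'reshape2' package with the melt() function, but I am not sure how to collapse the columns that share a name into narrow (long) form.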

Can anyone help?

Answer 1

You can use dplyr + tidyr:

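library(tidyr)
library(dplyr)

df %>%
  mutate(date = row.names(.)) %>%          # row names -> a "date" column
  gather(key, value, -date) %>%            # wide -> long
  arrange(date) %>%
  mutate(key = gsub("[.]1$", "", key)) %>% # X147.1 -> X147, X231.1 -> X231
  group_by(date, key) %>%
  mutate(id = 1:n()) %>%                   # 1, 2 within each date/key pair
  spread(key, value) %>%                   # long -> wide, one row per date/id
  select(-id)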

Result:

# A tibble: 8 x 3
# Groups:   date [4]
       date  X147  X231
*     <chr> <int> <int>
1 2011_15_1    99   152
2 2011_15_1   106   156
3 2011_15_2    99   150
4 2011_15_2   102   156
5 2011_15_3    99   152
6 2011_15_3    99   156
7 2011_15_7    99   152
8 2011_15_7   106   156

Notes:

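When reading the data, read.table changes the column names in two ways: it prepends X to each name and appends .1 to the repeated names. This happens because, with the default check.names = TRUE, names that are purely numeric or duplicated are not allowed.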
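The renaming can be reproduced directly with base R's make.names(), which read.table applies when check.names = TRUE. A minimal sketch, assuming only base R; raw_names and df_raw are illustrative names, and df_raw uses just the first data row of the question.

```r
# Column labels as they appear in the raw header line
raw_names <- c("147", "147", "231", "231")

# "X" makes the numeric names syntactically valid;
# unique = TRUE then adds ".1" to the second copy of each name
fixed_names <- make.names(raw_names, unique = TRUE)
print(fixed_names)  # "X147" "X147.1" "X231" "X231.1"

# With check.names = FALSE the labels are kept verbatim, duplicates included.
# The header has one field fewer than the data row, so the first column
# becomes the row names.
df_raw <- read.table(text = "         147 147 231 231
2011_15_1   99  106 152 156", header = TRUE, check.names = FALSE)
print(names(df_raw))
```

Keeping the original labels makes the data harder to address with $, so the answer works with the X-prefixed names instead.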
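What I basically did:
1. convert the row names into a first column, date;
2. convert the data from wide to long format (gather);
3. remove the .1 suffix from every value in the key column;
4. group_by date and key and add an id, so that every row is unique;
5. finally, convert the data back to wide format using the new key and value columns (spread).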
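Since the question started from reshape2, the same five steps can be sketched with reshape2's melt() and dcast(). This is an alternative, not the answer's code: it assumes reshape2 is installed, uses base R's ave() in place of group_by() + mutate() to build the id column, and the names long and wide are illustrative.

```r
library(reshape2)

df <- read.table(text = "         147 147 231 231
2011_15_1   99  106 152 156
2011_15_2   99  102 150 156
2011_15_3   99  99  152 156
2011_15_7   99  106 152 156", header = TRUE)

# Step 1: row names -> a regular "date" column
df$date <- rownames(df)

# Step 2: wide -> long; "variable" holds X147, X147.1, X231, X231.1
long <- melt(df, id.vars = "date")

# Step 3: drop the ".1" suffix so both copies share one key
long$key <- sub("[.]1$", "", as.character(long$variable))

# Step 4: number rows 1, 2 within each date/key pair
long$id <- ave(seq_len(nrow(long)), long$date, long$key, FUN = seq_along)

# Step 5: long -> wide, one row per date/id pair, then drop the helper id
wide <- dcast(long, date + id ~ key, value.var = "value")
wide$id <- NULL
print(wide)
```

The key idea is the same as with gather/spread: after the suffix is removed, the id column is what keeps the two values per date/key pair apart when casting back to wide.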

Data:

df = read.table(text="         147 147 231 231
                2011_15_1   99  106 152 156
                2011_15_2   99  102 150 156
                2011_15_3   99  99  152 156
                2011_15_7   99  106 152 156", header = TRUE)
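In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Assuming tidyr 1.0 or later, plus tibble for rownames_to_column(), the answer's pipeline translates roughly as follows; it is a swapped-in version of the same technique, not the original answer's code.

```r
library(tidyr)
library(dplyr)
library(tibble)

df <- read.table(text = "         147 147 231 231
2011_15_1   99  106 152 156
2011_15_2   99  102 150 156
2011_15_3   99  99  152 156
2011_15_7   99  106 152 156", header = TRUE)

out <- df %>%
  rownames_to_column("date") %>%                       # row names -> column
  pivot_longer(-date, names_to = "key", values_to = "value") %>%
  mutate(key = sub("[.]1$", "", key)) %>%              # X147.1 -> X147
  group_by(date, key) %>%
  mutate(id = row_number()) %>%                        # 1, 2 per date/key
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(-id)

print(out)
```

Unlike the gather/spread version, ungroup() here means the result is a plain, ungrouped tibble.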
