How to create a Markdown table with different column lengths based on a dataframe in long format in R?
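I am writing an R Markdown file that I want to submit as a manuscript to an academic journal. I want to create a table that shows which three words (item2) co-occur most frequently with certain keywords (item1). Note that some keywords have more than three co-occurring words. The data I am currently working with: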
item1 <- c("water","water","water","water","water","sun","sun","sun","sun","moon","moon","moon")
item2 <- c("tree","dog","cat","fish","eagle","bird","table","bed","flower","house","desk","tiger")
n <- c("200","83","34","34","34","300","250","77","77","122","46","46")
df <- data.frame(item1,item2,n)
This gives the following data frame:
item1 item2 n
1 water tree 200
2 water dog 83
3 water cat 34
4 water fish 34
5 water eagle 34
6 sun bird 300
7 sun table 250
8 sun bed 77
9 sun flower 77
10 moon house 122
11 moon desk 46
12 moon tiger 46
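Ultimately, I want to pass the data to the function papaja::apa_table, which requires a data.frame (or a matrix/list). I therefore need to reshape the data.
My question:
How can I reshape the data (preferably with dplyr) to get the following structure?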
water_item2 water_n sun_item2 sun_n moon_item2 moon_n
1 tree 200 bird 300 house 122
2 dog 83 table 250 desk 46
3 cat 34 bed 77 tiger 46
4 fish 34 flower 77 <NA> <NA>
5 eagle 34 <NA> <NA> <NA> <NA>
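We can borrow the approach from an old answer of mine to a different question and modify the classic gather(), unite(), spread() strategy: create a unique identifier within each group to avoid duplicate identifiers, then drop that variable afterwards: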
library(dplyr)
library(tidyr)
item1 <- c("water","water","water","water","water","sun","sun","sun","sun","moon","moon","moon")
item2 <- c("tree","dog","cat","fish","eagle","bird","table","bed","flower","house","desk","tiger")
n <- c("200","83","34","34","34","300","250","77","77","122","46","46")
# Owing to Richard Telford's excellent comment,
# I use tibble() (formerly data_frame(), now deprecated; or equivalently
# for our purposes, data.frame(..., stringsAsFactors = FALSE))
# to avoid turning the strings into factors
df <- tibble(item1,item2,n)
df %>%
    group_by(item1) %>%
    mutate(id = 1:n()) %>%
    ungroup() %>%
    gather(temp, val, item2, n) %>%
    unite(temp2, item1, temp, sep = '_') %>%
    spread(temp2, val) %>%
    select(-id)
# A tibble: 5 x 6
moon_item2 moon_n sun_item2 sun_n water_item2 water_n
<chr> <chr> <chr> <chr> <chr> <chr>
1 house 122 bird 300 tree 200
2 desk 46 table 250 dog 83
3 tiger 46 bed 77 cat 34
4 NA NA flower 77 fish 34
5 NA NA NA NA eagle 34
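In current versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same idea can probably be expressed in a single reshaping step. The following is only a sketch, not part of the original answer: it assumes tidyr 1.2.0 or later (for the names_vary argument), and the object name wide is made up for illustration. The per-group id plays the same role as above. With names_vary = "slowest" the item2 and n columns of each keyword should end up next to each other, and the keywords should stay in order of first appearance rather than being sorted alphabetically.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, kept as character columns
df <- tibble(
  item1 = c("water","water","water","water","water","sun","sun","sun","sun","moon","moon","moon"),
  item2 = c("tree","dog","cat","fish","eagle","bird","table","bed","flower","house","desk","tiger"),
  n     = c("200","83","34","34","34","300","250","77","77","122","46","46")
)

# `wide` is a hypothetical name used only in this sketch
wide <- df %>%
  group_by(item1) %>%
  mutate(id = row_number()) %>%      # position of each row within its keyword
  ungroup() %>%
  pivot_wider(
    id_cols     = id,                # one output row per within-group position
    names_from  = item1,             # one pair of columns per keyword
    values_from = c(item2, n),
    names_glue  = "{item1}_{.value}",
    names_vary  = "slowest"          # assumption: requires tidyr >= 1.2.0
  ) %>%
  select(-id)                        # the helper id is no longer needed

wide
```

Keywords with fewer rows are padded with NA, as in the output above, so the result can be passed on as an ordinary data frame.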