Preserve order of columns when going from wide to long format
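I'm trying to preserve the order of my columns when I gather them from wide to long format. The problem is that the order is lost after gather and summarize. I have a large number of columns, so I don't want to type out the order by hand.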
Here is an example:
library(tidyr)
library(dplyr)
N <- 4
df <- data.frame(sample = c(1,1,2,2),
y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))
> df
sample y1.1 y2.1 y10.1
1 1 1.040938 0.8851727 -0.3617224
2 1 1.175879 1.0009824 -1.1352406
3 2 -1.501832 0.3446469 -1.8687008
4 2 -1.326817 0.4434628 -0.8795962
What I want is to keep the columns in their original order, but after a few operations that order is lost. See here:
dfg <- df %>%
gather(key="key", value="value", -sample) %>%
group_by(sample, key) %>%
summarize(mean = mean(value))
> filter(dfg, sample == 1)
sample key mean
<dbl> <chr> <dbl>
1 1 y1.1 0.2936335
2 1 y10.1 0.6170505
3 1 y2.1 -0.2250543
You can see that it puts y10.1 before y2.1, which I don't want. I want to keep the order that gather alone produces, see here:
dfg <- df %>%
gather(key="key", value="value", -sample)
> filter(dfg, sample == 1)
sample key value
1 1 y1.1 0.60171521
2 1 y1.1 -0.01444823
3 1 y2.1 0.81566726
4 1 y2.1 -1.26577581
5 1 y10.1 0.41686388
6 1 y10.1 0.81723707
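For some reason the group_by and summarize operations change the order, and I'm not sure why. I tried ungroup, but it didn't make any difference. As I said, my real data frame has many columns, and I need to keep their order so that I can plot the data in the right order.
Any ideas?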
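The reordering comes from group_by(): summarize() returns one row per group, and the groups are sorted by the key values. For a character key that means comparing strings character by character, so "y10.1" lands before "y2.1" ('1' < '2' at the second character); ungroup() only drops the grouping and does not restore the earlier order. A small base-R illustration of the difference between a character vector and a factor with explicit levels (method = "radix" is used only to force C-locale ordering so the result is reproducible):

```r
keys <- c("y1.1", "y2.1", "y10.1")  # original column order

# Character keys are compared character by character.
char_sorted <- sort(keys, method = "radix")
print(char_sorted)  # "y1.1" "y10.1" "y2.1"

# A factor sorts by its level order, so levels listed in column order
# keep that order after sorting.
f <- factor(c("y10.1", "y2.1", "y1.1"), levels = keys)
factor_sorted <- as.character(sort(f))
print(factor_sorted)  # "y1.1" "y2.1" "y10.1"
```

This is why the answers below either turn the key into a factor or sort on something other than the raw string.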
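I found a working solution using a lookup table. It works for me because I can extract the column names, assign an ordered number to each one, and then join that to my data.frame.
Here is the solution: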
lookup <- tibble(key = c("y1.1", "y2.1", "y10.1"),
index = c(1,2,3))
> left_join(dfg, lookup, by="key")
# A tibble: 6 x 4
sample key mean index
<dbl> <chr> <dbl> <dbl>
1 1 y1.1 0.2936335 1
2 1 y10.1 0.6170505 3
3 1 y2.1 -0.2250543 2
4 2 y1.1 1.3652070 1
5 2 y10.1 0.9889233 3
6 2 y2.1 0.5216553 2
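Note that the lookup above is typed by hand and that left_join() on its own does not reorder the rows; you still have to sort on index. A minimal sketch that builds the lookup from names(df) and sorts afterwards, assuming (as in the question) that the id column is called sample; set.seed(1) is only there to make the random example data reproducible:

```r
library(dplyr)
library(tidyr)

set.seed(1)  # reproducible example data
N <- 4
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))

# Each key's position among the data columns becomes its index.
key_cols <- setdiff(names(df), "sample")
lookup <- tibble(key = key_cols, index = seq_along(key_cols))

dfg <- df %>%
  gather(key = "key", value = "value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value)) %>%
  ungroup() %>%
  left_join(lookup, by = "key") %>%
  arrange(sample, index)  # the join alone keeps the summarize() order
```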
Alternatively, you can convert the key column to a factor whose levels follow the order of the original column names:
df %>%
gather(key="key", value="value", -sample) %>%
mutate(key=factor(key, levels=names(df)[-1])) %>% # add this line to convert the key to a factor
group_by(sample, key) %>%
summarize(mean = mean(value)) %>%
filter(sample == 1)
# A tibble: 3 x 3
# Groups: sample [1]
# sample key mean
# <dbl> <fctr> <dbl>
#1 1 y1.1 0.8310786
#2 1 y2.1 -1.2596933
#3 1 y10.1 0.8208812
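names(df)[-1] works here because sample happens to be the first column. A slightly more defensive variant, sketched under the assumption that the id column is named sample, excludes it by name so its position does not matter:

```r
library(dplyr)
library(tidyr)

set.seed(1)  # reproducible example data
N <- 4
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))

# Levels = data column names in their original order, wherever 'sample' sits.
key_levels <- setdiff(names(df), "sample")

res <- df %>%
  gather(key = "key", value = "value", -sample) %>%
  mutate(key = factor(key, levels = key_levels)) %>%
  group_by(sample, key) %>%   # groups are now sorted by factor level
  summarize(mean = mean(value)) %>%
  ungroup()
```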
Another approach is to arrange the data frame by a custom version of the key column, here the numeric part of each name:
library(dplyr)
library(tidyr)
df %>%
gather(key="key", value="value", -sample) %>%
group_by(sample, key) %>%
summarize(mean = mean(value)) %>%
arrange(as.numeric(stringr::str_replace(key, "y", "")), .by_group = TRUE)
#> # A tibble: 6 x 3
#> # Groups: sample [2]
#> sample key mean
#> <dbl> <chr> <dbl>
#> 1 1 y1.1 0.07001689
#> 2 1 y2.1 1.15349430
#> 3 1 y10.1 1.18266024
#> 4 2 y1.1 0.42616604
#> 5 2 y2.1 1.05891682
#> 6 2 y10.1 -0.12561209
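Stripping the "y" relies on the column names following that exact pattern. If the names don't contain a usable number, one option is to sort by each key's position in the original names with base R's match(); a sketch on the same example data:

```r
library(dplyr)
library(tidyr)

set.seed(1)  # reproducible example data
N <- 4
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))

res <- df %>%
  gather(key = "key", value = "value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value)) %>%
  # match() gives each key's column position in df; .by_group = TRUE
  # sorts by the remaining grouping variable (sample) first.
  arrange(match(key, names(df)), .by_group = TRUE) %>%
  ungroup()
```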
If your columns really are ordered by the number they contain, this should work:
library(readr)
df %>%
gather(key="key", value="value", -sample) %>%
group_by(sample, key) %>%
summarize(mean = mean(value)) %>%
arrange(parse_number(key)) %>% # <- sorting by number contained in key
filter(sample == 1)
# # A tibble: 3 x 3
# # Groups: sample [1]
# sample key mean
# <dbl> <chr> <dbl>
# 1 1 y1.1 -0.9236688
# 2 1 y2.1 -0.2168337
# 3 1 y10.1 0.5041981
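Without the filter(), arrange(parse_number(key)) on its own would interleave the samples, so a sketch that keeps each sample together lists sample first. One caveat worth checking against your real names: parse_number() reads "y1.1" as 1.1, so a name such as "y1.10" would tie with "y1.1".

```r
library(dplyr)
library(tidyr)
library(readr)

set.seed(1)  # reproducible example data
N <- 4
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))

res <- df %>%
  gather(key = "key", value = "value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value)) %>%
  ungroup() %>%
  arrange(sample, parse_number(key))  # samples together, then by number

nums <- parse_number(c("y1.1", "y2.1", "y10.1"))  # 1.1 2.1 10.1
```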
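The tidyverse now allows an elegant solution: setting factor_key = TRUE in gather() stores the key as a factor whose levels follow the original column order.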
library(tidyverse)
N <- 4
df <- data.frame(sample = c(1,1,2,2),
y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))
df %>%
gather("key", "value", -sample, factor_key = TRUE) %>%
group_by(sample, key) %>%
summarise(mean = mean(value))
This gives:
# A tibble: 6 x 3
# Groups: sample [2]
sample key mean
<dbl> <fct> <dbl>
1 1 y1.1 0.0894
2 1 y2.1 0.551
3 1 y10.1 0.254
4 2 y1.1 -0.555
5 2 y2.1 -1.36
6 2 y10.1 -0.794
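gather() is superseded by pivot_longer() in tidyr 1.0.0 and later. As far as I know pivot_longer() has no factor_key argument, so this sketch gets the same effect by building the factor explicitly: unique(key) lists the names in order of first appearance, which after pivoting is the column order.

```r
library(dplyr)
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

set.seed(1)  # reproducible example data
N <- 4
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))

res <- df %>%
  pivot_longer(-sample, names_to = "key", values_to = "value") %>%
  # Levels in order of first appearance = original column order.
  mutate(key = factor(key, levels = unique(key))) %>%
  group_by(sample, key) %>%
  summarise(mean = mean(value)) %>%
  ungroup()
```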