Create a new row with 1 - sum of column values in a data frame in R
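I have a multivariate dataset of species relative abundance (%) in different samples. The data frame contains only the most abundant species, so the columns do not sum to 100%.
My dataset looks like this, but with more species and samples: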
species = c("Species1","Species2","Species3","Species4","Species5","Species6")
Sample1 = c(0.6,7.9,7.1,2.7,4.5,6.4)
Sample2 = c(1.8,0.3,0.9,3.3,1.7,9.8)
Sample3 = c(9.2,1,8,2.1,8,2.2)
Sample4 = c(6.1,1.3,9,5.3,5.5,6.2)
df = data.frame(species, Sample1, Sample2, Sample3, Sample4)
df
species Sample1 Sample2 Sample3 Sample4
1 Species1 0.6 1.8 9.2 6.1
2 Species2 7.9 0.3 1.0 1.3
3 Species3 7.1 0.9 8.0 9.0
4 Species4 2.7 3.3 2.1 5.3
5 Species5 4.5 1.7 8.0 5.5
6 Species6 6.4 9.8 2.2 6.2
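I want to make a stacked bar chart that also includes a category "Others", representing the combined percentage of all the rarer species, calculated as 100 - sum of column.
The result I want looks like this: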
species Sample1 Sample2 Sample3 Sample4
1 Species1 0.6 1.8 9.2 6.1
2 Species2 7.9 0.3 1.0 1.3
3 Species3 7.1 0.9 8.0 9.0
4 Species4 2.7 3.3 2.1 5.3
5 Species5 4.5 1.7 8.0 5.5
6 Species6 6.4 9.8 2.2 6.2
7 Others 70.8 82.2 69.5 66.6
How can I do this? I have searched for hours but can't find a solution.
To get the data you need, use summarize(across()):
library(dplyr)
bind_rows(
  df,
  df %>% summarize(across(starts_with("Sample"), ~100-sum(.x))) %>%
    mutate(species="Others")
)
Output:
species Sample1 Sample2 Sample3 Sample4
1 Species1 0.6 1.8 9.2 6.1
2 Species2 7.9 0.3 1.0 1.3
3 Species3 7.1 0.9 8.0 9.0
4 Species4 2.7 3.3 2.1 5.3
5 Species5 4.5 1.7 8.0 5.5
6 Species6 6.4 9.8 2.2 6.2
7 Others 70.8 82.2 69.5 66.6
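A quick way to check the result is to confirm that every sample column now totals 100. The sketch below rebuilds the question's data and repeats the same bind_rows() call; the names `with_others` and `totals` are only illustrative and do not appear in the original answer.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# Same approach as above: one summary row holding 100 - column sum
with_others <- bind_rows(
  df,
  df %>%
    summarize(across(starts_with("Sample"), ~ 100 - sum(.x))) %>%
    mutate(species = "Others")
)

# Each sample column should now add up to 100 (up to floating-point error)
totals <- colSums(with_others[, -1])
stopifnot(isTRUE(all.equal(unname(totals), rep(100, 4))))
```

bind_rows() matches columns by name, so it does not matter that the summary row has species as its last column.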
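Additionally, if you want to plot this as a simple stacked bar chart, you can continue the pipe as follows (pivot_longer() comes from tidyr and the plotting functions from ggplot2, so both packages need to be loaded):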
... %>% pivot_longer(cols = -species, names_to="Sample",values_to = "Abundance") %>%
ggplot(aes(Sample,Abundance,fill=species)) +
geom_col() +
labs(fill="", y="Relative Abundance")+
theme(legend.position = "bottom")
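For reference, here is a self-contained sketch of what the reshaping step produces before it reaches ggplot(). It assumes the "Others" row is computed as in the snippet above; the object names `with_others`, `long`, and `p` are illustrative only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

with_others <- bind_rows(
  df,
  df %>%
    summarize(across(starts_with("Sample"), ~ 100 - sum(.x))) %>%
    mutate(species = "Others")
)

# Long format: one row per species/sample pair (7 species x 4 samples)
long <- with_others %>%
  pivot_longer(cols = -species, names_to = "Sample", values_to = "Abundance")

# Stacked bars: one bar per sample, one segment per species
p <- ggplot(long, aes(Sample, Abundance, fill = species)) +
  geom_col() +
  labs(fill = "", y = "Relative Abundance") +
  theme(legend.position = "bottom")
```

Printing `p` draws the chart. ggplot2 wants one row per bar segment, which is why the wide table is reshaped first.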
The main difference between this answer and the other one is the use of data.table:
library(data.table)
library(ggplot2)
library(RColorBrewer)
#
setDT(df)
result <- rbind(df, df[, c(species='Others', lapply(.SD, \(x) 100-sum(x))), .SDcols=-1])
result
## species Sample1 Sample2 Sample3 Sample4
## 1: Species1 0.6 1.8 9.2 6.1
## 2: Species2 7.9 0.3 1.0 1.3
## 3: Species3 7.1 0.9 8.0 9.0
## 4: Species4 2.7 3.3 2.1 5.3
## 5: Species5 4.5 1.7 8.0 5.5
## 6: Species6 6.4 9.8 2.2 6.2
## 7: Others 70.8 82.2 69.5 66.6
.SDcols = -1 means use all columns except the first.
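To illustrate the .SDcols behaviour on its own, here is a minimal sketch with a small made-up table (the data and the names `dt`, `by_position`, and `by_pattern` are hypothetical). It also shows selection by name pattern with patterns(), which is an alternative not used in the answer above and may be safer if the identifier column is not guaranteed to come first.

```r
library(data.table)

# Small made-up table, only for illustration
dt <- data.table(
  species = c("A", "B"),
  Sample1 = c(10, 20),
  Sample2 = c(30, 40)
)

# Negative index: drop column 1 from .SD and keep the rest
by_position <- dt[, lapply(.SD, sum), .SDcols = -1]

# Alternative: select by name pattern, independent of column order
by_pattern <- dt[, lapply(.SD, sum), .SDcols = patterns("^Sample")]
```

Both calls return a one-row table with the column sums of Sample1 and Sample2.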
# melt for use in ggplot
# reorder factors to put "Others" at the top.
#
gg.dt <- melt(result, id='species')[
, species:=factor(species, levels=c('Others', setdiff(unique(species), 'Others')))]
##
# use Brewer palette, with grey80 for "Others"
#
ggplot(gg.dt, aes(x=variable, y=value, fill=species))+
geom_bar(stat='identity', color='grey80')+
scale_fill_manual(values = c('grey80', brewer.pal(6, 'Spectral')))+
labs(x=NULL, y='Relative Abundance')
Another possible solution, based on dplyr and colSums:
library(dplyr)
df %>%
bind_rows(data.frame(species = "Others", t(100 - colSums(.[-1]))))
#> species Sample1 Sample2 Sample3 Sample4
#> 1 Species1 0.6 1.8 9.2 6.1
#> 2 Species2 7.9 0.3 1.0 1.3
#> 3 Species3 7.1 0.9 8.0 9.0
#> 4 Species4 2.7 3.3 2.1 5.3
#> 5 Species5 4.5 1.7 8.0 5.5
#> 6 Species6 6.4 9.8 2.2 6.2
#> 7 Others 70.8 82.2 69.5 66.6
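The role of t() in the one-liner above may not be obvious, so here is a step-by-step sketch in base R. It substitutes rbind() for bind_rows(), and the intermediate names `remainder`, `others`, and `result` are illustrative only.

```r
df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# colSums() returns a named numeric vector, one entry per sample column
remainder <- 100 - colSums(df[-1])

# t() turns that vector into a 1 x 4 matrix, so data.frame() builds a
# single row with columns Sample1..Sample4 rather than four rows
others <- data.frame(species = "Others", t(remainder))

# Column names match df, so the new row can be appended directly
result <- rbind(df, others)
```

Without t(), data.frame() would treat the vector as a single column and produce four rows.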