R table with numbers of atoms: concatenate/paste into a molecular formula
I have a CSV data file that looks like this:
> head(df)
# A tibble: 6 x 6
Name C H N O S
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 'Alanine' 3 7 1 2 0
2 'Arginine' 6 14 4 2 0
3 'Cysteine' 3 7 1 2 1
4 'Sucrose' 12 22 0 11 0
5 'Fructose' 6 12 0 6 0
6 'Ribose' 5 10 0 5 0
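I want to paste all of these columns into a single column so that each row has a molecular formula. My first attempt was to simply paste the value of each column:
> for (i in c(1:nrow(df))) {
    df$formula[i] <- paste0("C", df$C[i], "H", df$H[i], "N", df$N[i],
                            "O", df$O[i], "S", df$S[i]) }
This works when none of the columns C through S contain a zero, but when they do, the zeros are pasted in as shown below, whereas I want molecular formulas without the zero entries.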
> head(df$formula)
[1] "C3H7N1O2S0" "C6H14N4O2S0" "C3H7N1O2S1" "C12H22N0O11S0" "C6H12N0O6S0" "C5H10N0O5S0"
# what I want instead: "C3H7N1O2" "C6H14N4O2" "C3H7N1O2S1" "C12H22O11" "C6H12O6" "C5H10O5"
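Is there another way to paste these columns only when the value is non-zero? Or would it be easier to paste the columns like this and then remove the parts of the expression that contain a zero?
I think removing the 0 parts is the easiest approach: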
library(dplyr)
df %>%
mutate(formula = gsub("[CHNOS]0", "", paste0("C", C, "H", H, "N", N, "O", O, "S", S)))
returns
# A tibble: 6 x 7
Name C H N O S formula
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 'Alanine' 3 7 1 2 0 C3H7N1O2
2 'Arginine' 6 14 4 2 0 C6H14N4O2
3 'Cysteine' 3 7 1 2 1 C3H7N1O2S1
4 'Sucrose' 12 22 0 11 0 C12H22O11
5 'Fructose' 6 12 0 6 0 C6H12O6
6 'Ribose' 5 10 0 5 0 C5H10O5
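One point worth checking is that the pattern only strips counts that are exactly zero and leaves counts such as 10 or 20 alone, because the letter must be immediately followed by 0. A minimal sketch, using made-up counts that are not from the question's data:

```r
# Hypothetical counts, chosen to include values that end in zero.
counts <- data.frame(C = c(10, 20, 3), H = c(20, 40, 7), N = c(0, 0, 1),
                     O = c(10, 0, 2), S = c(0, 0, 0))

# Paste every element symbol with its count, zeros included.
raw <- with(counts, paste0("C", C, "H", H, "N", N, "O", O, "S", S))

# Drop each element symbol that is directly followed by a 0.
formula <- gsub("[CHNOS]0", "", raw)
print(formula)
```

Here "C10" and "O10" survive because the character after the symbol is 1, not 0.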
A more general approach might be
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-Name) %>%
filter(value > 0) %>%
group_by(Name) %>%
summarise(formula = paste(name, value, collapse = "", sep = "")) %>%
right_join(df, by = "Name")
returns
# A tibble: 6 x 7
Name formula C H N O S
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 'Alanine' C3H7N1O2 3 7 1 2 0
2 'Arginine' C6H14N4O2 6 14 4 2 0
3 'Cysteine' C3H7N1O2S1 3 7 1 2 1
4 'Fructose' C6H12O6 6 12 0 6 0
5 'Ribose' C5H10O5 5 10 0 5 0
6 'Sucrose' C12H22O11 12 22 0 11 0
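The same idea, dropping zero counts before pasting rather than after, can also be sketched in base R without reshaping and without changing the row order. This is only an illustration: the elements vector and the use of ifelse are my own choices, not part of the answer above.

```r
# The example data from the question, rebuilt as a plain data frame.
df <- data.frame(
  Name = c("Alanine", "Arginine", "Cysteine", "Sucrose", "Fructose", "Ribose"),
  C = c(3, 6, 3, 12, 6, 5),
  H = c(7, 14, 7, 22, 12, 10),
  N = c(1, 4, 1, 0, 0, 0),
  O = c(2, 2, 2, 11, 6, 5),
  S = c(0, 0, 1, 0, 0, 0)
)

elements <- c("C", "H", "N", "O", "S")  # assumed element columns, in output order

# For each element build "symbol + count", or "" where the count is zero.
parts <- lapply(elements, function(el) {
  ifelse(df[[el]] > 0, paste0(el, df[[el]]), "")
})

# Paste the per-element pieces together row by row.
df$formula <- do.call(paste0, parts)
print(df$formula)
```

Because nothing is grouped or joined, the rows stay in their original order (Sucrose remains in row 4).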
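paste0 the names and values together, interleaved by means of rbind, then remove from the output any non-numeric characters (\D) that precede a 0, along with the 0 itself:
vars <- c("C","H","N","O","S")
# The backslash must be doubled inside an R string literal.
gsub("\\D+0", "", do.call(paste0, rbind(names(dat[vars]), as.list(dat[vars]))))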
##[1] "C3H7N1O2" "C6H14N4O2" "C3H7N1O2S1" "C12H22O11" "C6H12O6" "C5H10O5"
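This works because rbind, read in column order, yields alternating entries: a name, then its list of values, the next name, then its list of values, and so on: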
rbind(names(dat[vars]), as.list(dat[vars]))
## C H N O S
##[1,] "C" "H" "N" "O" "S"
##[2,] integer,6 integer,6 integer,6 integer,6 integer,6
where dat is:
dat <- read.table(text="
Name C H N O S
Alanine 3 7 1 2 0
Arginine 6 14 4 2 0
Cysteine 3 7 1 2 1
Sucrose 12 22 0 11 0
Fructose 6 12 0 6 0
Ribose 5 10 0 5 0
", header=TRUE)
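To make the interleaving step easier to follow, the pipeline can be split into its intermediate values. A minimal sketch on a two-row subset of the data; the variable names interleaved, raw and formula are only for illustration:

```r
# Two rows of the example data, with integer counts as read.table would give.
dat <- data.frame(Name = c("Alanine", "Sucrose"),
                  C = c(3L, 12L), H = c(7L, 22L), N = c(1L, 0L),
                  O = c(2L, 11L), S = c(0L, 0L))
vars <- c("C", "H", "N", "O", "S")

# A 2 x 5 list-matrix: row 1 holds the symbols, row 2 the count vectors.
interleaved <- rbind(names(dat[vars]), as.list(dat[vars]))

# do.call walks the matrix column by column, so paste0 receives
# "C", <C counts>, "H", <H counts>, ... as its arguments.
raw <- do.call(paste0, interleaved)

# Remove each run of non-digits that is followed by a 0, plus the 0.
formula <- gsub("\\D+0", "", raw)
print(raw)
print(formula)
```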