R table with number of atoms, concatenate/paste to molecular formula

I have a csv data file that looks like this:

> head(df)
# A tibble: 6 x 6
  Name                        C     H     N     O     S
  <chr>                     <dbl> <dbl> <dbl> <dbl> <dbl>
1 'Alanine'                   3     7     1     2     0         
2 'Arginine'                  6     14    4     2     0     
3 'Cysteine'                  3     7     1     2     1
4 'Sucrose'                   12    22    0     11    0
5 'Fructose'                  6     12    0     6     0  
6 'Ribose'                    5     10    0     5     0  

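I want to paste all of these columns into a single column so that every row has a molecular formula. I first tried to do this by simply pasting the values of each column: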

for (i in seq_len(nrow(df))) {
  df$formula[i] <- paste0("C", df$C[i], "H", df$H[i], "N", df$N[i],
                          "O", df$O[i], "S", df$S[i])
}

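This works if there are no zeros in columns C through S, but if a column contains zeros they are pasted as shown below, whereas I want molecular formulas without the zero-count elements.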

> head(df$formula)
[1] "C3H7N1O2S0"  "C6H14N4O2S0"  "C3H7N1O2S1"  "C12H22N0O11S0"  "C6H12N0O6S0"  "C5H10N0O5S0"
# what I want instead: "C3H7N1O2"  "C6H14N4O2"  "C3H7N1O2S1"  "C12H22O11"  "C6H12O6"  "C5H10O5"

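Is there another way to paste these columns only when the value is not zero? Or would it be easier to paste the columns as above and then remove the parts of the expression that have a zero?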

I think removing the 0 parts is the easiest way:

library(dplyr)

df %>% 
  mutate(formula = gsub("[CHNOS]0", "", paste0("C", C, "H", H, "N", N, "O", O, "S", S)))

returns

# A tibble: 6 x 7
  Name           C     H     N     O     S formula   
  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <chr>     
1 'Alanine'      3     7     1     2     0 C3H7N1O2  
2 'Arginine'     6    14     4     2     0 C6H14N4O2 
3 'Cysteine'     3     7     1     2     1 C3H7N1O2S1
4 'Sucrose'     12    22     0    11     0 C12H22O11 
5 'Fructose'     6    12     0     6     0 C6H12O6   
6 'Ribose'       5    10     0     5     0 C5H10O5   
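A quick sanity check, offered as a sketch, of why this pattern leaves counts such as 10 or 20 alone: `[CHNOS]0` only matches a zero that directly follows an element letter. It assumes single-letter element symbols and counts written without leading zeros. The input strings are made up for illustration and are not taken from the data above.

```r
# Hypothetical formulas, chosen to include counts that end in zero
x <- c("C10H20N0O10S0", "C3H7N1O2S0")

# A zero is removed only when it immediately follows the element letter,
# so "C10", "H20" and "O10" survive while "N0" and "S0" are dropped
cleaned <- gsub("[CHNOS]0", "", x)
print(cleaned)
# [1] "C10H20O10" "C3H7N1O2"
```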

A more general approach might be

library(tidyr)
library(dplyr)

df %>% 
  pivot_longer(-Name) %>% 
  filter(value > 0) %>% 
  group_by(Name) %>% 
  summarise(formula = paste(name, value, collapse = "", sep = "")) %>% 
  right_join(df, by = "Name")

returns

# A tibble: 6 x 7
  Name       formula        C     H     N     O     S
  <chr>      <chr>      <dbl> <dbl> <dbl> <dbl> <dbl>
1 'Alanine'  C3H7N1O2       3     7     1     2     0
2 'Arginine' C6H14N4O2      6    14     4     2     0
3 'Cysteine' C3H7N1O2S1     3     7     1     2     1
4 'Fructose' C6H12O6        6    12     0     6     0
5 'Ribose'   C5H10O5        5    10     0     5     0
6 'Sucrose'  C12H22O11     12    22     0    11     0
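The same idea of keeping only non-zero counts can also be sketched in base R without reshaping. The helper name `make_formula`, the `toy` data frame and its extra `P` column are hypothetical and only serve the illustration. Unlike the pipeline above, this sketch keeps the rows in their original order, and a row whose counts are all zero would get an empty string.

```r
# Hypothetical helper: build a formula string from any set of count columns
make_formula <- function(data, elements) {
  # One character vector per element; "" wherever the count is zero
  pieces <- lapply(elements, function(el) {
    ifelse(data[[el]] > 0, paste0(el, data[[el]]), "")
  })
  # Paste the per-element pieces together row by row
  do.call(paste0, pieces)
}

# Made-up example data with an extra element column P
toy <- data.frame(
  Name = c("Alanine", "Sucrose", "Phosphate"),
  C = c(3, 12, 0), H = c(7, 22, 3), N = c(1, 0, 0),
  O = c(2, 11, 4), P = c(0, 0, 1)
)

toy$formula <- make_formula(toy, c("C", "H", "N", "O", "P"))
print(toy$formula)
# [1] "C3H7N1O2"  "C12H22O11" "H3O4P1"
```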

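paste0 the names and values together, interleaved by way of rbind, and then remove any non-digit characters (\D, written \\D inside an R string) that precede a 0, together with the 0 itself, from the output: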

vars <- c("C","H","N","O","S")
gsub("\\D+0", "", do.call(paste0, rbind(names(dat[vars]), as.list(dat[vars]))))
##[1] "C3H7N1O2"   "C6H14N4O2"  "C3H7N1O2S1" "C12H22O11"  "C6H12O6"    "C5H10O5"

This works because rbind builds, in column order, an alternating sequence of name, then list, name, then list...:

rbind(names(dat[vars]), as.list(dat[vars]))
##     C         H         N         O         S        
##[1,] "C"       "H"       "N"       "O"       "S"      
##[2,] integer,6 integer,6 integer,6 integer,6 integer,6

where dat is:

dat <- read.table(text="
Name        C     H     N     O     S
Alanine     3     7     1     2     0         
Arginine    6    14     4     2     0     
Cysteine    3     7     1     2     1
Sucrose    12    22     0    11     0
Fructose    6    12     0     6     0  
Ribose      5    10     0     5     0  
", header=TRUE)
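To make the interleaving concrete, here is a small sketch comparing the `do.call` form with a `paste0` call written out by hand. It uses a made-up three-column subset (the name `dat_small` is hypothetical) so that it runs on its own; the zero-removal step is deliberately left out.

```r
# Made-up subset, defined here so the snippet is self-contained
dat_small <- data.frame(C = c(3, 12), H = c(7, 22), N = c(1, 0))
vars <- c("C", "H", "N")

# rbind() gives a 2 x 3 list matrix; do.call() passes its cells to paste0()
# in column order: "C", the C column, "H", the H column, "N", the N column
interleaved <- do.call(paste0, rbind(names(dat_small[vars]),
                                     as.list(dat_small[vars])))

# The same call spelled out explicitly
explicit <- paste0("C", dat_small$C, "H", dat_small$H, "N", dat_small$N)

print(interleaved)
print(identical(interleaved, explicit))
```

Both forms should produce the same strings, which still contain the zero counts until `gsub` removes them.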