NA values when trying to return a numeric vector from a character vector

I read a data frame into R and am trying to extract one of its columns as a numeric vector.

AB_lst <- read.csv("tableOut.csv", stringsAsFactors = FALSE)
AB_mass <- AB_lst$StructCalc
AB_mass_numeric <- as.numeric(AB_mass)

I want AB_mass_numeric to be a numeric vector, but whenever I run the code above I get

Warning message: NAs introduced by coercion

When I run head(AB_mass), the output looks like this:

"370.104704 ..." "365.173393 ..." "312.062840 ..." "266.151261 ..." "372.120355 ..." "210.088660 ..."

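Why does this happen, and how can I fix it so that I get a numeric vector with these values? I think the problem has something to do with the "...", but I'm not sure. A sample of AB_lst is shown below.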

X     CAS.RN                                         Name       Formula          Mass
1 2 28458-24-4                    (+)-Averufanin; Avermutin    C20 H18 O7 370.353 g_mol
2 3 23402-09-7                           (+)-Brevianamide A C21 H23 N3 O3 365.426 g_mol
3 4  1162-65-8 (-)-Aflatoxin-B1; Aflatoxin B; Aflatoxin FB1    C17 H12 O6 312.274 g_mol
4 5 26057-70-5                             (-)-Avenaciolide    C15 H22 O4 266.333 g_mol
5 6  5803-62-3                                (-)-Averantin    C20 H20 O7 372.369 g_mol
6 7 20421-31-2                            (-)-Canadensolide    C11 H14 O4 210.226 g_mol
                                                                                                                        Sources
1                                                                                                    [F] Aspergillus versicolor
2                                                                                [F] Penicillium brevicompactum, P. viridicatum
3                         [F] Aspergillus flavus, A. parasiticus, P.puberulum, P.sp., Asp.sulphureus, P. ostianus; "MunissiMUF2
4 [F] Aspergillus avenaceusIsolation extraction with (EtOAc, 3, filt.) chromatogr. with (Sil-G, ) crystallizat. with (Et2O-Hex)
5                                                                               [L] [F] Aspergillus versicolor; Solorina crocea
6  [F] Penicillium canadense, Aspergillus tamariiIsolation chromatogr. with (Sil-G, Benz-EtOAc) ion exchange with (XAD-2, MeOH)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               C.NMR
1                                                                                                                                                                                                    SIM (187.0 S C2 +-4.3 96*) (181.9 S C8 +-1.5 192*) (164.9 S C18 +-1.4 31*) (164.1 S C10 +-1.443*) (162.8 S C9 +-0.5 11*) (161.0 S C6 +-1.9 26*) (135.9 S C7 +-9.8 217*) (135.9 S C4 +-9.8 217*) (119.7 S C5+-0.6 9*) (118.9 S C1 +-9.8 141*) (109.2 S C3 +-1.2 43*) (108.7 D C12 +-2.3 18*) (108.6 D C15 +-0.6 34*)(108.4 D C14 +-2.0 31*) (74.8 D C11 +-1.9 6*) (74.8 D C22 +-0.2 5*) (32.4 T C26 +-1.2 51*) (28.9 T C24 +-1.08*) (23.3 T C25 +-0.1 5*) (20.9 Q C27 +-1.9 52*)
2                                                                                                                                                                                       SIM (203.4 S C11 +-1.6 Int) (173.7 S C10 +-0.0 1*) (170.3 S C3 +-1.4 Int) (160.5 S C15 +-0.2 8*)(134.7 D C27 +-3.3 98*) (124.6 D C23 +-0.1 9*) (124.5 D C26 +-6.4 338*) (120.2 S C13 +-0.4 8*) (111.8 D C24+-0.2 9*) (69.8 S C5 +-0.0 1*) (69.0 S C2 +-1.6 Int) (65.2 S C1 +-1.6 Int) (55.5 D C7 +-1.7 Int) (44.2 T C19 +-0.01*) (31.2 S C9 +-4.0 Int) (31.1 T C14 +-0.0 1*) (29.7 T C20 +-0.0 1*) (28.7 T C12 +-1.7 Int) (25.4 T C25 +-0.0 1*)(12.5 Q C22 +-9.8 Int) (12.5 Q C21 +-9.8 Int)
3 SIM-EXP (117.0 - 117.0, 1) (176.5 - 176.9, 2) (161.0 - 161.4, 3) (152.5 - 152.2, 4) (103.7 - 104.8, 5)(107.5 - 107.4, 7) (165.3 - 164.5, 8) (113.2 - 113.0, 10) (154.7 - 155.6, 11) (47.8 - 47.7, 12) (90.6 - 90.3, 13)(200.6 - 200.7, 14) (29.0 - 28.9, 15) (144.8 - 145.1, 18) (102.3 - 102.5, 19) (35.0 - 34.9, 20) (56.4 - 55.8, 23) ;SIM-EXP (117.0 - 117.0, 1) (176.9 - 176.5, 2) (161.4 - 161.0, 3) (152.2 - 152.5, 4) (104.9 - 103.7, 5) (107.4 -107.5, 7) (164.5 - 165.3, 8) (113.0 - 113.2, 10) (155.6 - 154.7, 11) (47.7 - 47.8, 12) (90.3 - 90.6, 13) (200.7 -200.6, 14) (28.9 - 29.0, 15) (145.1 - 144.8, 18) (102.5 - 102.3, 19) (34.9 - 35.0, 20) (55.9 - 56.4, 23)
4                                                                                                                                                                                                                                                                                                                SIM (173.4 S C5 +-3.0 Int) (168.3 S C4 +-0.7 Int) (134.0 S C6 +-2.4 Int) (121.8 T C11 +-0.1 2*) (72.9D C8 +-0.6 Int) (71.8 D C1 +-3.8 Int) (40.2 D C2 +-2.1 Int) (30.9 T C15 +-3.4 4300*) (29.2 T C18 +-3.0 4236*)(28.8 T C17 +-3.4 27891*) (28.5 T C16 +-3.1 1809*) (27.1 T C13 +-3.8 3*) (24.9 T C12 +-1.1 Int) (23.9 T C14 +-3.7 5242*) (14.7 Q C19 +-4.0 6903*)
5                                                                                                                                                                                              SIM (187.0 S C2 +-4.3 96*) (181.9 S C7 +-1.5 192*) (164.9 S C16 +-1.4 31*) (164.1 S C10 +-1.443*) (162.8 S C9 +-0.5 11*) (161.0 S C6 +-1.9 26*) (135.9 S C5 +-9.8 217*) (135.9 S C4 +-9.8 217*) (121.7 S C8+-0.0 1*) (118.9 S C1 +-9.8 141*) (109.2 S C3 +-1.2 43*) (108.7 D C11 +-2.3 18*) (108.6 D C13 +-0.6 34*)(108.4 D C12 +-2.0 31*) (67.8 D C17 +-0.0 1*) (31.9 T C23 +-3.5 8*) (31.0 T C26 +-2.6 430*) (25.7 T C24 +-0.35*) (23.9 T C25 +-3.7 5242*) (14.7 Q C27 +-4.0 6903*)
6                                                                                                                                                                                                                                                                                                                                                                                                                     SIM (171.5 S C2 +-1.0 Int) (170.0 S C5 +-1.5 Int) (133.8 S C6 +-0.7 Int) (124.0 T C11 +-0.0 Int) (79.3D C8 +-1.1 Int) (74.0 D C3 +-1.4 Int) (47.3 D C1 +-7.9 Int) (30.2 T C12 +-0.3 Int) (27.1 T C13 +-0.1 4*) (23.1 TC14 +-4.4 148*) (14.7 Q C15 +-4.0 6903*)
                     C.NMR.Struct
1                   simulated ...
2                   simulated ...
3 simulated ...; experimental ...
4                   simulated ...
5                   simulated ...
6                   simulated ...
                                                                                                                             H.NMR
1                                                                                                                                 
2                                                                                                                                 
3 CDCl3: (2.56, H4) (3.34, H5) (6.38, H9) (6.75, J=7.0, H13) (4.72, J=7.0, 3.0, H14) (5.42, J=3.0,3.0,H15) (6.40, H16) (3.93, H17)
4                                                                                                                                 
5                                                                                                                           [3513]
6                                                                                                                                 
                                                       MS.Spectra UV.A UV.B
1                                                                          
2                                                                          
3 (312, 100%, M+) (284) (269) (256) (241) (227) (199) (185) (171)          
4                                                                          
5                                                                          
6                                                                          
                                                                                         UV.N
1                                                                                            
2                                                                                            
3 MeOH: (220, 25600) (265, 13400) (362, 21800) (EtOH): (223, 25600) (265, 13400) (362, 21800)
4                                                                          MeOH: (210, 10000)
5                                                                                            
6                                                                          MeOH: (210, 1OOOO)
                                       UV
1                                        
2                                        
3 220 265 362 ...; 223 265 362 ...; light
4                                 210 ...
5                                        
6                                 210 ...
                                                                   IR.Spectra
1                                                                            
2                                                                            
3 KBr (1754) (1701) (1615) (1595) (1429) (1356) (1229) (1130) (977) (824) ...
4                                                                            
5                                                                            
6                                                                            
                       Toxicity                            Solubility
1                                                                    
2                                                                    
3 LD50 = (1, peros) hepatotoxic      good in MeOH, Chl, hardly in Hex
4                                     good in MeOH, Et2O, hardly in W
5                                                                    
6                               good in EtOAc, Chl, hardly in W, base
                                                                 Activity
1                                                                        
2                                                                        
3 (B.subt., 15) (S.aureus, ) (Mycob.sp., ) (Fungi, 10) (Nocardia sp., 20)
4                              (B.subt., 200) (Phyt.fungi, 1)(antibiotic)
5                                                (bacteria, +) (fungi, -)
6                                                (Phyt.fungi, ) (Fungi, )
                                               Appearance    MeltingP                    TLC
1                                                                -271                       
2                                                         (175)-(180)                       
3     fluorescence emission 425 nm; white, yellow, cryst.   (268-269)                       
4 (+)-form; also (-)-form, (+-)- form found white, cryst.      (54-6)                       
5                                                             (233-4) (0.48, EtOAc_cHex 1:1)
6                                           white, cryst.    (46-7.5)                       
      StructCalc                     Group
1 370.104704 ...                          
2 365.173393 ...                          
3 312.062840 ...        aflatoxin, neutral
4 266.151261 ... dilactone deriv., neutral
5 372.120355 ...                          
6 210.088660 ... dilactone deriv., neutral
                                                                      Remarks
1                                                   *C,H also (+-)-form found
2                                                                            
3 *C (see H),H,I,M (see I),U EXP = 2nd val in CDCl3: C-OMe_COO were exchanged
4                                                                            
5                                                                        *H,M
6                                                              also (+-)-form
                                                                                                                                                                                                                                                                                                                                        References
1                                                                                                                                                                                                                                                                     Thomson II, 487; Horak, R. et al., J. Chem. Soc., Perkin Trans. 1 (1985) 345
2                                                                                                                                                                                                                                                                                   Williams, R. M. et al., J. Am. Chem. Soc., 111(8), 3064-5 1989
3 Cole_Cox, 15; Nature,192,1096,1961; 198,1056,1963; Endeavour 22,75,1963; JACS,85,1706, 1963;87, 882, 1965; Forsch., 31, 118, 1974; Exp., 23,187,1967; J. Bact.,93,59,1967; Appl. Micr.,14,403,1966; Z. Allg.Mikr.,12,593,1972; Bioch.J., 114,289,1969; Bact. Rev.,41, 822,1977;30,460,1966; CA,89,36786;CR Ser. D,285,201, 1978; AAC,16,277,1979
4                                                                                                                                                                                                      JCS,5385,1963; Nature,203,1382,1964; JACS,91,7208,1969;95, 7923,1973;97,3870,1975;JOC,38,2489,1973; CC,538,1973; Aust. J. Chem.,18,373,1965
5                                                                                                                                                                                                                                               Thomson II, 483; Townsend, Craig A., Tetrahedron Lett. 1986, 27(8), 887-8; Turner II, 187,188, 191
6                                                                                                                                                                                                                                                                          TL,727,1968,3233,1978; Tsuboi, S. et al., J. Org. Chem., 51 (1986) 4944
                                 CA                                REG
1                          DA:A-915             28458-24-4; 73346-80-2
2  DA:B-138; 120:186930; 110:189072                         23402-09-7
3 DA:A-096; 108:218732k; 114:97857t                          1162-65-8
4                          DA:A-904 26057-70-5; 16993-42-3; 20223-76-1
5             DA:A-905; 105:133600d                          5803-62-3
6                          DA:C-013                         20421-31-2
                                                                                        ChemClass
1  no charge; oxygen heterocycle; carbocycle; aromatic; alicycle; large ring; fused rings; 6ring;
2       no charge; nitrogen heterocycle; carbocycle; aromatic; alicycle; large ring; fused rings;
3  no charge; oxygen heterocycle; carbocycle; aromatic; alicycle; large ring; fused rings; 5ring;
4 no charge; oxygen heterocycle; alicycle; large ring; fused rings; 5ring; 8ring; ester; lactone;
5        no charge; carbocycle; aromatic; large ring; fused rings; 6ring; 10ring; 14ring; ketone;
6 no charge; oxygen heterocycle; alicycle; large ring; fused rings; 5ring; 8ring; ester; lactone;
                                               Opt.Rot X.1
1                                                         
2                                    aD25: (+413 EtOH)    
3 (-480, DMF) (-559, Chl); aD25 (-562, c=0.115, CHCL3)    
4                        aD25:(-41.6, Chl) (-41, EtOH)    
5                            aD22 (-178, c 0.37, EtOH)    
6                                       aD:(-141, Chl) 

Thanks in advance.

It looks like the problem is the three dots. You can clean them up like this:

a = "1 ..."
as.numeric(a)
# Doesn't work: returns NA with a warning

b = gsub("[.]", "", a)
as.numeric(b)
# Works for this input
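To confirm that the trailing dots are what triggers the warning, one option is to list the entries that fail to convert. The sketch below is only illustrative: the first two values are copied from the question, the third is a hypothetical clean value added for comparison, and the names converted and failed are not from the original post.

```r
# Two values copied from the question, plus one clean value for comparison
AB_mass <- c("370.104704 ...", "365.173393 ...", "312.062840")

# suppressWarnings() hides the "NAs introduced by coercion" warning for this check
converted <- suppressWarnings(as.numeric(AB_mass))

# Entries that were present as text but became NA after conversion
failed <- AB_mass[is.na(converted) & !is.na(AB_mass)]
failed
```

Only the entries that still carry " ..." should appear in failed, since as.numeric() returns NA for any string that is not entirely a number.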

If you remove the trailing periods from the character vector StructCalc, you should succeed:

StructCalc <- as.numeric( gsub("[ ][.]+", "", StructCalc) )

If you remove all of the periods, you lose the decimal point.
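A minimal sketch of the difference, using one value from the question (the variable names x, all_dots and trailing are only illustrative):

```r
# One value copied from head(AB_mass) in the question
x <- "370.104704 ..."

# Removing every period also removes the decimal point
all_dots <- gsub("[.]", "", x)
as.numeric(all_dots)    # 370104704, not the intended mass

# Removing only the space and the periods that follow it keeps the decimal point
trailing <- sub("[ ][.]+", "", x)
as.numeric(trailing)    # 370.104704 (shown as 370.1047 with default digits)
```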

> sc <- scan(what="",sep=",")
1: 370.104704 ...
2: 365.173393 ...
3: 312.062840 ...
4: 266.151261 ...
5: 372.120355 ...
6: 210.088660 ...
7: 
Read 6 items
> sub("[ ][.]+","",sc)
[1] "370.104704" "365.173393" "312.062840" "266.151261" "372.120355" "210.088660"
> as.numeric(sub("[ ][.]+","",sc))
[1] 370.1047 365.1734 312.0628 266.1513 372.1204 210.0887

> print( as.numeric(sub("[ ][.]+","",sc)), digits=16)
[1] 370.104704 365.173393 312.062840 266.151261 372.120355 210.088660
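Applied to the data frame from the question, the same idea might look like the sketch below. The small AB_lst built here is only a stand-in for the result of read.csv("tableOut.csv", stringsAsFactors = FALSE), and the pattern "[[:space:]]*[.]+$" is a slightly stricter variant (an assumption, not taken from the answers above) that matches periods only at the very end of each string.

```r
# Stand-in for the real data: only the StructCalc column, values from the question
AB_lst <- data.frame(
  StructCalc = c("370.104704 ...", "365.173393 ...", "312.062840 ..."),
  stringsAsFactors = FALSE
)

# Strip optional whitespace plus the trailing periods, then convert
cleaned <- sub("[[:space:]]*[.]+$", "", AB_lst$StructCalc)
AB_mass_numeric <- as.numeric(cleaned)

# Stop early if any entry still failed to convert
stopifnot(!anyNA(AB_mass_numeric))
AB_mass_numeric
```

Anchoring the pattern with $ leaves the decimal point alone even if a value has no space before the dots.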