Problems and errors in strsplit
I use the following command:
sequence <- '<{EADFE20F543836047330DEFFB893127AF536560121698ADE2FCE6985E07A40D8 SELECT;DD2E595CF23E65E128560B655E0C6848 SELECT}>'
v1 <- trimws(gsub('[[:punct:]]+', '', sapply(strsplit(sequence, '(?<=\\})(?=\\{)|[[,;]', perl=TRUE), tail, 1)))
I want to get the whole string, but I only get part of it:
v1
[1] "F73431225ED64969DC4BEBD06092FD6F SELECT"
The desired output is the content of the string between the <{ }>.
What do I need to change to get the whole string?
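Running the command on the example string shows where the rest goes: the pattern does split at `;`, but `sapply(..., tail, 1)` keeps only the last piece of each split. This sketch assumes the braces in the pattern are written `\\{` and `\\}` (a single backslash before a brace is not a valid escape in an R string literal); the names `parts`, `last_only` and `all_pieces` are illustrative only.

```r
# Example string from the question
sequence <- '<{EADFE20F543836047330DEFFB893127AF536560121698ADE2FCE6985E07A40D8 SELECT;DD2E595CF23E65E128560B655E0C6848 SELECT}>'

# Same split pattern as in the question, with the braces escaped as \\{ and \\}
parts <- strsplit(sequence, '(?<=\\})(?=\\{)|[[,;]', perl = TRUE)
length(parts[[1]])  # two pieces: the string was split at ';'

# tail(..., 1) keeps only the LAST piece, so the first entry is discarded
last_only <- trimws(gsub('[[:punct:]]+', '', sapply(parts, tail, 1)))
last_only
# [1] "DD2E595CF23E65E128560B655E0C6848 SELECT"

# Cleaning every piece instead of only the last one keeps both entries
all_pieces <- trimws(gsub('[[:punct:]]+', '', parts[[1]]))
all_pieces
```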
Also, if I use the sequence column of the data frame instead of the single string, I get the following error:
Error in strsplit(RES1$sequence, "(?<=\})(?=\{)|[[\,\;]", perl = TRUE) :
non-character argument
Here is the head of RES1$sequence:
> head (RES1$sequence)
[1] <{EADFE20F543836047330DEFFB893127AF536560121698ADE2FCE6985E07A40D8 SELECT;DD2E595CF23E65E128560B655E0C6848 SELECT}>
[2] <{F73431225ED64969DC4BEBD06092FD6F SELECT}>
[3] <{88FFF14FDD46ED862DAEB36F8D0F6215 SELECT}>
[4] <{1C9AAE933F916BA94B5D2B5FA320E05D85C780CD1A9922E26BC1FB7C422F42B2 SELECT}>
[5] <{3FCC23C2562BE9926049EAF2D88CD3D4 SELECT;314CD91DCA8849C64DCEACBA2E3B65B7 SELECT;09E9146A444AE1C47B8E4139D6D69A48 SELECT}>
[6] <{184E7C8929FC9CEA72EF21D99CDC40D9 SELECT}>
20 Levels: <{\N}> ... <{F73431225ED64969DC4BEBD06092FD6F SELECT}>
> class (RES1)
[1] "data.frame"
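The "20 Levels:" line printed by head() indicates that RES1$sequence is a factor rather than a character vector, and strsplit() only accepts character input, which would explain the "non-character argument" error. A minimal reproduction, using a hypothetical two-row data frame `res` in place of RES1:

```r
# Hypothetical stand-in for RES1: factor() reproduces the "Levels:" output
res <- data.frame(
  sequence = factor(c(
    '<{F73431225ED64969DC4BEBD06092FD6F SELECT}>',
    '<{88FFF14FDD46ED862DAEB36F8D0F6215 SELECT}>'
  ))
)

class(res$sequence)
# [1] "factor"

# strsplit() on the factor column fails with "non-character argument"
err <- tryCatch(strsplit(res$sequence, ';'),
                error = function(e) conditionMessage(e))
err

# Converting the column to character first avoids the error
ok <- strsplit(as.character(res$sequence), ';', fixed = TRUE)
```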
"The desired output is the content of the string between the <{ }>", so why not simply:
gsub('<\\{(.*)\\}>', '\\1', sequence)
#[1] "EADFE20F543836047330DEFFB893127AF536560121698ADE2FCE6985E07A40D8 SELECT;DD2E595CF23E65E128560B655E0C6848 SELECT"
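The same gsub() call can be applied to the whole data frame column. This sketch assumes the column is a factor, as the "Levels:" output suggests, and converts it with as.character() first; `res`, `inner` and `entries` are hypothetical names standing in for RES1 and its results.

```r
# Hypothetical stand-in for RES1 with one single-entry and one multi-entry row
res <- data.frame(
  sequence = factor(c(
    '<{F73431225ED64969DC4BEBD06092FD6F SELECT}>',
    '<{3FCC23C2562BE9926049EAF2D88CD3D4 SELECT;314CD91DCA8849C64DCEACBA2E3B65B7 SELECT}>'
  ))
)

# Keep everything between '<{' and '}>' for every row
inner <- gsub('<\\{(.*)\\}>', '\\1', as.character(res$sequence))
inner[1]
# [1] "F73431225ED64969DC4BEBD06092FD6F SELECT"

# If the individual entries are needed, split the extracted content on ';'
entries <- strsplit(inner, ';', fixed = TRUE)
```

Rows that are not wrapped in <{ }> are returned unchanged by gsub(), so it may be worth checking them with grepl() first.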