How to extract the words containing a specific combination of letters from a long string in Spotfire?

For example, given the string below:

abc6:ContextData abc6:xyz1 iCare abc6:xyz2 abc6:xyz3  abc6:xyz4 <abc6:xyz5  abc6:xyz6  abc6:xyz7 abc6:ContextData

I want to extract the words that start with "abc6". For "abc6:xyz3", I want the suffix xyz3. For the longer example, the output would look like:

ContextData,xyz1,xyz2,xyz3,xyz4,xyz5,xyz6,xyz7,ContextData

Do we need regular expressions for this?

Your post is tagged r and python, so here is an approach for both languages.

In R, you can use gsub() to replace the pattern "abc6:" with an empty string.

In Python, gsub can be implemented as follows:

import re

def gsub(old, new, search_space):
    # Mirror R's gsub(): replace every match of the regex `old` with `new`.
    return re.sub(old, new, search_space)

Replacing abc6: with an empty string (output shown at the R prompt):

z = "abc6:ContextData abc6:xyz1 iCare abc6:xyz2 abc6:xyz3 abc6:xyz4"
z2 = gsub("abc6:","",z)
> z2
[1] "ContextData xyz1 iCare xyz2 xyz3 xyz4"

If you want commas instead of spaces, you can use

z3 = gsub(" ",",",z2)
> z3
[1] "ContextData,xyz1,iCare,xyz2,xyz3,xyz4"

Or, if you are looking for a vector,

> strsplit(z2," ")[[1]]
[1] "ContextData" "xyz1"        "iCare"       "xyz2"        "xyz3"        "xyz4"
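
For the Python side, the same three steps can be sketched directly with re.sub and the built-in string methods. This is only a minimal sketch, not the only way to do it; the variable names z, z2, z3 and words are illustrative and simply mirror the R session above. Note that, as in the R output, iCare is kept, because nothing here filters out words that lack the abc6: prefix.

```python
import re

z = "abc6:ContextData abc6:xyz1 iCare abc6:xyz2 abc6:xyz3 abc6:xyz4"

# Step 1: remove every literal "abc6:" (re.escape keeps the pattern literal).
z2 = re.sub(re.escape("abc6:"), "", z)
print(z2)  # ContextData xyz1 iCare xyz2 xyz3 xyz4

# Step 2: use commas instead of spaces.
z3 = z2.replace(" ", ",")
print(z3)  # ContextData,xyz1,iCare,xyz2,xyz3,xyz4

# Step 3: split into a list, the Python counterpart of R's strsplit().
words = z2.split(" ")
print(words)
```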

A base R solution built on the substr function:

z  <- "abc6:ContextData abc6:xyz1 iCare abc6:xyz2 abc6:xyz3 abc6:xyz4 <abc6:xyz5 abc6:xyz6 abc6:xyz7 abc6:ContextData"
z1 <- unlist(strsplit(z, split=" "))
z2 <- z1[substr(z1, start=1, stop=5)=="abc6:"]
z3 <- substr(z2, start=6, stop=nchar(z2))
cat(z3, sep=",")

Result (xyz5 is missing because the word "<abc6:xyz5" starts with "<", not "abc6:"):

ContextData,xyz1,xyz2,xyz3,xyz4,xyz6,xyz7,ContextData
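
Since the question is also tagged python, here is a rough Python counterpart of the prefix filter above, followed by a regex variant. Both are sketches under assumptions: the names prefix, suffixes and all_suffixes are made up for illustration, and the regex assumes each suffix consists only of letters, digits and underscores. The first version reproduces the R result; the second searches anywhere in the string, so it also picks up xyz5 from "<abc6:xyz5", which matches the output asked for in the question.

```python
import re

z = ("abc6:ContextData abc6:xyz1 iCare abc6:xyz2 abc6:xyz3 abc6:xyz4 "
     "<abc6:xyz5 abc6:xyz6 abc6:xyz7 abc6:ContextData")
prefix = "abc6:"

# Keep only the words that start with the prefix, then strip the prefix.
suffixes = [w[len(prefix):] for w in z.split() if w.startswith(prefix)]
print(",".join(suffixes))
# ContextData,xyz1,xyz2,xyz3,xyz4,xyz6,xyz7,ContextData

# Regex variant: capture the word characters that follow "abc6:" anywhere.
all_suffixes = re.findall(r"abc6:(\w+)", z)
print(",".join(all_suffixes))
# ContextData,xyz1,xyz2,xyz3,xyz4,xyz5,xyz6,xyz7,ContextData
```

Which one fits depends on whether "<abc6:xyz5" should count as a word starting with abc6.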