Conditionally split single cells

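I have this data.frame, and I want to find which cells in sample1$domain contain "www", replace them with "", and strsplit the corresponding sample1$suffix into a domain and a suffix. The data looks like this: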

              domain         suffix
1              wbx2            com
2            redhat            com
3          something           com
4           gstatic            com
5               www googleapis.com
6       smartfilter            com

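I have managed to solve this as shown below, but it changes the position of the row (I want it to stay at position 5), and since this will run on millions of cases, I don't think it is the most efficient approach: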

library("stringr")
sample1$domain <- ifelse(sample1$domain == "www", "", sample1$domain)
sample1[sample1$domain == "", c("domain", "suffix")] <- sample1[sample1$domain == "", c("suffix", "domain")]
y <- sample1$domain[sample1$suffix == ""]
z <- as.data.frame(unlist(str_split_fixed(y, "[.]", 2)))
colnames(z) <- c("domain", "suffix")
sample1 <- rbind(sample1, z)
sample1 <- subset(sample1, sample1$suffix != "")
rownames(sample1) <- NULL
sample1 
#             domain suffix
#1              wbx2    com
#2            redhat    com
#3         something    com
#4           gstatic    com
#5       smartfilter    com
#6        googleapis    com

Data

sample1 <- structure(list(domain = c("wbx2", "redhat", "something", 
"gstatic", "www", "smartfilter"), suffix = c("com", "com", "com", 
"com", "googleapis.com", "com")), .Names = c("domain", "suffix"
), row.names = c(NA, 6L), class = "data.frame")

We can create an index for the values equal to "www", then use that index to replace the domain name first and the suffix second:

ind <- sample1$domain == "www"
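# Domain: everything before the last dot of the suffix
sample1$domain[ind] <- sub("^(.*)\\..*", "\\1", sample1$suffix[ind])
# Suffix: everything after the last dot (must run after the domain is set)
sample1$suffix[ind] <- sub(".*\\.(.*)", "\\1", sample1$suffix[ind])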
sample1
#        domain suffix
# 1        wbx2    com
# 2      redhat    com
# 3   something    com
# 4     gstatic    com
# 5  googleapis    com
# 6 smartfilter    com
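
For reuse, the same index-based replacement can be wrapped in a small function. This is only a minimal sketch: `fix_www` is a hypothetical name that does not appear in the answer above, and it assumes both columns are character vectors (not factors) and that every "www" row has at least one dot in its suffix.

```r
# Hypothetical helper (not from the original answer): for rows whose domain
# is "www", split the suffix at its last dot and update both columns in place.
fix_www <- function(df) {
  # which() drops NA comparisons, so missing domains are simply skipped
  ind <- which(df$domain == "www")
  # Everything before the last dot becomes the new domain
  df$domain[ind] <- sub("^(.*)\\.[^.]*$", "\\1", df$suffix[ind])
  # Everything after the last dot becomes the new suffix
  df$suffix[ind] <- sub("^.*\\.", "", df$suffix[ind])
  df
}

# Same data as in the question, rebuilt here so the example is self-contained
sample1 <- data.frame(
  domain = c("wbx2", "redhat", "something", "gstatic", "www", "smartfilter"),
  suffix = c("com", "com", "com", "com", "googleapis.com", "com"),
  stringsAsFactors = FALSE
)

result <- fix_www(sample1)
result
```

Because only the indexed cells are overwritten, no rows are added or removed, so row 5 stays at position 5. Note that this splits at the last dot, whereas the `str_split_fixed(y, "[.]", 2)` call in the question splits at the first dot; the two agree for "googleapis.com" but would differ for a suffix containing more than one dot.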