strsplit with non-character data
I would like to use strsplit on a variable ID1 to split it into ID1_s1 and ID1_s2, separating out the string in parentheses.
# dummy data
df1 <- data.frame(ID1=c("Gindalinc","Xaviertechnolgies","anine.inc(Nasq)","Xyzinc"), y=1:4)
strsplit(df1$ID1, "\\(")
How can I use strsplit to separate ID1 into ID1_s1 and ID1_s2 at the "(" parenthesis?
I need the output to look like this:
ID1_s1            ID1_s2 y
Gindalinc                1
Xaviertechnolgies        2
anine.inc         (Nasq) 3
Xyzinc                   4
Use stringsAsFactors = FALSE when defining the data frame (or, if it already exists, use df1 <- transform(df1, ID1 = as.character(ID1))) and use the pattern shown below with strsplit.
df1 <- data.frame(ID1 = c("Gindalinc","Xaviertechnolgies","anine.inc(Nasq)","Xyzinc"),
y = 1:4, stringsAsFactors = FALSE)
s <- strsplit(df1$ID1, "[()]")
giving:
> s
[[1]]
[1] "Gindalinc"
[[2]]
[1] "Xaviertechnolgies"
[[3]]
[1] "anine.inc" "Nasq"
[[4]]
[1] "Xyzinc"
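To illustrate why the conversion matters, here is a minimal sketch (not part of the original answer). It assumes the column is a factor, which is what data.frame() produced by default before R 4.0.0; the vector name ids is made up for the example. strsplit() only accepts character input, so the call on the factor fails, while the same call after as.character() works.

```r
# Assumption: the column arrived as a factor (the pre-R 4.0.0 data.frame() default)
ids <- factor(c("Gindalinc", "anine.inc(Nasq)"))

# strsplit() rejects factors; capture the error message instead of stopping the script
err <- tryCatch(strsplit(ids, "[()]"), error = function(e) conditionMessage(e))
print(err)

# Converting to character first makes the same call work
parts <- strsplit(as.character(ids), "[()]")
print(parts)
```

On R 4.0.0 and later, data.frame() keeps strings as character by default, so the original error only appears if the column was explicitly made a factor.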
Added after the question was updated to include the desired output: use read.pattern from the gsubfn package to split the field as shown:
library(gsubfn)
cn <- c("ID1_s1", "ID1_s2")
with(df1, data.frame(read.pattern(text = ID1, pattern = "([^(]*)(.*)", col.names = cn), y))
giving:
             ID1_s1 ID1_s2 y
1         Gindalinc        1
2 Xaviertechnolgies        2
3         anine.inc (Nasq) 3
4            Xyzinc        4
Added: if it does not matter whether the parentheses appear in the output, another solution is (using s from the code above):
data.frame(ID1_s1 = sapply(s, "[", 1), ID1_s2 = sapply(s, "[", 2), y = df1$y)
giving:
             ID1_s1 ID1_s2 y
1         Gindalinc   <NA> 1
2 Xaviertechnolgies   <NA> 2
3         anine.inc   Nasq 3
4            Xyzinc   <NA> 4
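If you would rather stay in base R and still keep the parentheses, one possible sketch (my own variation, not from the answer above) is to use sub() twice instead of strsplit(): once to delete everything from the first "(" onward, and once to delete everything before it. The names ids, s1, s2 and result are just illustrative.

```r
ids <- c("Gindalinc", "Xaviertechnolgies", "anine.inc(Nasq)", "Xyzinc")

# Part before the first "(": delete from "(" to the end of the string
s1 <- sub("\\(.*$", "", ids)
# Part from the first "(" onward: delete the leading run of non-"(" characters
s2 <- sub("^[^(]*", "", ids)

result <- data.frame(ID1_s1 = s1, ID1_s2 = s2, y = 1:4, stringsAsFactors = FALSE)
print(result)
```

Strings without a parenthesis end up with an empty ID1_s2 rather than NA, which matches the layout requested in the question.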
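Good evening. Using the dummy data and the suggestions given earlier, I have prepared (and tested) the code below to produce the expected result.
I hope it helps with processing your data.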
# creating an initial data frame
df1 <- data.frame(ID1 = c("Gindalinc","Xaviertechnolgies","anine.inc(Nasq)","Xyzinc"),
                  y = 1:4, stringsAsFactors = FALSE)
# splitting the elements on parentheses
y = strsplit(df1$ID1, "[()]")
y
# recreating the parentheses (if needed); hard-coded for the third element of this example
y[[3]][2] = "(Nasq)"
z = c() # creating an empty vector for the loop
# taking the first element of each list entry and stacking it into a column
for (i in seq_along(y)) {
  z = rbind(z, y[[i]][1])
}
z2 = c() # creating an empty vector for the loop
# taking the second element of each list entry (NA when absent) and stacking it into a column
for (i in seq_along(y)) {
  z2 = rbind(z2, y[[i]][2])
}
# recreating the data frame in the expected layout
df1 = data.frame(ID1_s1 = z, ID1_s2 = z2, y = df1$y)
df1
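As a possible generalisation of the loop approach (a sketch only, with variable names parts, first, second and result chosen for illustration), the second piece can be re-wrapped in parentheses for every row that has one, instead of hard-coding the third element:

```r
df1 <- data.frame(ID1 = c("Gindalinc", "Xaviertechnolgies", "anine.inc(Nasq)", "Xyzinc"),
                  y = 1:4, stringsAsFactors = FALSE)

parts <- strsplit(df1$ID1, "[()]")

# First and second piece of each split; the second is NA when there was no parenthesis
first  <- vapply(parts, function(p) p[1], character(1))
second <- vapply(parts, function(p) p[2], character(1))

# Put the parentheses back where a second piece exists, otherwise use an empty string
second <- ifelse(is.na(second), "", paste0("(", second, ")"))

result <- data.frame(ID1_s1 = first, ID1_s2 = second, y = df1$y, stringsAsFactors = FALSE)
print(result)
```

This avoids growing a matrix with rbind inside a loop and works for any number of rows, under the assumption that each ID contains at most one parenthesised part.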
library("tidyr")
df1 <- data.frame(ID1=c("Gindalinc","Xaviertechnolgies","anine.inc(Nasq)","Xyzinc"), y=1:4)
df2 <- separate(df1 , ID1 ,c("ID1_s1" , "ID1_s2") , sep = "(?=\\()" , extra = "drop")
#              ID1_s1 ID1_s2 y
# 1         Gindalinc   <NA> 1
# 2 Xaviertechnolgies   <NA> 2
# 3         anine.inc (Nasq) 3
# 4            Xyzinc   <NA> 4
# if you want to convert NA to ""
df2$ID1_s2[is.na(df2$ID1_s2)] <- ""
#              ID1_s1 ID1_s2 y
# 1         Gindalinc        1
# 2 Xaviertechnolgies        2
# 3         anine.inc (Nasq) 3
# 4            Xyzinc        4