Splitting strings in R
I have the following string:
x<-"CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"
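I want to extract "CUST_Id_8", "Mr. Praveen Kumar", and whatever follows DOB:, Mother's Name:, Contact Num:, and so on, and store each one in a variable such as customer ID, name, DOB, etc.
Please help.
I tried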
strsplit(x, ":")
but the result is a list of text pieces. I need a blank value whenever nothing follows a field name.
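For illustration, a short sketch of what the plain split returns for this string. Because nothing separates a value from the next label, each value comes back glued to the following label (for example "Mr.Praveen KumarDOB"), and empty fields produce no piece at all instead of a blank.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

# Splitting on ":" alone cannot tell labels from values
parts <- strsplit(x, ":")[[1]]
print(parts)
```

None of the pieces is empty even though DOB and every later field have no value, which is why the blanks cannot be recovered from this split.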
Can anyone tell me how to extract the string between two words? For example, I want to extract "Mr. Praveen Kumar" from between Name: and DOB:.
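For the narrower "between two words" part, a minimal base-R sketch. The helper name between is hypothetical (not from any package): it finds the left label with a fixed, non-regex search and returns everything up to the next occurrence of the right label, so an empty field comes back as "" and a missing label as NA.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

# Hypothetical helper: text between the first `left` and the next `right` after it.
# Handles a single string; fixed = TRUE treats both labels as literal text.
between <- function(s, left, right) {
  start <- regexpr(left, s, fixed = TRUE)
  if (start == -1L) return(NA_character_)
  # Keep only what follows the left label
  rest <- substring(s, start + attr(start, "match.length"))
  end <- regexpr(right, rest, fixed = TRUE)
  if (end == -1L) return(NA_character_)
  substring(rest, 1L, end - 1L)
}

between(x, "Name:", "DOB:")
# [1] "Mr.Praveen Kumar"
between(x, "DOB:", "Mother's Name:")
# [1] ""
```

Note that the search uses the first occurrence of the left label; "Name:" also appears inside "Mother's Name:" and "Company Name:", but here the first one is the intended one.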
If you know the keys in advance, you can extract the values like this:
keys <- c("CUST_Id_8Name", "DOB", "Mother's Name",
"Contact Num", "Email address", "Owns Car", "Products held with Bank",
"Company Name", "Salary per. month", "Background")
cbind(keys, values = sub("^:", "", strsplit(x, paste0(keys, collapse = "|"))[[1]][-1]))
# keys values
# [1,] "CUST_Id_8Name" "Mr.Praveen Kumar"
# [2,] "DOB" ""
# [3,] "Mother's Name" ""
# [4,] "Contact Num" ""
# [5,] "Email address" ""
# [6,] "Owns Car" ""
# [7,] "Products held with Bank" ""
# [8,] "Company Name" ""
# [9,] "Salary per. month" ""
# [10,] "Background" ""
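A variant of the same idea, sketched under two assumptions not in the answer above: each key is wrapped in PCRE's \Q...\E quoting so it is matched literally (the "." in "Salary per. month" is otherwise a wildcard), and the generic key "Name" replaces the hard-coded "CUST_Id_8Name", so the customer id itself falls out as the piece before the first key. Collecting the results in a named list gives one entry per field.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

# Keys in the order they appear; plain "Name" instead of "CUST_Id_8Name"
keys <- c("Name", "DOB", "Mother's Name", "Contact Num", "Email address",
          "Owns Car", "Products held with Bank", "Company Name",
          "Salary per. month", "Background")

# \Q...\E makes PCRE match each key literally. strsplit uses the leftmost match,
# so "Mother's Name" and "Company Name" match at their first letter before the
# shorter "Name" alternative could match inside them.
split_re <- paste0("\\Q", keys, "\\E", collapse = "|")
parts <- strsplit(x, split_re, perl = TRUE)[[1]]

# parts[1] is the text before the first key (the customer id);
# every later piece starts with the ":" that followed its key
record <- c(list(CUST_Id = parts[1]),
            setNames(as.list(sub("^:", "", parts[-1])), keys))

record$Name
# [1] "Mr.Praveen Kumar"
record$DOB
# [1] ""
```

This still assumes no value contains one of the keys; a value that itself contains the text "Name" would be split in the wrong place.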
You can use regexec and regmatches to extract the various data items as substrings. Here is a working example:
Sample data:
txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:",
"CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789Email address:joe@joesville.comOwns Car:YesProducts held with Bank:Savings, CurrentCompany Name:Joes villeSalary per. month:0000Background:shady")
Pattern to match:
pattern <- "CUST_Id_(.*)Name:(.*)DOB:(.*)Mother's Name:(.*)Contact Num:(.*)Email address:(.*)Owns Car:(.*)Products held with Bank:(.*)Company Name:(.*)Salary per. month:(.*)Background:(.*)"
# Column names: the pattern split at every "_(.*)" or ":(.*)" group
var_names <- strsplit(pattern, "[:_]\\(\\.\\*\\)")[[1]]
Run the match:
data <- as.data.frame(do.call("rbind", regmatches(txt, regexec(pattern, txt))))[, -1]
colnames(data) <- var_names
Output:
# CUST_Id Name DOB Mother's Name Contact Num
#1 8 Mr.Praveen Kumar
#2 15 Mr.Joe Johnson 01/02/1973 Barbara 0123 456789
# Email address Owns Car Products held with Bank Company Name
#1
#2 joe@joesville.com Yes Savings, Current Joes ville
# Salary per. month Background
#1
#2 0000 shady
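Instead of writing the pattern by hand and recovering the column names from it with strsplit, the pattern can also be generated from a single vector of labels. This sketch assumes R 3.3.0 or later (for perl = TRUE in regexec), quotes each "label:" with \Q...\E so it is matched literally, and uses lazy (.*?) groups anchored with ^ and $; it should give the same columns as above.

```r
txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:",
         "CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789Email address:joe@joesville.comOwns Car:YesProducts held with Bank:Savings, CurrentCompany Name:Joes villeSalary per. month:0000Background:shady")

# Field labels in the order they appear after the customer id
labels <- c("Name", "DOB", "Mother's Name", "Contact Num", "Email address",
            "Owns Car", "Products held with Bank", "Company Name",
            "Salary per. month", "Background")

# One lazy capture group per field; \Q...\E quotes each "label:" literally
pattern <- paste0("^CUST_Id_(.*?)",
                  paste0("\\Q", labels, ":\\E(.*?)", collapse = ""),
                  "$")

m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))

# Drop the full match (first element) and stack the capture groups row-wise
data <- as.data.frame(do.call(rbind, lapply(m, `[`, -1)),
                      stringsAsFactors = FALSE)
names(data) <- c("CUST_Id", labels)
data
```

As with the original, a row that does not match yields character(0), which rbind silently ignores, so it is worth checking that nrow(data) equals length(txt).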