R list within matrix to dataframe conversion
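I'm struggling with R. I am using the following to extract quotations from text, with multiple results per entry on a large dataset. I am trying to get the output as a character string in a data frame, so I can easily share it with others as a CSV.
Sample data: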
normalCase <- 'He said, "I am a test," very quickly.'
endCase <- 'This is a long quote, which we said, "Would never happen."'
shortCase <- 'A "quote" yo';
beginningCase <- '"I said this," he said quickly';
multipleCase <- 'When asked, "No," said Sam "I do not like green eggs and ham."'
testdata = c(normalCase,endCase,shortCase,beginningCase,multipleCase)
I use the following to extract each quotation along with a buffer of surrounding characters:
library(stringr)  # provides str_extract_all
result <- function(testdata) {
  str_extract_all(testdata, '[^\"]?{15}"[^\"]+"[^\"]?{15}')
}
extract <- sapply(testdata, FUN = result)
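extract is a list within a matrix. However, I want extract to be a character string that I can later merge into a data frame as a column. How do I convert it?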
Code
normalCase <- 'He said, "I am a test," very quickly.'
endCase <- 'This is a long quote, which we said, "Would never happen."'
shortCase <- 'A "quote" yo';
beginningCase <- '"I said this," he said quickly';
multipleCase <- 'When asked, "No," said Sam "I do not like green eggs and ham."'
testdata = c(normalCase,endCase,shortCase,beginningCase,multipleCase)
# extract quotations
gsub(pattern = "[^\"]*((?:\"[^\"]*\")|$)", replacement = "\\1 ", x = testdata)
Output
[1] "\"I am a test,\" "
[2] "\"Would never happen.\" "
[3] "\"quote\" "
[4] "\"I said this,\" "
[5] "\"No,\" \"I do not like green eggs and ham.\" "
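Each result ends with whitespace, because the replacement appends a space after every captured group. If that is unwanted in the CSV, the result can be trimmed afterwards. The following is only a sketch: trimws() and perl = TRUE are my additions rather than part of the code above, and the sample vector is shortened to three of the test cases.

```r
# Three of the sample strings from the question
testdata <- c(
  'He said, "I am a test," very quickly.',
  'A "quote" yo',
  'When asked, "No," said Sam "I do not like green eggs and ham."'
)

# Same pattern as above; perl = TRUE is an assumption added here so that
# the non-capturing group (?: ) is handled by the PCRE engine.
quotes <- gsub("[^\"]*((?:\"[^\"]*\")|$)", "\\1 ", testdata, perl = TRUE)

# Strip the leading/trailing whitespace left behind by the replacement
quotes <- trimws(quotes)
print(quotes)
```

Multiple quotations in one string stay separated by a single space, so each element is still one character string.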
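Explanation
pattern = "[^\"]"
matches any single character except a double quote
pattern = "[^\"]*"
matches any character except a double quote, 0 or more times
pattern = "\"[^\"]*\""
matches a double quote, then any character except a double quote 0 or more times, then another double quote, i.e. a quotation
pattern = "(?:\"[^\"]*\")"
matches a quotation, but does not capture it
pattern = "((?:\"[^\"]*\")|$)"
matches a quotation or the end of the string, and captures it. Note that this is the first captured group
replacement = "\\1 "
replaces each match with the first captured group, followed by a space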
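To get from there to the data frame and CSV the question asks for, one possible route is sketched below. It uses base R's gregexpr() and regmatches() instead of gsub(), which is a swapped-in technique and not the approach explained above. The column names text and quotes and the output file name are hypothetical.

```r
testdata <- c(
  'He said, "I am a test," very quickly.',
  'This is a long quote, which we said, "Would never happen."',
  'A "quote" yo',
  '"I said this," he said quickly',
  'When asked, "No," said Sam "I do not like green eggs and ham."'
)

# One list element per input string, each holding all quotations found in it
matches <- regmatches(testdata, gregexpr("\"[^\"]*\"", testdata))

# Collapse each element to a single string; an input with no quotation gives ""
quotes <- vapply(matches, paste, character(1), collapse = " ")

# Hypothetical column names; one row per input string
df <- data.frame(text = testdata, quotes = quotes, stringsAsFactors = FALSE)

# Hypothetical output location; replace with a real path when sharing
out_file <- file.path(tempdir(), "quotes.csv")
write.csv(df, out_file, row.names = FALSE)
```

Because vapply() always returns exactly one string per input, the quotes column lines up row for row with the original text, which is what makes it safe to merge into an existing data frame.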