Recode if string (with punctuation) contains certain text
How can I search a character vector and, if the string at a given index contains a pattern, replace the value at that index?
I tried this:
List <- c(1:8)
Types<-as.character(c(
"ABC, the (stuff).\n\n\n fun", "meaningful", "relevant", "rewarding",
"unpleasant", "enjoyable", "engaging", "disinteresting"))
for (i in List) {
if (grepl(Types[i], "fun", fixed = TRUE))
{Types[i]="1"
} else if (grepl(Types[i], "meaningful", fixed = TRUE))
{Types[i]="2"}}
The code works for "meaningful", but not when the string contains punctuation or other text, as in the element containing "fun".
The first argument of grepl is the pattern, not the string to search.
A literal fix of your code would be:
for (i in seq_along(Types)) {
if (grepl("fun", Types[i], fixed = TRUE)) {
Types[i] = "1"
} else if (grepl("meaningful", Types[i], fixed = TRUE)) {
Types[i] = "2"
}
}
Types
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
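To see why the original loop handled "meaningful" but not the first element, it may help to compare the two argument orders directly. This is only an illustrative sketch; the names x, right, wrong and same are made up for the example.

```r
x <- "ABC, the (stuff).\n\n\n fun"

# Correct order: pattern first, then the text to search.
right <- grepl("fun", x, fixed = TRUE)

# Swapped order: asks whether the whole long string occurs inside "fun".
wrong <- grepl(x, "fun", fixed = TRUE)

# With the arguments swapped, a match still happens when both strings
# are identical, which is why "meaningful" appeared to work.
same <- grepl("meaningful", "meaningful", fixed = TRUE)

print(c(right = right, wrong = wrong, same = same))
```

With fixed = TRUE the punctuation and newlines are not the problem; the swapped arguments are.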
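By the way, using List works, but it is a bit extra: when you keep a separate index variable like this, one variable can get out of sync with the other. For example, if you update Types and forget to update List, the loop will break (or fail). For that reason I use seq_along(Types) instead.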
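A small sketch of how a hard-coded index can drift away from the data, assuming a shortened vector (short and stale are hypothetical names used only here):

```r
short <- c("fun", "meaningful", "relevant")  # the vector was shortened ...
stale <- 1:8                                 # ... but the index was not

# Indexing past the end of a vector yields NA rather than a string.
out_of_range <- is.na(short[stale])

# seq_along() always follows the current length of its argument.
idx <- seq_along(short)

# It also handles the empty case, where 1:length(x) counts 1, 0.
empty_idx <- seq_along(character(0))
naive_empty <- 1:length(character(0))

print(idx)
print(sum(out_of_range))
```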
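Side note: here is a slightly different version that drops the loop and introduces you to the power of vectorization. Note that, as written, it still overwrites Types in place rather than returning a new vector: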
Types[grepl("fun", Types, fixed = TRUE)] <- "1"
Types[grepl("meaningful", Types, fixed = TRUE)] <- "2"
Types
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
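If the original vector has to stay intact, one possible sketch wraps the same indexing idea in a function that returns a new vector. The function name recode_contains and its arguments are assumptions made for illustration, not part of any package.

```r
# Hypothetical helper: returns a recoded copy and leaves x untouched.
recode_contains <- function(x, patterns, codes) {
  stopifnot(length(patterns) == length(codes))
  out <- x
  # Matches are always computed against the original x, so a replacement
  # value can never be picked up by a later pattern. Walking the patterns
  # backwards lets earlier patterns win, like the if / else if chain.
  for (k in rev(seq_along(patterns))) {
    out[grepl(patterns[k], x, fixed = TRUE)] <- codes[k]
  }
  out
}

Types <- c("ABC, the (stuff).\n\n\n fun", "meaningful", "relevant",
           "fun and meaningful")
recoded <- recode_contains(Types, c("fun", "meaningful"), c("1", "2"))
print(recoded)
```

Here Types keeps its original strings and recoded holds the codes; the last element contains both words and gets the code of the first pattern.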
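The next level (perhaps overly complex?) would be to store the patterns and their recode replacements in a data frame (always 1-to-1, so you can never accidentally update one without the other, and it can be stored in a CSV if needed) and let Reduce do the work: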
ptns <- data.frame(ptn = c("fun", "meaningful"), repl = c("1", "2"))
Reduce(function(txt, i) {
txt[grepl(ptns$ptn[i], txt, fixed = TRUE)] <- ptns$repl[i]
txt
}, seq_len(nrow(ptns)), init = Types)
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
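As a rough sketch of the CSV idea, the lookup table could be read with read.csv and fed to the same Reduce call. The CSV text below is made up for the example; in practice it would presumably live in a file.

```r
# Hypothetical CSV contents standing in for a file on disk.
csv_text <- "ptn,repl\nfun,1\nmeaningful,2\n"

# colClasses = "character" keeps the codes "1" and "2" as strings
# instead of letting read.csv convert them to integers.
ptns <- read.csv(text = csv_text, colClasses = "character")

Types <- c("ABC, the (stuff).\n\n\n fun", "meaningful", "relevant")

# Same Reduce pattern as above: apply each row of the table in turn.
recoded <- Reduce(function(txt, i) {
  txt[grepl(ptns$ptn[i], txt, fixed = TRUE)] <- ptns$repl[i]
  txt
}, seq_len(nrow(ptns)), init = Types)
print(recoded)
```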
Try str_replace(string, pattern, replacement) from the stringr package.
You can use str_replace_all:
library(stringr)
pat <- c(fun = '1', meaningful = '2')
str_replace_all(Types, setNames(pat, sprintf('(?s).*%s.*', names(pat))))
# [1] "1" "2" "relevant"
# [4] "rewarding" "unpleasant" "enjoyable"
# [7] "engaging" "disinteresting"
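The pattern above wraps each keyword in .* so that the whole element is replaced, and prefixes it with (?s) so that the dot also matches the newlines in the first element. A minimal sketch of the effect of (?s), using base R's sub with perl = TRUE instead of stringr so that it has no dependencies; this assumes the PCRE engine treats the dot and newlines the same way as stringr's engine on this point.

```r
x <- "ABC, the (stuff).\n\n\n fun"

# Without (?s) the dot stops at newlines, so only the last line,
# the one that contains "fun", is matched and rewritten.
without_s <- sub(".*fun.*", "1", x, perl = TRUE)

# With (?s) the dot also matches newlines, so the pattern spans the
# whole string and the entire element is recoded.
with_s <- sub("(?s).*fun.*", "1", x, perl = TRUE)

print(with_s)
print(nchar(without_s))
```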