How to generate a string of letters based on some parameters
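I have a set of sentences, each with a different number of words. I need to replace every word with a string of letters, but the string has to follow specific criteria. For example, the letter 't' can only be replaced by the letters 'i', 'l', or 'f'; the letter 'e' can only be replaced by 'o' or 'c'; and so on for each letter of the alphabet. In addition, the spaces between words, as well as full stops, apostrophes, and other punctuation marks, need to stay intact. Here is an example:
Original sentence: He loves dog.
Sentence with a string of letters: Fc tcwoz bcy.
Is there a way to automate this process in R? Thanks.
Added: I need to do this replacement for about 400 sentences. The sentences are stored in a variable of a data frame (data$sentences).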
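Update 2: some code refactoring, added a simple fallback strategy to handle missing characters (so we can encode ALL the characters in a given string, even if we don't have an exact one-to-one mapping), and added an example loop over a vector of strings.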
# we define two different strings to be encoded
mystrings <- c('bye', 'BYE')
# the dictionary with the replacements for each letter
# for the lowercase letters we are defining the exact entries
replacements <- character()
replacements['a'] <- 'xy'
replacements['b'] <- 'zp'
replacements['c'] <- '91'
# ...
replacements['e'] <- 'xyv'
replacements['y'] <- 'opj'
# then we define a generic "fallback" entry
# to be used when we have no clues on how to encode a 'new' character
replacements['fallback'] <- '2345678'
# string, named vector -> character
# returns a single character chosen at random from the dictionary
get_random_entry <- function(entry, dictionary) {
  value <- dictionary[entry]
  # if we don't know how to encode it, use the fallback
  if (is.na(value)) {
    value <- dictionary['fallback']
  }
  # possible replacements for the current character
  possible.replacements <- strsplit(value[[1]], '')[[1]]
  # the actual replacement
  result <- sample(possible.replacements, 1)
  return(result)
}
# string, named vector -> string
# encode the given string, using the given named vector as dictionary
encode <- function(s, dictionary) {
  # get the actual substitutions
  substitutions <- sapply(strsplit(s, '')[[1]], function(ch) {
    # for each char in the string 's'
    # we collect the respective encoded version
    return(get_random_entry(ch, dictionary))
  }, USE.NAMES = FALSE, simplify = TRUE)
  # paste the resulting vector into a single string
  result <- paste(substitutions, collapse = '')
  # and return it
  return(result)
}
# we can use sapply to process all the strings defined in mystrings
# for 'bye' we know how to translate
# for 'BYE' we don't know; we'll use the fallback entry
encoded_strings <- sapply(mystrings, function(s) {
  # encode a single string
  encode(s, replacements)
}, USE.NAMES = FALSE)
encoded_strings
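Note that with the fallback entry above, spaces and punctuation are also replaced by a fallback character, whereas the question asks for them to stay intact. A minimal sketch of one way to handle that, under some assumptions: the names `letter_map` and `encode_keep_punct` are hypothetical, only the 't' and 'e' entries come from the question (the other entries are made up for illustration), and the data frame is a toy stand-in for `data$sentences`. Characters without an entry are returned unchanged, and the case of the original letter is carried over to the replacement.

```r
# Hypothetical mapping: each name is a letter, each value lists its allowed
# replacements. Only 't' and 'e' come from the question; 'h' and 'd' are made up.
letter_map <- c(t = "ilf", e = "oc", h = "bk", d = "bp")

# Encode one sentence: mapped letters are replaced at random, everything else
# (spaces, punctuation, letters without an entry) is returned unchanged.
encode_keep_punct <- function(sentence, map) {
  chars <- strsplit(sentence, "")[[1]]
  encoded <- vapply(chars, function(ch) {
    choices <- map[tolower(ch)]
    if (is.na(choices)) return(ch)                  # no entry: keep as-is
    pick <- sample(strsplit(choices, "")[[1]], 1)   # one allowed replacement
    if (ch != tolower(ch)) toupper(pick) else pick  # keep the original case
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = "")
}

# Toy stand-in for the data frame described in the question
data <- data.frame(sentences = c("He loves dog.", "Don't tell me!"),
                   stringsAsFactors = FALSE)

set.seed(1)  # only to make the random picks repeatable
data$encoded <- vapply(data$sentences, encode_keep_punct, character(1),
                       map = letter_map, USE.NAMES = FALSE)
data$encoded
```

Because each character maps to exactly one output character, the encoded sentence always has the same length as the original. With a complete 26-letter map, every letter would be replaced and only spaces and punctuation would pass through untouched.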