Coding sentences as numbers

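I am supposed to develop a model that measures the relationship between insecurity and authoritarianism. The problem is that in my codebook the item "Have security problems increased?" is answered with the following options: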

fallen a lot  
fallen a little  
stayed the same  
increased a little  
increased a lot  

Now I want to code them as numbers, as follows:

fallen a lot=-2
fallen a little=-1
stayed the same=0
increased a little=1
increased a lot=2

don't know = NA

v22g is the column of interest:

dput(df2$v22g[1:30])


c("fallen a lot", "fallen a little", "stayed the same", "increased little", 
"increased a lot", "fallen a lot", "fallen a little", "stayed the same", 
"increased little", "increased a lot", "fallen a lot", "fallen a little", 
"stayed the same", "increased little", "increased a lot", "fallen a lot", 
"fallen a little", "stayed the same", "increased little", "increased a lot", 
"fallen a lot", "fallen a little", "stayed the same", "increased little", 
"increased a lot", "fallen a lot", "fallen a little", "stayed the same", 
"increased little", "increased a lot")

Can anyone tell me how to do this? Thank you.

You can simply define a function (called "numerify" here) that takes one of the answer strings and returns the corresponding number.

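numerify <- function(ranked){
  # Map a single answer string to its numeric code.
  switch(ranked,
    "fallen a lot" = -2,
    "fallen a little" = -1,
    "stayed the same" = 0,
    "increased a little" = 1,
    "increased a lot" = 2,
    "don't know" = NA,
    NA  # default: without it, an unrecognized string would return NULL
  )
}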

numerify("fallen a lot")

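Thanks to Rui Barradas for the addition: here is a test data.frame (df) whose column "v22g" needs to be converted to numbers. The second line adds a column containing those numbers.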

df <- data.frame(v22g=c("fallen a lot", "stayed the same"))
df$numbers <- sapply(as.character(df$v22g), numerify)
df
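Because switch() handles one string at a time, the function has to be applied with sapply(). As an alternative, here is a minimal sketch using a named lookup vector, which recodes the whole column in one step. The names `codes`, `answers` and `scores` are made up for this illustration, and the sample answers are hypothetical.

```r
# Named lookup vector: names are the survey answers, values are the codes.
# "don't know" is deliberately left out, so it falls through to NA.
codes <- c(
  "fallen a lot"       = -2,
  "fallen a little"    = -1,
  "stayed the same"    =  0,
  "increased a little" =  1,
  "increased a lot"    =  2
)

# Hypothetical sample answers, for illustration only.
answers <- c("fallen a lot", "stayed the same", "don't know", "increased a lot")

# Indexing by name recodes the whole vector at once;
# unname() drops the answer strings that would otherwise be kept as names.
scores <- unname(codes[answers])
scores
```

Any answer that is not a name in `codes` comes back as NA, so misspelled labels are worth checking for before relying on the result.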

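This solution first coerces the column to class "factor", with the factor levels in the required order, and then converts it to integer. Subtracting 3 shifts the integer codes 1 to 5 onto the scale -2 to 2, and "don't know" is then set to NA.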

levs_v22g <-
c("fallen a lot", 
  "fallen a little", 
  "stayed the same", 
  "increased a little", 
  "increased a lot", 
  "don't know")

df$v22gPoints <- factor(df$v22g, levels = levs_v22g)
df$v22gPoints <- as.integer(df$v22gPoints) - 3
is.na(df$v22gPoints) <- df$v22g == "don't know"

head(df, 10)
#                 v22g v22gPoints
#1  increased a little          1
#2     fallen a little         -1
#3          don't know         NA
#4     increased a lot          2
#5  increased a little          1
#6        fallen a lot         -2
#7     increased a lot          2
#8          don't know         NA
#9  increased a little          1
#10    fallen a little         -1

Data creation code:

set.seed(1234)
n <- 30
v22g <- sample(levs_v22g, n, TRUE)
df <- data.frame(v22g)
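One thing to watch for: the dput() output in the question spells one answer "increased little", while the codes use "increased a little". With the factor approach above, any label that is not among the levels silently becomes NA. The following is a rough sketch of how such labels could be detected first. The names `answers`, `unexpected` and `points` are made up for this illustration, and match() is used here as an equivalent stand-in for the factor conversion.

```r
levs <- c("fallen a lot", "fallen a little", "stayed the same",
          "increased a little", "increased a lot", "don't know")

# Sample answers; "increased little" is the spelling from the question's dput() output.
answers <- c("fallen a lot", "increased little", "don't know", "increased a lot")

# Labels that occur in the data but are missing from the levels.
unexpected <- setdiff(unique(answers), levs)
unexpected

# match() gives the position of each answer in levs (NA if absent),
# so subtracting 3 puts the five real answers on the -2..2 scale.
points <- match(answers, levs) - 3L
points[answers == "don't know"] <- NA
points
```

If `unexpected` is not empty, the labels should be corrected in the data (or in the levels) before recoding, otherwise real answers end up indistinguishable from "don't know".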