Coding sentences into numbers
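I am supposed to develop a model that measures the relationship between insecurity and authoritarianism. The problem is that in my codebook the item reads as follows: Have security problems increased?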
fallen a lot
fallen a little
stayed the same
increased a little
increased a lot
Now I want to code them as numbers as follows:
fallen a lot=-2
fallen a little=-1
stayed the same=0
increased a little=1
increased a lot=2
don't know=NA
v22g is the column of interest.
dput(df2$v22g[1:30])
c("fallen a lot", "fallen a little", "stayed the same", "increased little",
"increased a lot", "fallen a lot", "fallen a little", "stayed the same",
"increased little", "increased a lot", "fallen a lot", "fallen a little",
"stayed the same", "increased little", "increased a lot", "fallen a lot",
"fallen a little", "stayed the same", "increased little", "increased a lot",
"fallen a lot", "fallen a little", "stayed the same", "increased little",
"increased a lot", "fallen a lot", "fallen a little", "stayed the same",
"increased little", "increased a lot")
Can anyone tell me how to do this?
Thank you.
You can simply define a function (called "numerify" here) that takes a particular string and returns the corresponding number.
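numerify <- function(ranked) {
  # Map a single response label to its numeric code.
  switch(ranked,
         "fallen a lot"       = -2,
         "fallen a little"    = -1,
         "stayed the same"    =  0,
         "increased a little" =  1,
         "increased a lot"    =  2,
         "don't know"         = NA_real_,
         NA_real_)  # default: unrecognised labels also become NA
}
numerify("fallen a lot")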
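Thanks to Rui Barradas for the addition:
Here is a test data.frame (df) whose column "v22g" needs to be converted to numbers. The second line adds a column containing the numbers.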
df <- data.frame(v22g=c("fallen a lot", "stayed the same"))
df$numbers <- sapply(as.character(df$v22g), numerify)
df
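Because `switch()` handles one string at a time, the `sapply()` call above loops over the column. A vectorised alternative, sketched here rather than taken from the answers, is to index a named numeric vector by the labels. The name `codes` is hypothetical.

```r
# Lookup table: names are the response labels, values are the numeric codes.
# `codes` is an illustrative name, not part of the original answers.
codes <- c("fallen a lot"       = -2,
           "fallen a little"    = -1,
           "stayed the same"    =  0,
           "increased a little" =  1,
           "increased a lot"    =  2,
           "don't know"         = NA)

df <- data.frame(v22g = c("fallen a lot", "stayed the same", "don't know"))

# Indexing by name is vectorised; labels missing from `codes` also give NA.
df$numbers <- unname(codes[as.character(df$v22g)])
df
```

This keeps the whole mapping in one object, which makes it easy to compare against the codebook.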
This solution first coerces the column to class "factor", with the factor levels in the required order, and then converts it to integer.
levs_v22g <-
c("fallen a lot",
"fallen a little",
"stayed the same",
"increased a little",
"increased a lot",
"don't know")
df$v22gPoints <- factor(df$v22g, levels = levs_v22g)
df$v22gPoints <- as.integer(df$v22gPoints) - 3
is.na(df$v22gPoints) <- df$v22g == "don't know"
head(df, 10)
#                  v22g v22gPoints
#1   increased a little          1
#2      fallen a little         -1
#3           don't know         NA
#4      increased a lot          2
#5   increased a little          1
#6         fallen a lot         -2
#7      increased a lot          2
#8           don't know         NA
#9   increased a little          1
#10     fallen a little         -1
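One thing to watch: the `dput()` output in the question contains "increased little", while the levels use "increased a little". `factor()` turns any label that is not in `levels` into NA, so those rows would be coded as missing. Below is a minimal sketch of one way to handle this, assuming the two spellings mean the same response (an assumption about the data, not something the question confirms). The names `x`, `unmatched` and `points` are illustrative.

```r
levs_v22g <- c("fallen a lot", "fallen a little", "stayed the same",
               "increased a little", "increased a lot", "don't know")

# Sample values using the spelling seen in the question's dput() output.
x <- c("fallen a lot", "increased little", "increased a lot", "don't know")

# Labels that factor() would silently turn into NA.
unmatched <- setdiff(unique(x), levs_v22g)
unmatched

# Assumption: "increased little" is a variant of "increased a little".
x[x == "increased little"] <- "increased a little"

# Same recoding as above: level position minus 3, then "don't know" to NA.
points <- as.integer(factor(x, levels = levs_v22g)) - 3L
is.na(points) <- x == "don't know"
points
```

Checking `unmatched` before recoding shows which labels need attention, so that NA remains reserved for "don't know".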
Data creation code.
set.seed(1234)
n <- 30
v22g <- sample(levs_v22g, n, TRUE)
df <- data.frame(v22g)
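After recoding, it can help to cross-tabulate labels against codes to confirm that each label maps to exactly one value. This is a small self-contained sketch that repeats the factor-based recoding on data built the same way as above; `tab` is an illustrative name.

```r
levs_v22g <- c("fallen a lot", "fallen a little", "stayed the same",
               "increased a little", "increased a lot", "don't know")

# Rebuild the example data and apply the factor-based recoding.
set.seed(1234)
df <- data.frame(v22g = sample(levs_v22g, 30, TRUE))
df$v22gPoints <- as.integer(factor(df$v22g, levels = levs_v22g)) - 3
is.na(df$v22gPoints) <- df$v22g == "don't know"

# Each row (label) should have counts in a single column (one code, or NA).
tab <- table(df$v22g, df$v22gPoints, useNA = "ifany")
tab
```

If a label shows counts in more than one column, the mapping is inconsistent and worth inspecting.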