Changing a character column into a continuous column, by dividing it into sections (1,2,3,4)

Question

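I have a data set that I am trying to run a glm regression on, but it contains character-class columns such as age range, race, and comorbidities. I would like to change those columns into continuous variables so the regression can accept them. With the data below, I want to change TBI.irace2 to (Hispanic=1, Black=2, White=3, other=4), and likewise age (18-28=1, 29-46=2, 47-64=3, >64=4) and NISS (0-10=1, 11-20=2, 21-30=3, 31-40=4, 41-50=5, 51-60=6, 61-70=7, >70=8).

Please find a summary of the data below.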

TBI.crani = c(0, 0, 0, 0, 0, 0), TBI.vte = c(0, 
0, 0, 0, 0, 0), TBI.FEMALE = c(0, 0, 1, 0, 1, 0), TBI.iracecat2 = c("Whites", 
"Whites", "Whites", "Hispanics", "Whites", "Blacks"), TBI.agecat = c("Age 47-64", 
"Age 29-46", "Age > 64", "Age 29-46", "Age 18-28", "Age 18-28"
), TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30", 
"NISS 11-20", "NISS 11-20", "NISS 0-10"), TBI.LOS = c(5, 8, 1, 
3, 19, 1), TBI.hospitalteach = c(0, 0, 1, 1, 1, 1), TBI.largebedsize = c(1, 
1, 1, 1, 1, 1), TBI.CM_ALCOHOL = c(0, 0, 0, 1, 0, 0), TBI.CM_ANEMDEF = c(0, 
0, 0, 0, 0, 0), TBI.CM_BLDLOSS = c(0, 0, 0, 0, 0, 0), TBI.CM_CHF = c(1, 
0, 0, 0, 0, 0), TBI.CM_CHRNLUNG = c(0, 0, 0, 0, 0, 0), TBI.CM_COAG = c(0, 
0, 0, 0, 1, 0), TBI.CM_HYPOTHY = c(0, 0, 0, 0, 0, 0), TBI.CM_LYTES = c(0, 
0, 0, 0, 0, 0), TBI.CM_METS = c(0, 0, 0, 0, 0, 0), TBI.CM_NEURO = c(0, 
0, 0, 0, 0, 0), TBI.CM_OBESE = c(0, 0, 0, 0, 0, 0), TBI.CM_PARA = c(0, 
0, 0, 0, 0, 0), TBI.CM_PSYCH = c(0, 1, 0, 0, 0, 0), TBI.CM_TUMOR = c(0, 
0, 0, 0, 0, 0), TBI.CM_WGHTLOSS = c(0, 0, 0, 0, 0, 0), TBI.UTI = c(0, 
0, 0, 0, 0, 0), TBI.pneumonia = c(0, 0, 0, 0, 0, 0), TBI.AMI = c(0, 
0, 0, 0, 0, 0), TBI.sepsis = c(0, 0, 0, 0, 0, 0), TBI.arrest = c(0, 
0, 0, 0, 0, 0), TBI.spineinjury = c(0, 0, 0, 0, 0, 0), TBI.legfracture = c(0, 
0, 0, 0, 0, 0), TBI_time_to_surg.NEW = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 
6L), class = "data.frame")

Answer 1

A small tip: provide a small sample set that is just enough to reproduce your problem.

library(data.table)

# took a small sample and changed one value to Asian
dt <- data.table(
  TBI.FEMALE = c(0, 0, 1, 0, 1, 0), 
  TBI.iracecat2 = as.character(c("Whites", "Whites", "Asian", "Hispanics", "Whites", "Blacks"))
)

# define race groups, and note I did not define Asian
convert_race <- c("Hispanics" = 1, "Blacks" = 2, "Whites" = 3) # other will all be not defined

# look up each label in the named vector; labels that are not defined become NA
dt[, TBI.irace2 := unname(convert_race[TBI.iracecat2])]
dt[is.na(TBI.irace2), TBI.irace2 := 4]

dt

#    TBI.FEMALE TBI.iracecat2 TBI.irace2
# 1:          0        Whites          3
# 2:          0        Whites          3
# 3:          1         Asian          4
# 4:          0     Hispanics          1
# 5:          1        Whites          3
# 6:          0        Blacks          2
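
The same named-vector lookup should carry over to the age and NISS columns from the question. The following is a minimal sketch in base R rather than data.table. The output column names TBI.age_num and TBI.niss_num are hypothetical, and the labels that do not appear in the six sample rows (for example "NISS 31-40" or "NISS > 70") are assumed to follow the same spelling pattern as the ones that do, so check them against unique() of the real columns first.

```r
# Six sample rows copied from the question's data
df <- data.frame(
  TBI.agecat = c("Age 47-64", "Age 29-46", "Age > 64",
                 "Age 29-46", "Age 18-28", "Age 18-28"),
  TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30",
                       "NISS 11-20", "NISS 11-20", "NISS 0-10"),
  stringsAsFactors = FALSE
)

# Lookup vectors: names are the labels, values are the requested codes.
# Labels not present in the sample are ASSUMED spellings.
convert_age <- c("Age 18-28" = 1, "Age 29-46" = 2,
                 "Age 47-64" = 3, "Age > 64" = 4)
convert_niss <- c("NISS 0-10" = 1, "NISS 11-20" = 2, "NISS 21-30" = 3,
                  "NISS 31-40" = 4, "NISS 41-50" = 5, "NISS 51-60" = 6,
                  "NISS 61-70" = 7, "NISS > 70" = 8)

# Index the lookup vector by the character column; unmatched labels give NA
df$TBI.age_num  <- unname(convert_age[df$TBI.agecat])
df$TBI.niss_num <- unname(convert_niss[df$TBI.nisscategory])

df
```

Any NA left in the new columns points at a label whose spelling differs from the lookup vector. If the only goal is to get glm() to accept the columns, converting them with factor() is an alternative worth considering, since glm() handles factors by building indicator variables and does not impose an ordering on categories such as race.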

regression

r