How to convert a categorical variable to numeric in R?
I am very new to R.
I have the following dataset:
  age    sex    bmi children smoker    region   charges  sex_N
1  19 female 27.900        0    yes southwest 16884.924 female
2  18   male 33.770        1     no southeast  1725.552   male
3  28   male 33.000        3     no southeast  4449.462   male
4  33   male 22.705        0     no northwest 21984.471   male
5  32   male 28.880        0     no northwest  3866.855   male
6  31 female 25.740        0     no southeast  3756.622 female
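I want to predict charges from the other columns, but several of them (sex, smoker, region) are categorical.
How can I convert them to numeric variables?
I tried costs$sex_N <- as.factor(costs$sex)
but that did not give me the column I expected, as you can see above.
Also, how do I convert a column that has more than 2 unique values?
Please help!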
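Here are two base R options that may help. as.factor() on its own keeps the text labels; wrapping it in as.integer() returns the underlying integer codes: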
> transform(
+ costs,
+ sex_N = as.integer(as.factor(sex_N))
+ )
  age    sex    bmi children smoker    region   charges sex_N
1  19 female 27.900        0    yes southwest 16884.924     1
2  18   male 33.770        1     no southeast  1725.552     2
3  28   male 33.000        3     no southeast  4449.462     2
4  33   male 22.705        0     no northwest 21984.471     2
5  32   male 28.880        0     no northwest  3866.855     2
6  31 female 25.740        0     no southeast  3756.622     1
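To see which integer stands for which category, it can help to inspect the factor levels. This is a minimal sketch, assuming a small stand-alone vector `sex` (a hypothetical stand-in for `costs$sex`, with the values copied from the six rows above). By default the levels are sorted, so `female` gets code 1 and `male` gets code 2.

```r
# Hypothetical stand-in for costs$sex (values taken from the six rows above)
sex <- c("female", "male", "male", "male", "male", "female")

f <- as.factor(sex)
levels(f)                # the level order decides the integer codes
codes <- as.integer(f)   # one integer code per row

# Lookup table from integer code back to the original label
mapping <- data.frame(code = seq_along(levels(f)), label = levels(f))
print(mapping)

# For a 0/1 indicator instead of 1/2 codes, a plain comparison is enough
is_male <- as.integer(sex == "male")
```

For a two-level column, the 0/1 form is often easier to interpret in a regression than 1/2 codes.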
Or, with match():
> transform(
+ costs,
+ # match against unique() so the codes are 1, 2, ... regardless of row order
+ sex_N = match(sex_N, unique(sex_N))
+ )
  age    sex    bmi children smoker    region   charges sex_N
1  19 female 27.900        0    yes southwest 16884.924     1
2  18   male 33.770        1     no southeast  1725.552     2
3  28   male 33.000        3     no southeast  4449.462     2
4  33   male 22.705        0     no northwest 21984.471     2
5  32   male 28.880        0     no northwest  3866.855     2
6  31 female 25.740        0     no southeast  3756.622     1
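For a column with more than two unique values, such as region, integer codes would impose an arbitrary order (northwest < southeast < southwest) that a regression would treat as a real numeric scale. One common alternative is dummy (one-hot) coding with base R's model.matrix(). The sketch below is not part of the original answer: it assumes a small data frame named `costs_demo` (a hypothetical name) rebuilt from three columns of the six rows shown in the question.

```r
# Hypothetical mini data frame rebuilt from the six rows shown in the question
costs_demo <- data.frame(
  smoker  = c("yes", "no", "no", "no", "no", "no"),
  region  = c("southwest", "southeast", "southeast",
              "northwest", "northwest", "southeast"),
  charges = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622),
  stringsAsFactors = TRUE
)

# One 0/1 column per region; "- 1" drops the intercept so every level gets a column
region_dummies <- model.matrix(~ region - 1, data = costs_demo)
print(region_dummies)

# Attach the indicator columns to the original data
costs_num <- cbind(costs_demo, region_dummies)
```

If the goal is a linear model, manual conversion may not be needed at all: lm() generally builds these indicator columns itself for factor columns, so a formula such as lm(charges ~ sex + smoker + region, data = costs) can be fitted directly.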