How to change factor columns to numeric columns
I have a data frame with factor columns, and I need to convert them to numeric.
head(IBOV)
Date Price Open High Low Vol. Change..
1 Oct 18, 2019 104,784.74 105,011.71 105,464.25 104,524.97 2.84M -0.22%
2 Oct 17, 2019 105,015.77 105,388.63 105,891.19 104,826.61 4.19M -0.39%
3 Oct 16, 2019 105,422.80 104,485.87 105,462.07 103,521.08 4.51M 0.89%
4 Oct 15, 2019 104,489.56 104,298.53 105,047.62 104,052.48 4.09M 0.18%
5 Oct 14, 2019 104,301.58 103,833.59 104,304.85 103,438.47 2.99M 0.45%
6 Oct 11, 2019 103,831.92 101,818.60 104,380.89 101,818.60 4.35M 1.98%
I tried to convert columns 2 through 5 with the following code:
IBOV[ ,2:5] <- as.numeric(gsub(",", "", IBOV[ ,2:5]))
But it returns NA for all of them, along with this message:
IBOV[2:5] <- as.numeric(gsub(",", "",IBOV[2:5]))
Warning message:
NAs introduced by coercion
head(IBOV)
Date Price Open High Low Vol. Change..
1 Oct 18, 2019 NA NA NA NA 2.84M -0.22%
2 Oct 17, 2019 NA NA NA NA 4.19M -0.39%
3 Oct 16, 2019 NA NA NA NA 4.51M 0.89%
4 Oct 15, 2019 NA NA NA NA 4.09M 0.18%
5 Oct 14, 2019 NA NA NA NA 2.99M 0.45%
6 Oct 11, 2019 NA NA NA NA 4.35M 1.98%
What am I doing wrong?
Proceed like this:
Step 1: Your data (in the absence of reproducible data, here is some mock data):
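set.seed(12)
df <- data.frame(
  var1 = sample(1:10, 3),
  var2 = c("2,130.34", "1,000.01", "20,999.55"),
  var3 = c("23%", "-1.45%", "12.88%"),
  # On R >= 4.0 character columns are no longer factors by default,
  # so request them explicitly to reproduce the situation in the question
  stringsAsFactors = TRUE
)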
df
var1 var2 var3
1 1 2,130.34 23%
2 8 1,000.01 -1.45%
3 9 20,999.55 12.88%
Step 2: str shows that you have two factors:
str(df)
'data.frame': 3 obs. of 3 variables:
$ var1: int 1 8 9
$ var2: Factor w/ 3 levels "1,000.01","2,130.34",..: 2 1 3
$ var3: Factor w/ 3 levels "-1.45%","12.88%",..: 3 1 2
Step 3: Use lapply to convert the factors to character:
df[,2:3] <- lapply(df[,2:3], as.character)
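The character conversion matters because calling as.numeric directly on a factor returns its internal level codes rather than the values shown in the labels. A minimal sketch of that pitfall, using a small made-up factor `f` (the values are hypothetical and not taken from the question's data):

```r
# Hypothetical factor; levels are sorted as strings: "10", "200", "5"
f <- factor(c("10", "5", "200"))

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(f)
codes
# 1 3 2

# Going through as.character() first recovers the intended numbers
values <- as.numeric(as.character(f))
values
# 10 5 200
```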
Step 4: Remove the commas and percent signs:
df[,2:3] <- lapply(df[,2:3], function(x) gsub(",|%", "", x))
df
var1 var2 var3
1 1 2130.34 23
2 8 1000.01 -1.45
3 9 20999.55 12.88
Step 5: Convert to numeric:
df[,2:3] <- lapply(df[,2:3], as.numeric)
Step 6: Check the conversion:
str(df)
'data.frame': 3 obs. of 3 variables:
$ var1: int 1 8 9
$ var2: num 2130 1000 21000
$ var3: num 23 -1.45 12.88
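As for why the original attempt produced only NA: gsub coerces its input with as.character, and for a data frame that gives one string per column instead of one per cell, so as.numeric cannot parse the result. The sketch below shows this on mock data shaped like two of the IBOV price columns, and then folds steps 3 to 5 into a single column-wise pass. The helper name `to_number` and the data frame `prices` are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: strip "," and "%", then convert to numeric
to_number <- function(x) {
  as.numeric(gsub("[,%]", "", as.character(x)))
}

# Mock data resembling two IBOV columns; stringsAsFactors = TRUE
# reproduces the factor columns on R >= 4.0
prices <- data.frame(
  Price = c("104,784.74", "105,015.77"),
  Open  = c("105,011.71", "105,388.63"),
  stringsAsFactors = TRUE
)

# gsub() on the whole data frame sees one deparsed string per column,
# so as.numeric() yields one NA per column (warnings silenced here)
whole <- suppressWarnings(as.numeric(gsub(",", "", prices)))

# Applying the helper column by column keeps one value per cell;
# prices[] preserves the data frame structure on assignment
prices[] <- lapply(prices, to_number)
str(prices)
```

For the data in the question, the same pattern would be IBOV[2:5] <- lapply(IBOV[2:5], to_number), assuming those four columns contain only digits, commas, and decimal points.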