Cannot convert character into numeric in R
I copied and pasted weather data from the "Weather Underground" website to do some data analysis. The data look like this:
https://www.wunderground.com/dashboard/pws/KCACHINO13/table/2018-04-10/2018-04-10/daily
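As you can see, the temperature and other values contain text, so I can't do calculations with them. In Excel, I used SUBSTITUTE(xx,"F","") to remove the "F" from the Temperature column, but when I then tried CONVERT(xx,"F","C"), I couldn't get a result. I think something is wrong with the data itself. I formatted the cells as numbers and also tried copying and pasting the values into a new column, but neither worked.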
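For reference, the Fahrenheit-to-Celsius step that CONVERT(xx,"F","C") performs is plain arithmetic, C = (F - 32) × 5/9, and is straightforward in R once the column is numeric. A minimal sketch; `f_to_c` is a hypothetical helper name, and the input values are copied from the dput() output further down in this question:

```r
# Hypothetical helper: Fahrenheit to Celsius, C = (F - 32) * 5 / 9.
f_to_c <- function(f) (f - 32) * 5 / 9

# Sample Fahrenheit readings taken from the dput() output in this question.
temps_f <- c(60.0, 59.9, 59.5)
temps_c <- round(f_to_c(temps_f), 2)
print(temps_c)
```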
Then I imported the data.frame into R to try formatting the data there. I checked the class of the Temperature column, and it showed "character":
class(a$Temperature)
#"character"
a$Temperature <- gsub("F","",a$Temperature)
# this command removed "F"
as.numeric(a$Temperature)
#Warning message: NAs introduced by coercion
as.numeric(unlist(a$Temperature))
#still the same warning message
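In Excel, I also created a new column with the "F" removed from the temperature and used it in R to convert from "character" to "numeric", but I still get the warning message. I don't know how to handle this. Can anyone help me solve it? Thanks!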
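A quick way to see what is blocking the conversion is to list exactly the entries that as.numeric() turns into NA. A minimal sketch; `find_unparseable` is a hypothetical helper, and the sample vector is made up for illustration:

```r
# Hypothetical helper: return the non-missing entries that as.numeric()
# cannot parse, so the offending characters can be inspected directly.
find_unparseable <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  x[is.na(parsed) & !is.na(x)]
}

# Made-up sample values; "59.9 F" and "n/a" cannot be parsed.
temps <- c("60.0", "59.9 F", "59.8", "n/a")
bad <- find_unparseable(temps)
print(bad)
```

On the real data this would be called as `find_unparseable(a$Temperature)`.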
Following the suggestion below, here is the output of
dput(head(a))
structure(list(Time = structure(c(-2209075140, -2209074840, -2209074540,
-2209074240, -2209073940, -2209073640), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Temperature = c("60.0 ", "59.9 ", "59.8 ", "59.7 ",
"59.6 ", "59.5 "), `T(F)` = c("60.0 ", "59.9 ", "59.8 ", "59.7 ",
"59.6 ", "59.5 "), `Dew Point` = c("48.2 F", "48.1 F", "48.4 F",
"48.3 F", "48.2 F", "48.1 F"), Humidity = c("65 %", "65 %", "66 %",
"66 %", "66 %", "66 %"), Wind = c("WSW", "WSW", "WSW", "WSW",
"WSW", "WSW"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.88 in", "29.88 in",
"29.88 in", "29.88 in", "29.88 in", "29.88 in"), `Precip. Rate.` = c("0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), `Precip. Accum.` = c("0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0,
0, 0, 0, 0, 0), Solar = c("0 w/m²", "0 w/m²", "0 w/m²", "0 w/m²",
"0 w/m²", "0 w/m²")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
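In the dput() output, the Temperature values still end in something that looks like a space ("60.0 "). Since as.numeric() already ignores ordinary leading and trailing whitespace (for example, as.numeric(" 2.7 ") gives 2.7), that trailing character is probably not a regular space. A common culprit in text copied from web pages is the non-breaking space U+00A0. The sketch below assumes that is the case here (dput() alone does not prove it): it shows how to inspect the code points with utf8ToInt() and strip U+00A0 before converting. The vector `x` is a stand-in for `a$Temperature`.

```r
# Stand-in for a$Temperature after gsub("F", "", ...). ASSUMPTION: the
# trailing character is a non-breaking space (U+00A0), not a regular space.
x <- c("60.0\u00a0", "59.9\u00a0", "59.8\u00a0")

# Inspect the code points of one value: a non-breaking space shows up
# as 160, whereas a regular space would show up as 32.
print(utf8ToInt(x[1]))

# Remove non-breaking spaces, trim ordinary whitespace, then convert.
clean <- as.numeric(trimws(gsub("\u00a0", "", x, fixed = TRUE)))
print(clean)
```

If utf8ToInt() on the real data shows a different code point, substitute that character in the gsub() call.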
If you only want to convert the Temperature column, you could consider the following option.
Data
df <- structure(list(Time = c("12:04 AM", "12:09 AM", "12:14 AM", "12:19 AM",
"12:24 AM", "12:29 AM"), Temperature = c("69.4 F", "69.2 F",
"68.8 F", "68.5 F", "68.3 F", "68.0 F"), Dew.Point = c("45.9 F",
"46.0 F", "45.8 F", "45.7 F", "45.7 F", "45.7 F"), Humidity = c("43 %",
"43 %", "44 %", "44 %", "44 %", "45 %"), Wind = c("NE", "NE",
"NE", "NE", "NE", "NE"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.93 in",
"29.94 in", "29.94 in", "29.95 in", "29.95 in", "29.95 in"),
Precip..Rate. = c("0.00 in", "0.00 in", "0.00 in", "0.00 in",
"0.00 in", "0.00 in"), Precip..Accum. = c("0.00 in", "0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0L, 0L,
0L, 0L, 0L, 0L), Solar = c("0 w/m²", "0 w/m²", "0 w/m²",
"0 w/m²", "0 w/m²", "0 w/m²")), class = "data.frame", row.names = c(NA,
-6L))
Code
library(dplyr)
library(stringr)
df2 <- df %>%
mutate(Temperature2 = as.numeric(str_extract(Temperature, "[\\d.]+"))) %>%
relocate(Temperature2, .after = Temperature)
df2[, 2:3]
#   Temperature Temperature2
# 1      69.4 F         69.4
# 2      69.2 F         69.2
# 3      68.8 F         68.8
# 4      68.5 F         68.5
# 5      68.3 F         68.3
# 6      68.0 F         68.0
str(df2$Temperature2)
# num [1:6] 69.4 69.2 68.8 68.5 68.3 68
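If you would rather avoid the stringr dependency, base R's regexpr() and regmatches() can do the same extraction, and lapply() applies it to several unit-bearing columns at once. A minimal sketch: `extract_number` is a hypothetical helper, `wx` is a trimmed-down copy of the sample `df` above, and the pattern "-?[0-9.]+" additionally keeps a leading minus sign so that below-zero readings would survive.

```r
# Hypothetical helper: return the first number found in each string,
# or NA when a string contains no digits (or is NA itself).
extract_number <- function(x) {
  pos <- regexpr("-?[0-9.]+", x)
  hit <- !is.na(pos) & pos != -1
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, pos))
  out
}

# Trimmed-down copy of the sample data in this answer.
wx <- data.frame(
  Temperature = c("69.4 F", "69.2 F"),
  Humidity = c("43 %", "43 %"),
  Pressure = c("29.93 in", "29.94 in"),
  Wind = c("NE", "NE"),
  stringsAsFactors = FALSE
)

# Convert only the columns that carry units; Wind stays character.
num_cols <- c("Temperature", "Humidity", "Pressure")
wx[num_cols] <- lapply(wx[num_cols], extract_number)
str(wx)
```

Selecting the columns explicitly matters: applied to Time, the same pattern would turn "12:04 AM" into 12.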
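Maybe this helps.
This nests several functions in one call, for example converting a character variable to a numeric one. It also uses gsub, which here replaces commas with an empty string. You would just change the comma to whatever letters you are removing (e.g. "F"). I have never tried whether it works with letters, but it might be a solution. Here is the code: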
data666 <- apply(data, 2, function(x) as.numeric(as.character(gsub(",", "", x))))
The apply function applies the function to the entire dataset. The 2 means it works column by column; if you want to work row by row, change the 2 to 1.
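One caveat with apply(): it first converts the data frame to a matrix (with mixed column types, everything becomes character) and returns a matrix, so non-numeric columns such as Time or Wind come back as NA. A sketch of the same gsub idea that instead strips every character that is not a digit, dot or minus sign, and uses lapply() on selected columns so the result stays a data frame. `to_num`, `wx` and the column choice are illustrative assumptions, not part of the original answer:

```r
# Small illustrative data frame modelled on the question's columns.
wx <- data.frame(
  Temperature = c("60.0 F", "59.9 F"),
  Dew.Point = c("48.2 F", "48.1 F"),
  Wind = c("WSW", "WSW"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop everything except digits, "." and "-",
# then convert to numeric.
to_num <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# lapply() over the chosen columns keeps wx a data frame;
# the Wind column is left untouched.
cols <- c("Temperature", "Dew.Point")
wx[cols] <- lapply(wx[cols], to_num)
str(wx)
```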