Why is data.matrix changing the information in a data frame?
I am trying to convert the following data frame into a matrix.
> dput(data)
structure(list(`1` = structure(c(1L, 1L, 3L, 3L, 1L), .Label = c("1",
"2", "3", "4", "5", "NA"), class = "factor"), `2` = structure(c(5L,
5L, 2L, 2L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "NA"
), class = "factor"), `3` = structure(c(34L, 46L, 51L, 28L, 13L
), .Label = c("0", "1", "10", "100", "105", "11", "110", "112",
"12", "120", "14", "15", "16", "168", "18", "2", "20", "200",
"21", "22", "24", "25", "26", "27", "28", "29", "3", "30", "31",
"32", "35", "36", "4", "40", "41", "42", "42099", "42131", "42134",
"42197", "42292", "45", "48", "49", "5", "50", "54", "55", "56",
"6", "60", "64", "65", "7", "70", "72", "75", "77", "8", "80",
"82", "84", "85", "9", "90", "NA"), class = "factor"), `4` = structure(c(1L,
2L, 2L, 1L, 1L), .Label = c("0", "1", "NA"), class = "factor"),
`5` = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("0", "1",
"NA"), class = "factor")), .Names = c("1", "2", "3", "4",
"5"), row.names = c(1L, 2L, 4L, 5L, 6L), class = "data.frame")
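However, when I use data.matrix, the result is a different data set. Below is the new data set I get. Do you have any idea why? I am running R version 3.2.1 on OS X 10.10.4. Thanks in advance.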
> data_cleaned <- data.matrix(data)
> dput(data_cleaned)
structure(c(1L, 1L, 3L, 3L, 1L, 5L, 5L, 2L, 2L, 5L, 34L, 46L,
51L, 28L, 13L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Dim = c(5L,
5L), .Dimnames = list(c("1", "2", "4", "5", "6"), c("1", "2",
"3", "4", "5")))
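Some of your data is stored as factors. When you call as.numeric on a factor, you get the factor's internal integer codes (the positions of its levels), not the actual values, even when the labels happen to be numbers: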
x = as.factor(c(5,4,3))
as.numeric(x)
But this works:
as.numeric(as.character(x))
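To make the difference concrete, here is a small sketch using the same `x`. The names `codes`, `values` and `values2` are only illustrative, and the last line uses the alternative form mentioned in `?factor`.

```r
# A factor stores integer codes that index into its sorted levels.
x <- as.factor(c(5, 4, 3))

levels(x)                               # "3" "4" "5"
codes  <- as.numeric(x)                 # 3 2 1: positions in levels(x), not the data
values <- as.numeric(as.character(x))   # 5 4 3: the original numbers

# Equivalent form: convert the levels once, then index by the factor.
values2 <- as.numeric(levels(x))[x]     # 5 4 3
```

This is the same effect seen in the question: data.matrix returns the codes (34, 46, 51, ...) instead of the labels they point to.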
You can try the following, which applies the conversion to every column of your data.frame:
sapply(data, function(x) as.numeric(as.character(x)))
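As a rough sketch of that column-wise conversion, the example below uses a small made-up data frame `df` standing in for `data`; the column names, values and the helper `to_num` are assumptions for illustration, not part of the original question. It also restores the row names, which sapply does not carry over.

```r
# Hypothetical data frame of factors, standing in for `data`.
df <- data.frame(
  a = factor(c("40", "50", "16")),
  b = factor(c("0", "1", "NA")),    # "NA" here is a text label, not a missing value
  row.names = c("1", "2", "4")
)

# data.matrix() returns the integer codes of each factor.
codes <- data.matrix(df)

# Convert each column through its labels instead.
to_num <- function(col) as.numeric(as.character(col))
m <- sapply(df, to_num)
rownames(m) <- rownames(df)         # sapply() does not keep the row names
m
```

In this sketch the text label "NA" ends up as a real missing value in `m`, which is probably what is wanted for columns like the ones in the question.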
Another possibility:
size <- dim(data)
m <- matrix(as.numeric(as.matrix(data)), nrow = size[1], ncol = size[2])
#> m
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    5   40    0    0
#[2,]    1    5   50    1    0
#[3,]    3    2   60    1    0
#[4,]    3    2   30    0    0
#[5,]    1    5   16    0    0
#> class(m)
#[1] "matrix"
#> str(m)
# num [1:5, 1:5] 1 1 3 3 1 5 5 2 2 5 ...
Hope this helps.