How to pass a dataframe slice to histogram function for mode normalisation in R?

Question

I would like to normalise selected columns of a data frame using a user-defined normalisation. So far, I have

library(tidyr)
library(ggplot2)

Mode <- function(x, na.rm =  TRUE) {
  x <- lapply(x, as.numeric)
  distribution <- hist(x, breaks = 50, plot = FALSE)
  distribution$mids[which.max(distribution$counts)]
}

data_normalised <- lapply(mtcars[,-9:-12], function(x) {(x-Mode(x))/(sd(x))})

as a minimal example. However, hist complains that "'x' must be numeric". I thought this could be solved by casting with

x <- lapply(x, as.numeric)

but that does not work. I know that hist works with

hist(mtcars[[3]])

but I cannot find a way to combine slicing the data frame with the hist function, because

hist(mtcars[[-9:-12]])

does not work either.

Ideally, I would like the Mode() function to work like the sd() function: take a data frame column and return a single value.

Thanks for your help!

Answer 1

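Inside your Mode() function, class(x) is 'numeric' (you have a numeric vector of length 32). You then use lapply() to apply as.numeric() to x. lapply() always returns a list, so after this step class(x) is 'list', and hist() rejects it because a list is not numeric. Since as.numeric() is vectorised, there is no need to loop over the elements of the vector; use x <- as.numeric(x) in the Mode function instead: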

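Mode <- function(x, na.rm = TRUE) {
  # as.numeric() is vectorised, so no lapply() is needed
  x <- as.numeric(x)
  # bin the values without drawing the plot
  distribution <- hist(x, breaks = 50, plot = FALSE)
  # midpoint of the bin with the highest count
  distribution$mids[which.max(distribution$counts)]
}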
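The type change described above can be checked directly. The following is a small sketch using mtcars$mpg as the example column; the variable names x_list and res are only illustrative.

```r
x <- mtcars$mpg
class(x)                          # a plain numeric vector

# lapply() wraps every element in a list, whatever the function returns
x_list <- lapply(x, as.numeric)
class(x_list)                     # now a list, one element per value
length(x_list)

# hist() checks is.numeric(x) first, so a list is rejected
res <- try(hist(x_list, plot = FALSE), silent = TRUE)
inherits(res, "try-error")

# the vectorised call keeps the vector numeric, and hist() accepts it
h <- hist(as.numeric(x), breaks = 50, plot = FALSE)
```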

With this version of Mode(), you can proceed as before without an error:

data_normalised <- lapply(mtcars[,-9:-12], function(x) {(x-Mode(x))/(sd(x))})
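To confirm that the corrected Mode() behaves like sd(), taking one column and returning a single value, it can be applied column by column. This is only a sketch: it selects columns 1 to 8 by position (the same eight columns, mpg through vs, that the slice above keeps), and the names modes and sds are illustrative.

```r
Mode <- function(x) {
  x <- as.numeric(x)
  distribution <- hist(x, breaks = 50, plot = FALSE)
  distribution$mids[which.max(distribution$counts)]
}

# vapply() insists that each call returns exactly one numeric value
modes <- vapply(mtcars[1:8], Mode, numeric(1))
sds   <- vapply(mtcars[1:8], sd, numeric(1))

# one named value per column for both summaries
names(modes)
length(modes) == length(sds)
```

Note that breaks = 50 is only a suggestion to hist(), so the returned mode depends on the bin edges that hist() actually chooses.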

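If you want a rectangular object again instead of a list, you can use cbind(). Note that this returns a matrix; wrap the result in as.data.frame() if you need a data.frame: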

data_normalised <- do.call("cbind", data_normalised)

You get:

head(data_normalised)
           mpg         cyl      disp          hp        drat          wt        qsec          vs
[1,] 0.9540484 -1.09187321 0.6858229  0.03646289  1.54298263 -0.84827399 -0.35815351 -0.01984063
[2,] 0.9540484 -1.09187321 0.6858229  0.03646289  1.54298263 -0.58765969 -0.04476919 -0.01984063
[3,] 1.2527070 -2.21174317 0.2662607 -0.21148473  1.44946853 -1.15487905  0.84501845  1.96422286
[4,] 1.0204170 -1.09187321 1.4765365  0.03646289  0.00935141 -0.24017396  1.30949879  1.96422286
[5,] 0.5724290  0.02799675 2.2995240  0.98449790  0.14027115 -0.01022017 -0.04476919 -0.01984063
[6,] 0.4728762 -1.09187321 1.2102758 -0.03646289 -0.58913882  0.01022017  1.74599838  1.96422286
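The [1,], [2,] row labels in this output show that cbind() produced a matrix and that the car names were dropped. If a data.frame with the original row names is wanted, one possible sketch is to assign the lapply() result back into a copy of the selected columns. The names cols and normalised_df are hypothetical, and columns 1 to 8 are chosen by position to match the output above.

```r
Mode <- function(x) {
  x <- as.numeric(x)
  distribution <- hist(x, breaks = 50, plot = FALSE)
  distribution$mids[which.max(distribution$counts)]
}

cols <- 1:8                      # mpg through vs
normalised_df <- mtcars[cols]    # copy keeps row names and column names

# assigning into df[] replaces the columns but keeps the data.frame shell
normalised_df[] <- lapply(normalised_df, function(x) (x - Mode(x)) / sd(x))

class(normalised_df)
head(rownames(normalised_df))
```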
