PCA - how to visualize whether all the variables are on different / the same scale
I am using the uscrime dataset, but this question applies to any well-known dataset, such as cars.
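After some googling, I found that standardizing my data is very useful, given that PCA finds the new directions from the covariance matrix of the original variables, and the covariance matrix is sensitive to the scale of the variables.
Nevertheless, I also found that "It is not necessary to standardize the variables, if all the variables are in same scale."
To standardize my variables I use: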
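# scale() standardizes column by column; mean() and sd() do not work on a whole data frame (assumes every column of uscrime is numeric)
z_uscrime <- as.data.frame(scale(uscrime))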
Before standardizing my data, how can I check whether all the variables are on the same scale?
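One possible way to check this, sketched on the built-in mtcars data rather than uscrime: compare the per-variable standard deviations and ranges, then look at side-by-side boxplots. The names spread and sd_ratio are made up for this illustration, and the ratio is only a rough heuristic, not an established test.

```r
# Summarise the spread of every variable (assumes all columns are numeric)
spread <- data.frame(
  variable  = names(mtcars),
  mean      = sapply(mtcars, mean),
  sd        = sapply(mtcars, sd),
  min       = sapply(mtcars, min),
  max       = sapply(mtcars, max),
  row.names = NULL
)
spread

# Ratio of the largest to the smallest standard deviation:
# a value far above 1 suggests the variables are not on a comparable scale
sd_ratio <- max(spread$sd) / min(spread$sd)
sd_ratio

# Visual check: one boxplot per raw variable on a shared axis
boxplot(mtcars, las = 2, main = "Raw mtcars variables")
```

If a few boxes dominate the plot while the others collapse to thin lines near zero, the variables are on very different scales and standardizing before PCA is probably worthwhile.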
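To prove my point: you can standardize the data as many times as you like, and the result does not change after the first pass.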
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
simple_recipe <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(everything()) %>%
  step_scale(everything())
mtcars2 <- simple_recipe %>%
  prep() %>%
  juice()
simple_recipe2 <- recipe(mpg ~ ., data = mtcars2) %>%
  step_center(everything()) %>%
  step_scale(everything())
mtcars3 <- simple_recipe2 %>%
  prep() %>%
  juice()
all.equal(mtcars2, mtcars3)
#> [1] TRUE
mtcars2 %>%
  summarise(across(everything(), .fns = list(mean = ~ mean(.x), sd = ~ sd(.x)))) %>%
  pivot_longer(everything(), names_pattern = "(.*)_(.*)", names_to = c("stat", ".value"))
#> # A tibble: 11 x 3
#>    stat       mean    sd
#>    <chr>     <dbl> <dbl>
#>  1 cyl   -1.47e-17  1
#>  2 disp  -9.08e-17  1
#>  3 hp     1.04e-17  1
#>  4 drat  -2.92e-16  1
#>  5 wt     4.68e-17  1.00
#>  6 qsec   5.30e-16  1
#>  7 vs     6.94e-18  1.00
#>  8 am     4.51e-17  1
#>  9 gear  -3.47e-18  1.00
#> 10 carb   3.17e-17  1.00
#> 11 mpg    7.11e-17  1
mtcars3 %>%
  summarise(across(everything(), .fns = list(mean = ~ mean(.x), sd = ~ sd(.x)))) %>%
  pivot_longer(everything(), names_pattern = "(.*)_(.*)", names_to = c("stat", ".value"))
#> # A tibble: 11 x 3
#>    stat       mean    sd
#>    <chr>     <dbl> <dbl>
#>  1 cyl   -1.17e-17     1
#>  2 disp  -1.95e-17     1
#>  3 hp     9.54e-18     1
#>  4 drat   1.17e-17     1
#>  5 wt     3.26e-17     1
#>  6 qsec   1.37e-17     1
#>  7 vs     4.16e-17     1
#>  8 am     4.51e-17     1
#>  9 gear   0.           1
#> 10 carb   2.60e-18     1
#> 11 mpg    4.77e-18     1
Created on 2020-06-07 by the reprex package (v0.3.0)
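To see why the scale matters for PCA itself, here is a minimal sketch, again on mtcars, that runs prcomp() once on the raw variables (covariance-based) and once with scale. = TRUE (correlation-based). The object names pca_raw, pca_scaled, prop_raw and prop_scaled are made up for this illustration.

```r
# PCA on centered but unscaled variables (uses the covariance matrix)
pca_raw <- prcomp(mtcars, center = TRUE, scale. = FALSE)

# PCA on standardized variables (equivalent to using the correlation matrix)
pca_scaled <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
prop_raw    <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_scaled <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)
round(rbind(raw = prop_raw, scaled = prop_scaled)[, 1:3], 3)

# Loadings of the first component: without scaling, the variables with the
# largest variances dominate the direction of PC1
round(pca_raw$rotation[, 1], 3)
round(pca_scaled$rotation[, 1], 3)
```

When the two sets of loadings differ a lot, the unscaled result mostly reflects the measurement units, which is a sign that the variables were not on the same scale to begin with.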