Coefficients for large dimension matrices/data frames

I have a data frame with 70 rows and 64,000 columns.

I want to find the correlations between the columns and rows of my data frame and sort them by absolute value. But when I use the coef() function, I get NULL:

> coef(expressions70)
NULL

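Is there a way to get coefficients like the ones shown in the output of pairs.panels() from the psych package? Or some other way to display the coefficients?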

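No, pairs.panels does not create an object. If you assign its result to an object, you'll see that the object's value is NULL. However, you can still look at the coefficients in several ways. (Although, given Anscombe's quartet, I'd advise against taking rho values at face value.)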
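As for the NULL in the question: coef() is meant for pulling coefficients out of fitted model objects, and a plain data frame has none, whereas cor() computes correlations directly. A minimal sketch, using the built-in iris data as a stand-in for expressions70 (an assumption, since the original data isn't shown):

```r
# iris stands in for the asker's expressions70 data frame (assumption)
num <- iris[, 1:4]

# coef() looks for model coefficients; a data frame carries none
is.null(coef(num))
# [1] TRUE

# cor() returns the column-by-column correlation matrix
rho <- cor(num)
dim(rho)
# [1] 4 4

# for correlations between rows instead of columns, transpose first
rho_rows <- cor(t(num))
dim(rho_rows)
# [1] 150 150
```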

Both options create a named vector as output. The names are the two correlated fields. The values are the rho values.

If all your fields are numeric, or if you know exactly which columns are numeric, use the first option. If you have date, character, and factor fields mixed into the columns, use the second option.

First option:

library(funModeling)
library(tidyverse)
library(RcppAlgos)

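# create all pairwise combinations of the four numeric columns
# (repetition = TRUE also pairs each column with itself)
tellMe <- comboGeneral(names(iris[, 1:4]),
                       2, repetition = TRUE) %>%
  as.data.frame()
# correlate each pair and name each value "field1-field2"
showMe <- map(1:nrow(tellMe),
              ~setNames(
                cor(iris[, tellMe[.x, 1]],
                    iris[, tellMe[.x, 2]],
                    use = "everything", method = "pearson"),
                paste0(tellMe[.x, ], collapse = "-"))
              ) %>%
  unlist() %>% sort(decreasing = TRUE)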
#   Sepal.Width-Sepal.Width Sepal.Length-Sepal.Length 
#                 1.0000000                 1.0000000 
# Petal.Length-Petal.Length   Petal.Width-Petal.Width 
#                 1.0000000                 1.0000000 
#  Petal.Length-Petal.Width Sepal.Length-Petal.Length 
#                 0.9628654                 0.8717538 
#  Sepal.Length-Petal.Width  Sepal.Length-Sepal.Width 
#                 0.8179411                -0.1175698 
#   Sepal.Width-Petal.Width  Sepal.Width-Petal.Length 
#                -0.3661259                -0.4284401 
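The question asks for an ordering by absolute value, while sort(decreasing = TRUE) orders by signed value, so strong negative correlations end up last. One possible way to rank by magnitude, sketched here in base R rather than with the packages above (the names rho, idx, rho_pairs, and by_abs are illustrative only):

```r
# All pairwise Pearson correlations among the numeric iris columns
rho <- cor(iris[, 1:4], method = "pearson")

# Keep each distinct pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(rho), arr.ind = TRUE)
rho_pairs <- setNames(rho[idx],
                      paste(rownames(rho)[idx[, "row"]],
                            colnames(rho)[idx[, "col"]],
                            sep = "-"))

# Order by magnitude while keeping the sign in the reported value
by_abs <- rho_pairs[order(abs(rho_pairs), decreasing = TRUE)]
names(by_abs)[1]
# [1] "Petal.Length-Petal.Width"
```

With 64,000 columns, a full column-by-column correlation matrix would hold roughly 4.1 billion doubles (about 33 GB), so this sketch probably only suits a subset of columns or a machine with a lot of memory.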

Second option:

First, determine which fields are integer or numeric, then follow the same path as in the first option.

I have to say, I started with select(where()), but where is only reachable through ::: in tidyselect now, so I went with an alternative method. If this doesn't mean anything to you, just ignore this comment.

# if some variables are not numeric...
# apparently 'where' isn't in tidyselect anymore
# keep only the names of the integer or numeric columns
fields <- df_status(iris) %>%
  filter(type %in% c("integer", "numeric")) %>%
  select(variable) %>%
  unlist(use.names = FALSE)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  

# find all possible pairwise combinations
# (repetition = TRUE also pairs each field with itself)
giveMe <- comboGeneral(fields, 2, repetition = TRUE) %>%
  as.data.frame()

# correlate each pair and name each value "field1-field2"
itsShown <- map(1:nrow(giveMe),
                ~setNames(
                  cor(iris[, giveMe[.x, 1]],
                      iris[, giveMe[.x, 2]],
                      use = "everything", method = "pearson"),
                  paste0(giveMe[.x, ], collapse = "-"))
                ) %>%
  unlist() %>% sort(decreasing = TRUE)
#   Sepal.Width-Sepal.Width Sepal.Length-Sepal.Length 
#                 1.0000000                 1.0000000 
# Petal.Length-Petal.Length   Petal.Width-Petal.Width 
#                 1.0000000                 1.0000000 
#  Petal.Length-Petal.Width Sepal.Length-Petal.Length 
#                 0.9628654                 0.8717538 
#  Sepal.Length-Petal.Width  Sepal.Length-Sepal.Width 
#                 0.8179411                -0.1175698 
#   Sepal.Width-Petal.Width  Sepal.Width-Petal.Length 
#                -0.3661259                -0.4284401
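If funModeling isn't available, the numeric fields can probably be picked out in base R as well. This is only a sketch of an alternative, assuming is.numeric() is an acceptable substitute for the type filter used above; numeric_fields and field_pairs are illustrative names:

```r
# Names of the columns that are integer or double (factors are excluded)
numeric_fields <- names(iris)[vapply(iris, is.numeric, logical(1))]
numeric_fields
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

# Distinct pairs of those fields, one pair per column, no self-pairs
field_pairs <- combn(numeric_fields, 2)
ncol(field_pairs)
# [1] 6
```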