在 R 中定义关节 self-information

define joint self-information in R

我想为 R 中的数据集计算 Normalized pointwise mutual information (npmi)。npmi 公式如下:

哪里

我有一个矩阵 dat 定义如下:

dat <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
                0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
                2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
                1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1), ncol = 4)

我可以这样计算分子:

apply(combn(1:ncol(dat), 2), 2, function(i) mutual_info(dat[, i], local = TRUE))

我怎样才能找到分母,将上面的两个相除并找到 mpmi

假设 h 是熵,mutual_info 是与 widyr::pairwise_pmi 相同的函数,你可以这样做:

library(tidyverse)
library(widyr)
library(DescTools)

dat <- matrix(c(
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
  0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
  2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
  1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1
), ncol = 4)

dat <- as_tibble(dat)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

dat %>%
  colnames() %>%
  combn(2) %>%
  t() %>%
  as_tibble() %>%
  mutate(
    pmi = list(V1, V2) %>% pmap(~ pairwise_pmi(tibble(from = dat[[.x]], to = dat[[.y]]), from, to))
  ) %>%
  unnest() %>%
  distinct(V1, V2, pmi) %>%
  mutate(
    h = list(V1, V2, pmi) %>% pmap_dbl(~ Entropy(dat[[.x]], dat[[.y]])),
    npmi = pmi / h
  )
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(pmi)`
#> # A tibble: 6 × 5
#>   V1    V2        pmi     h    npmi
#>   <chr> <chr>   <dbl> <dbl>   <dbl>
#> 1 V1    V2     0       1.32  0     
#> 2 V1    V3    -0.0770  2.15 -0.0358
#> 3 V1    V4     0       1.68  0     
#> 4 V2    V3     0       2.33  0     
#> 5 V2    V4     0       1.80  0     
#> 6 V3    V4     0       2.44  0

reprex package (v2.0.0)

于 2022-04-06 创建

例如其中 V3 是矩阵的第三列。