在 R 中定义关节 self-information
define joint self-information in R
我想为 R 中的数据集计算 Normalized pointwise mutual information (npmi)。npmi 公式如下:
哪里
我有一个矩阵 dat
定义如下:
dat <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1), ncol = 4)
我可以这样计算分子:
apply(combn(1:ncol(dat), 2), 2, function(i) mutual_info(dat[, i], local = TRUE))
我怎样才能找到分母,将上面的两个相除并找到 mpmi
?
假设 h
是熵,mutual_info
是与 widyr::pairwise_pmi
相同的函数,你可以这样做:
library(tidyverse)
library(widyr)
library(DescTools)
dat <- matrix(c(
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1
), ncol = 4)
dat <- as_tibble(dat)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
dat %>%
colnames() %>%
combn(2) %>%
t() %>%
as_tibble() %>%
mutate(
pmi = list(V1, V2) %>% pmap(~ pairwise_pmi(tibble(from = dat[[.x]], to = dat[[.y]]), from, to))
) %>%
unnest() %>%
distinct(V1, V2, pmi) %>%
mutate(
h = list(V1, V2, pmi) %>% pmap_dbl(~ Entropy(dat[[.x]], dat[[.y]])),
npmi = pmi / h
)
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(pmi)`
#> # A tibble: 6 × 5
#> V1 V2 pmi h npmi
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 V1 V2 0 1.32 0
#> 2 V1 V3 -0.0770 2.15 -0.0358
#> 3 V1 V4 0 1.68 0
#> 4 V2 V3 0 2.33 0
#> 5 V2 V4 0 1.80 0
#> 6 V3 V4 0 2.44 0
由 reprex package (v2.0.0)
于 2022-04-06 创建
例如其中 V3
是矩阵的第三列。
我想为 R 中的数据集计算 Normalized pointwise mutual information (npmi)。npmi 公式如下:
哪里
我有一个矩阵 dat
定义如下:
dat <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1), ncol = 4)
我可以这样计算分子:
apply(combn(1:ncol(dat), 2), 2, function(i) mutual_info(dat[, i], local = TRUE))
我怎样才能找到分母,将上面的两个相除并找到 mpmi
?
假设 h
是熵,mutual_info
是与 widyr::pairwise_pmi
相同的函数,你可以这样做:
library(tidyverse)
library(widyr)
library(DescTools)
dat <- matrix(c(
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
2, 3, 1, 1, 3, 2, 1, 3, 2, 2, 1, 3, 3, 2, 1, 2, 1, 2, 2, 2,
1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1
), ncol = 4)
dat <- as_tibble(dat)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
dat %>%
colnames() %>%
combn(2) %>%
t() %>%
as_tibble() %>%
mutate(
pmi = list(V1, V2) %>% pmap(~ pairwise_pmi(tibble(from = dat[[.x]], to = dat[[.y]]), from, to))
) %>%
unnest() %>%
distinct(V1, V2, pmi) %>%
mutate(
h = list(V1, V2, pmi) %>% pmap_dbl(~ Entropy(dat[[.x]], dat[[.y]])),
npmi = pmi / h
)
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(pmi)`
#> # A tibble: 6 × 5
#> V1 V2 pmi h npmi
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 V1 V2 0 1.32 0
#> 2 V1 V3 -0.0770 2.15 -0.0358
#> 3 V1 V4 0 1.68 0
#> 4 V2 V3 0 2.33 0
#> 5 V2 V4 0 1.80 0
#> 6 V3 V4 0 2.44 0
由 reprex package (v2.0.0)
于 2022-04-06 创建例如其中 V3
是矩阵的第三列。