Loop over df and retrieve unique values linked to unique values in another column
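I sub-categorised and categorised labels in Excel, but I want to make this reproducible, so I would like to turn it into R code.
I have a df with 631 rows, of which the first 15 look like this: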
IV_label Subcategory Category
<chr> <chr> <chr>
1 light conditions time of day exogenous
2 vital status victim characteristics human involvement
3 road type road type exogenous
4 reserve density workload police discretion
5 road type road type exogenous
6 surface type road type exogenous
7 surface characteristic road type exogenous
8 light conditions time of day exogenous
9 light conditions time of day exogenous
10 weather weather type exogenous
11 weather weather type exogenous
12 weather weather type exogenous
13 day of the week day of the week exogenous
14 amount of lanes road type exogenous
15 amount of lanes road type exogenous
I would like to be able to add the following to my R code without having to build the lists myself:
time of day <- list(light conditions, ...)
victim characteristics <- list(vital status, ...)
road type <- list(road type, surface type, surface characteristic, amount of lanes, ...) # notice road type is included only once!
workload <- list(reserve density, ...)
weather type <- list(weather, ...)
day of the week <- list(day of the week, ...)
exogenous <- list(time of day, road type, weather type, day of the week)
human involvement <- list(victim characteristics)
police discretion <- list(workload)
I know I need to write this boilerplate myself:
time of day <- list(
victim characteristics <- list(
road type <- list(
workload <- list(
weather type <- list(
day of the week <- list(
exogenous <- list(
human involvement <- list(
police discretion <- list(
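But I would like to be able to copy the unique values from the console and paste them into the boilerplate above.
Here I treat as an edge any pair of terms that appear in the same row, in two consecutive columns. I use an adjacency matrix adj to keep track of the edges, and then rebuild the graph as a named list: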
library(purrr)

df <- data.frame(
  IV_label = c(
    "light conditions", "vital status", "road type",
    "reserve density", "road type", "surface type",
    "surface characteristic", "light conditions", "light conditions",
    "weather", "weather", "weather",
    "day of the week", "amount of lanes", "amount of lanes"),
  Subcategory = c(
    "time of day", "victim characteristics", "road type",
    "workload", "road type", "road type",
    "road type", "time of day", "time of day",
    "weather type", "weather type", "weather type",
    "day of the week", "road type", "road type"),
  Category = c(
    "exogenous", "human involvement", "exogenous",
    "police discretion", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous"))

## all distinct terms across the three columns
## (df[[.x]] replaces pull(), which needs dplyr and was never loaded)
names <- c("IV_label", "Subcategory", "Category") |>
  purrr::map(~ df[[.x]]) |>
  purrr::reduce(union)

## adjacency matrix: rows are parents, columns are children
adj <- matrix(0,
              nrow = length(names),
              ncol = length(names),
              dimnames = list(names, names))

## df[[i]] returns a plain vector for both data frames and tibbles
adj[cbind(df[[2]], df[[1]])] <- 1  # Subcategory -> IV_label
adj[cbind(df[[3]], df[[2]])] <- 1  # Category -> Subcategory

## one list element per parent that has at least one child
setNames(asplit(adj, 1), names) |>
  purrr::map(~ names[which(.x == 1)]) |>
  purrr::keep(~ length(.x) > 0)
Output:
$`road type`
[1] "road type" "surface type" "surface characteristic"
[4] "amount of lanes"
$`day of the week`
[1] "day of the week"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
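The assignment into adj above relies on indexing a matrix with a two-column character matrix, where each row is read as a (row name, column name) pair. A minimal sketch on a toy case may make that clearer; the node names "a", "b" and "c" are made up for illustration and are not part of the original data:

```r
# Toy adjacency matrix with hypothetical node names
nodes <- c("a", "b", "c")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))

# Each row of this two-column character matrix is one edge:
# first column = parent (row name), second column = child (column name)
edges <- cbind(c("c", "c", "b"), c("a", "b", "a"))
m[edges] <- 1

# Children of "c" are the columns of row "c" that hold a 1
children_c <- names(which(m["c", ] == 1))
print(children_c)
```

Repeated edges simply overwrite the same cell with 1 again, which is why duplicated rows in df do not produce duplicated entries in the result.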
You might want to unset the diagonal of adj to avoid self-referencing edges:
adj[row(adj) == col(adj)] <- 0
setNames(asplit(adj, 1),names) |>
purrr::map(~names[which(.x == 1)]) |>
purrr::keep(~length(.x) > 0)
Output:
$`road type`
[1] "surface type" "surface characteristic" "amount of lanes"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
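Since the goal is to paste the result into a script, one possible alternative is sketched below using only base R: split() groups the children by parent, unique() removes repeats, and dput() prints the list as R code. This is an assumption-laden sketch and not the method used above; children_of is a hypothetical helper name, and the four-row df is a cut-down stand-in for the real data.

```r
# Cut-down stand-in for the real df (same column names as in the question)
df <- data.frame(
  IV_label    = c("light conditions", "road type", "surface type", "vital status"),
  Subcategory = c("time of day", "road type", "road type", "victim characteristics"),
  Category    = c("exogenous", "exogenous", "exogenous", "human involvement")
)

# Hypothetical helper: unique children per parent, dropping self-references
children_of <- function(child, parent) {
  keep <- child != parent
  lapply(split(child[keep], parent[keep]), unique)
}

hierarchy <- c(
  children_of(df$IV_label, df$Subcategory),    # Subcategory -> IV_label
  children_of(df$Subcategory, df$Category)     # Category -> Subcategory
)

# dput() writes the list as R code that can be copied into a script
dput(hierarchy)
```

Note that split() orders the groups alphabetically by parent, so the element order differs from the adjacency-matrix version, and a term used as a parent at both levels would appear as two list elements with the same name.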