Creating an edge list from a data frame
I am trying to go from this:
session location sequence weight INDIVIDUAL action
a1 texas 1 10 john Z1
a1 texas 2 5 peter Z2
a1 texas 3 3 ben Z1
a1 texas 4 5 peter Z5
a2 calif 1 25 esther Z3
a2 calif 2 5 peggy Z2
a2 calif 3 10 greg Z5
to this:
INDIVIDUAL1 INDIVIDUAL2 weight
john peter 10
john ben 10
peter john 5
peter ben 5
ben john 3
ben peter 3
peter john 5
peter ben 5
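I have been exploring several options, including a for loop, but I am a little worried that this could take too long as my dataset grows very large. Any pointers would be greatly appreciated!
Thanks!
This should help you get started.
Your data: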
df <- read.table(text="session location sequence weight INDIVIDUAL action
a1 texas 1 10 john Z1
a1 texas 2 5 peter Z2
a1 texas 3 3 ben Z1
a1 texas 4 5 peter Z5
a2 calif 1 25 esther Z3
a2 calif 2 5 peggy Z2
a2 calif 3 10 greg Z5", header=TRUE, stringsAsFactors=FALSE)
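library(tidyverse)
# Written for tidyr >= 1.0, where nest() and unnest() take explicit column arguments.
ans <- df %>%
  # keep only the session keys and the columns to be paired
  select(session, location, INDIVIDUAL, weight) %>%
  nest(data = c(INDIVIDUAL, weight)) %>%
  # cross every individual (and weight) with every other one in the session;
  # both expand.grid() calls enumerate rows in the same order, so V3 is V1's weight
  mutate(data = map(data, ~ cbind(
    expand.grid(.x$INDIVIDUAL, .x$INDIVIDUAL, stringsAsFactors = FALSE),
    expand.grid(.x$weight, .x$weight)
  ) %>% setNames(paste0("V", 1:4)))) %>%
  unnest(data) %>%
  filter(V1 != V2) %>%   # drop self-pairs
  select(-V4) %>%        # V4 is the second individual's weight
  arrange(session, V1)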
# A tibble: 16 x 5
# session location V1 V2 V3
# <chr> <chr> <chr> <chr> <int>
# 1 a1 texas ben john 3
# 2 a1 texas ben peter 3
# 3 a1 texas ben peter 3
# 4 a1 texas john peter 10
# 5 a1 texas john ben 10
# 6 a1 texas john peter 10
# 7 a1 texas peter john 5
# 8 a1 texas peter john 5
# 9 a1 texas peter ben 5
# 10 a1 texas peter ben 5
# 11 a2 calif esther peggy 25
# 12 a2 calif esther greg 25
# 13 a2 calif greg esther 10
# 14 a2 calif greg peggy 10
# 15 a2 calif peggy esther 5
# 16 a2 calif peggy greg 5
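The pairing trick in the pipeline above relies on expand.grid() enumerating rows in the same order whenever its inputs have the same length. The following is a minimal base R sketch of that idea on a hypothetical three-person miniature of session a1; the names `individual`, `pairs`, `weights`, and `edges` are made up for illustration and are not part of the answer.

```r
# Hypothetical miniature of session a1, used only for illustration.
individual <- c("john", "peter", "ben")
weight     <- c(10, 5, 3)

# expand.grid() varies its first argument fastest, so two calls on
# equal-length vectors produce rows in matching order.
pairs   <- expand.grid(from = individual, to = individual,
                       stringsAsFactors = FALSE)
weights <- expand.grid(from_weight = weight, to_weight = weight)

# Bind names and weights side by side, drop self-pairs, and keep
# only the weight that belongs to the "from" individual.
edges <- cbind(pairs, weights)
edges <- edges[edges$from != edges$to, c("from", "to", "from_weight")]
print(edges)
```

Each remaining row carries the weight of the individual in the `from` column, which is why the answer can safely discard the fourth column.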
Here is a simple approach using a self-join. Dropping the sequence and session columns is left to you.
library(dplyr)
df %>%
  select(session, weight, sequence, INDIVIDUAL) %>%
  # join the table to itself on session to pair everyone within a session
  inner_join(., select(., session, INDIVIDUAL), by = "session") %>%
  rename(INDIVIDUAL1 = INDIVIDUAL.x, INDIVIDUAL2 = INDIVIDUAL.y) %>%
  filter(INDIVIDUAL1 != INDIVIDUAL2) %>%   # drop self-pairs
  unique() %>%                             # remove repeated rows
  arrange(session, sequence)
# session weight sequence INDIVIDUAL1 INDIVIDUAL2
# 1 a1 10 1 john peter
# 2 a1 10 1 john ben
# 3 a1 5 2 peter john
# 4 a1 5 2 peter ben
# 5 a1 3 3 ben john
# 6 a1 3 3 ben peter
# 7 a1 5 4 peter john
# 8 a1 5 4 peter ben
# 9 a2 25 1 esther peggy
# 10 a2 25 1 esther greg
# 11 a2 5 2 peggy esther
# 12 a2 5 2 peggy greg
# 13 a2 10 3 greg esther
# 14 a2 10 3 greg peggy
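For the step left as an exercise, here is one possible sketch of the same self-join written in base R with merge() instead of dplyr, ending with only the three columns the question asks for. The intermediate names `left`, `right`, `pairs`, and `edges` are assumptions made for this example. Duplicates are removed before `sequence` is dropped so that peter's two separate appearances (sequence 2 and 4) both survive, as in the requested output.

```r
# Same sample data as in the question.
df <- read.table(text = "session location sequence weight INDIVIDUAL action
a1 texas 1 10 john Z1
a1 texas 2 5 peter Z2
a1 texas 3 3 ben Z1
a1 texas 4 5 peter Z5
a2 calif 1 25 esther Z3
a2 calif 2 5 peggy Z2
a2 calif 3 10 greg Z5", header = TRUE, stringsAsFactors = FALSE)

left  <- df[, c("session", "sequence", "weight", "INDIVIDUAL")]
right <- df[, c("session", "INDIVIDUAL")]

# Self-join on session; the clashing INDIVIDUAL columns become
# INDIVIDUAL1 (left side) and INDIVIDUAL2 (right side).
pairs <- merge(left, right, by = "session", suffixes = c("1", "2"))

pairs <- pairs[pairs$INDIVIDUAL1 != pairs$INDIVIDUAL2, ]  # drop self-pairs
pairs <- unique(pairs)                                    # drop repeated rows
pairs <- pairs[order(pairs$session, pairs$sequence), ]

# Keep only the edge-list columns and reset the row names.
edges <- pairs[, c("INDIVIDUAL1", "INDIVIDUAL2", "weight")]
rownames(edges) <- NULL
print(edges)
```

Like the dplyr version, this builds every within-session pairing at once, so memory use grows with the square of the session size; very large sessions may need a different strategy.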