Remove mirror lines in dataframe

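I'm an R beginner, so this question may seem naive, but I'm trying to build a network based on family relationships in a population. I'm using the R package igraph.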

While preparing my data, I end up with a data frame like this:

    Source    Target    Distance
    Actr22510 Actr22509        1
    Actr22511 Actr22509        1
    Actr22509 Actr22510        1
    Actr22511 Actr22510        1
    Actr57033 Actr22510        1
    Actr22509 Actr22511        1

The network I'm trying to build from this is undirected, so the rows Actr22510-Actr22509 and Actr22509-Actr22510 describe the same edge. I don't need both of them in my data frame.

Is it possible to remove these mirror rows?

Thanks a lot.

One option is to sort the first two columns within each row, paste them together into a key, and then check which keys are duplicated:

    df <- data.frame(
      Source   = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
      Target   = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
      Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
      stringsAsFactors = FALSE
    )
    # Order-independent key per row: sort the two endpoints, then paste them together
    df$key <- apply(df[, 1:2], 1, FUN = function(x) paste(sort(x), collapse = " "))
    # Keep only the first row for each key
    df[!duplicated(df$key), ]
    #      Source    Target Distance                 key
    # 1 Actr22510 Actr22509        1 Actr22509 Actr22510
    # 2 Actr22511 Actr22509        1 Actr22509 Actr22511
    # 4 Actr22511 Actr22510        1 Actr22510 Actr22511
    # 5 Actr57033 Actr22510        1 Actr22510 Actr57033
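A vectorized variant of the same sorted-pair idea is sketched below; it is an assumption on my part, not part of the answer above. Base R's `pmin()` and `pmax()` compare the two columns element-wise (for character vectors, using the current locale's collation), so no per-row function call is needed, and `duplicated()` applied to a two-column data frame checks whole rows, so no pasted key column is needed either.

```r
# Example data from the question (character columns, not factors)
df <- data.frame(
  Source   = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
  Target   = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
  Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Element-wise smaller and larger endpoint of each row
lo <- pmin(df$Source, df$Target)
hi <- pmax(df$Source, df$Target)

# A row whose (lo, hi) pair was already seen is a mirror (or exact) duplicate
dedup <- df[!duplicated(data.frame(lo, hi)), ]
dedup
```

With the example data this keeps rows 1, 2, 4 and 5, the same rows as the `apply` version.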

Since you'd rather not use the apply function, this version may be easier to understand:

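    # Put the alphabetically smaller ID first, so A-B and B-A get the same key
    # (Source and Target must be character, not factor, for < to work)
    df$key <- ifelse(df$Source < df$Target,
                     paste(df$Source, df$Target),
                     paste(df$Target, df$Source))
    df[!duplicated(df$key), ]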
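Another option, sketched here as an assumption rather than taken from the answer, avoids ordering the IDs altogether: look up each row's reversed pair with `match()` and drop a row when its mirror already appears at an earlier position. Note one difference from the key-based versions: this only removes Source/Target mirrors, not exact same-direction repeats.

```r
# Example data from the question
df <- data.frame(
  Source   = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
  Target   = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
  Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

pair     <- paste(df$Source, df$Target)  # "A B" for each row
rev_pair <- paste(df$Target, df$Source)  # "B A", the mirror of each row

# Index of the first row equal to this row's mirror (NA if there is none)
mirror_at <- match(rev_pair, pair)

# Keep a row unless its mirror occurs strictly earlier;
# using >= keeps a single self-loop (A A), whose mirror is the row itself
keep <- is.na(mirror_at) | mirror_at >= seq_len(nrow(df))
dedup <- df[keep, ]
dedup
```

For the example, rows 3 and 6 are dropped because their mirrors are rows 1 and 2.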

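If the end goal is to create an undirected igraph object, you may not need to remove these rows at all. Just build the graph and simplify it: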

    library(igraph)

    # Create an undirected graph with edges between "Source" and "Target";
    # Distance is kept as an edge attribute.
    # (graph_from_data_frame() is the current name of graph.data.frame().)
    g <- graph_from_data_frame(df, directed = FALSE)

    # Merge multiple edges (originally created from the "mirror" rows),
    # keeping the first Distance value of each merged group
    g <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE, edge.attr.comb = "first")
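To check the result, or to get a de-duplicated edge list back as a data frame, something like the following sketch could be used. It assumes igraph 1.0 or later, where `graph_from_data_frame()`, `ecount()` and `as_data_frame()` are available; the column order of each returned edge is not guaranteed for an undirected graph.

```r
library(igraph)

# Example data from the question
df <- data.frame(
  Source   = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
  Target   = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
  Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(df, directed = FALSE)
ecount(g)  # 6: each mirror pair is still two parallel edges

g <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
              edge.attr.comb = "first")
ecount(g)  # 4: parallel edges merged into one

# Edge list as a data frame: columns from, to and the Distance attribute
edges <- as_data_frame(g, what = "edges")
edges
```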