Remove mirror lines in dataframe
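I'm an R beginner and this question may seem naive, but I am trying to build a network based on family relationships in a population. I am using the R package igraph.
While preparing my data, I end up with a data frame like this: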
Source Target Distance
Actr22510 Actr22509 1
Actr22511 Actr22509 1
Actr22509 Actr22510 1
Actr22511 Actr22510 1
Actr57033 Actr22510 1
Actr22509 Actr22511 1
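The network I am trying to build from this is undirected. In that case, the lines Actr22510-Actr22509 and Actr22509-Actr22510 are the same, and I don't need both of them in my data frame.
Is it possible to remove this kind of mirror line?
Thanks a lot.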
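One solution is to sort the first two columns within each row, paste them together into a key, and then check whether those keys are duplicated: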
df <-structure(list(Source = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
Target = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
Distance = c(1L, 1L, 1L, 1L, 1L, 1L)),
.Names = c("Source","Target", "Distance"), class = "data.frame", row.names = c(NA,-6L))
df$key <- apply(df[,1:2],1,FUN=function(x)paste(sort(x),collapse=" "))
df[!duplicated(df$key),]
#Source Target Distance key
#1 Actr22510 Actr22509 1 Actr22509 Actr22510
#2 Actr22511 Actr22509 1 Actr22509 Actr22511
#4 Actr22511 Actr22510 1 Actr22510 Actr22511
#5 Actr57033 Actr22510 1 Actr22510 Actr57033
Since you don't like using apply functions, this may be easier to understand:
df$key <- ifelse(df$Source < df$Target, paste(df$Source,df$Target), paste(df$Target,df$Source))
df[!duplicated(df$key),]
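The same idea can also be written without building a pasted key. The following is only a sketch in base R: it orders each pair with pmin and pmax and lets duplicated work on the two ordered columns. The names pairs and dedup are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Same example data as in the question.
df <- data.frame(
  Source = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
  Target = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
  Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Order each pair so the "smaller" ID always comes first.
# pmin/pmax compare character vectors element-wise.
pairs <- data.frame(a = pmin(df$Source, df$Target),
                    b = pmax(df$Source, df$Target))

# duplicated() on a data frame flags rows that repeat an earlier row,
# so mirror lines are dropped and the first occurrence is kept.
dedup <- df[!duplicated(pairs), ]
dedup
```

On the example data this keeps rows 1, 2, 4 and 5, the same rows as the key-based versions, and it leaves no extra key column in the result.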
If the ultimate goal is to create an undirected igraph object, you may not need to remove these lines at all. Just:
library(igraph)
# Create an undirected graph, with edges between "Source" and "Target"
# Distance is kept as an edge attribute.
# (graph_from_data_frame is the newer name for graph.data.frame)
g <- graph_from_data_frame(df, directed=FALSE)
# Remove multiple edges (originally created from "mirror" lines)
g <- simplify(g, remove.multiple=TRUE, remove.loops=FALSE, edge.attr.comb="first")
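To see what simplify does to the mirror lines, one can compare edge counts before and after. This is a small sketch rather than a required step: it assumes igraph is installed, uses graph_from_data_frame (the newer name for graph.data.frame), and the names g_simple and edges are made up for illustration.

```r
library(igraph)

# Same example data as in the question.
df <- data.frame(
  Source = c("Actr22510", "Actr22511", "Actr22509", "Actr22511", "Actr57033", "Actr22509"),
  Target = c("Actr22509", "Actr22509", "Actr22510", "Actr22510", "Actr22510", "Actr22511"),
  Distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Every line becomes an edge, so mirror lines show up as multiple edges.
g <- graph_from_data_frame(df, directed = FALSE)
ecount(g)         # 6 edges, one per line

# Collapse multiple edges; "first" keeps the Distance of the first edge in each group.
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                     edge.attr.comb = "first")
ecount(g_simple)  # 4 edges, one per unordered pair

# If a de-duplicated edge list is still wanted, it can be read back from the graph.
edges <- igraph::as_data_frame(g_simple, what = "edges")
edges
```

Note that edge.attr.comb="first" silently keeps one Distance per pair, so this only makes sense when mirror lines carry the same Distance, as they do in the example.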