R: circlize circos plot - how to plot unconnected areas between sectors with minimal overlap
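I have a data frame with the features shared between 4 groups of patients and cell types. I have a lot of different features, but the shared ones (present in more than one group) are just a few.
I want to make a circos plot that reflects the few connections between shared features across patient and cell-type groups, while giving an idea of how many non-shared features there are in each group.
The way I think of it, it should be a plot with 4 sectors (one per patient and cell-type group) with a few connections between them. Each sector's size should reflect the total number of features in the group, and most of this area should be empty rather than connected to other groups.
This is what I have so far, but I don't want a sector dedicated to each feature, only one per patient and cell-type group.
MWE: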
library(circlize)
patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat
chordDiagram(as.data.frame(dat), transparency = 0.5)
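EDIT!!
What @m-dz shows in his answer is actually the format I am looking for: 4 sectors for the 4 different patient/cell.type combinations, showing only the connections, while the non-connected features, although not shown, still count towards the size of the sector.
However, I realise that my scenario is more complex than the one in the MWE above.
A feature is considered to be present in 2 patient/cell.type groups not only when it is identical in both groups, but also when it is similar (sequence identity above a threshold). This means I have redundancy.
Feature A in patient1-cell1 can be connected to feature A in patient2-cell1, but also to feature B. For patient1-cell1, feature A should be counted only once (unique count) and expand into 2 different features in patient2-cell1.
Please see the example below for a more accurate picture of my real data, and whether the final circos plot can be obtained with it. Thanks!!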
##MWE
#NON OVERLAPPING SETS!
#1: non-shared features
nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
nonshared
#2: features shared between cell types within same patient
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedcells
#3: features shared between patients within same cell types
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedpats
#4: features shared between patients and cell types
#4.1: shared across pat1-cell1, pat1-cell2, pat2-cell1, pat2-cell2
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
#4.2: shared across pat1-cell1, pat1-cell2, pat2-cell1
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
#4.3: shared across pat1-cell1, pat1-cell2, pat2-cell2
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
#4.4: shared across pat1-cell1, pat2-cell1, pat2-cell2
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
#4.5: shared across pat1-cell2, pat2-cell1, pat2-cell2
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
#4.6: shared across pat1-cell1, pat2-cell2
sharedall6 <- data.frame()
#4.7: shared across pat1-cell2, pat2-cell1
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))
sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)
sharedall
#you see there might be overlaps between the different subsets of sharedall, but not between sharedall, sharedpats, sharedcells, and nonshared
#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THEY SHOULD COUNT TOWARDS THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)
#THE FEATURES SHOULD BE COUNTED UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2 cell2 c3||c3
#4 pat1||pat2 cell2 c3||c4
#5 pat1||pat2 cell2 c3||c5
#THE FEATURE c3 SHOULD BE COUNTED ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2
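To make the unique-counting rule above concrete, here is a minimal base-R sketch restricted to the three c3 rows. The helper name split_pairs and the intermediate table long are illustrative assumptions, not part of the original data preparation.

```r
# Only the three c3 rows from sharedpats, copied from the example above.
sharedpats_c3 <- data.frame(
  patients  = rep("pat1||pat2", 3),
  cell.type = rep("cell2", 3),
  features  = c("c3||c3", "c3||c4", "c3||c5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split an "x||y" string on the literal "||".
split_pairs <- function(x) strsplit(x, "||", fixed = TRUE)[[1]]

# One row per (group, feature) after splitting both delimited columns.
long <- do.call(rbind, lapply(seq_len(nrow(sharedpats_c3)), function(i) {
  data.frame(
    group   = paste(split_pairs(sharedpats_c3$patients[i]),
                    sharedpats_c3$cell.type[i], sep = "-"),
    feature = split_pairs(sharedpats_c3$features[i]),
    stringsAsFactors = FALSE
  )
}))

# Count each feature once per group.
unique_counts <- tapply(long$feature, long$group,
                        function(f) length(unique(f)))
unique_counts
```

With these rows, c3 contributes a single feature to pat1-cell2, while pat2-cell2 receives three distinct features (c3, c4, c5).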
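Side note on the expected result: the aim is a plot that simply shows the number of shared features, ignoring both individual features (shown in the first plot below) and overlaps between shared features. For example, in the second plot it looks as if the same features are shared between all groups, which the first plot shows is not the case; what matters here is the proportion of features shared between groups.
The code below produces the following two plots (plot 1 is kept for reference):
All individual features
Simple counts of unique and shared features
One of them should match what you expect.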
# Prep. data --------------------------------------------------------------
nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
sharedall6 <- data.frame()
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))
sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)
#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THEY SHOULD COUNT TOWARDS THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)
#THE FEATURES SHOULD BE COUNTED UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2 cell2 c3||c3
#4 pat1||pat2 cell2 c3||c4
#5 pat1||pat2 cell2 c3||c5
#THE FEATURE c3 SHOULD BE COUNTED ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2
# Start -------------------------------------------------------------------
library(circlize)
library(data.table)
library(magrittr)
library(stringr)
library(RColorBrewer)
# Split and pad with 0 ----------------------------------------------------
fun <- function(x) unlist(tstrsplit(x, split = '||', fixed = TRUE))
nonshared %>% setDT()
sharedcells %>% setDT()
sharedpats %>% setDT()
sharedall %>% setDT()
nonshared <- nonshared[, .(group = paste(patient, cell.type, sep = '-'), feature)][, feature := paste0('a', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedcells <- sharedcells[, lapply(.SD, fun), by = 1:nrow(sharedcells)][, .(group = paste(patient, cell.types, sep = '-'), feature = features)][, feature := paste0('b', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedpats <- sharedpats[, lapply(.SD, fun), by = 1:nrow(sharedpats)][, .(group = paste(patients, cell.type, sep = '-'), feature = features)][, feature := paste0('c', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedall <- sharedall[, lapply(.SD, fun), by = 1:nrow(sharedall)][, .(group = both, feature = features)][, feature := paste0('d', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
dt_split <- rbindlist(
list(
nonshared,
sharedcells,
sharedpats,
sharedall
)
)
# Set key and self join to find shared features ---------------------------
setkey(dt_split, feature)
dt_join <- dt_split[dt_split, .(group, i.group, feature), allow.cartesian = TRUE] %>%
.[group != i.group, ]
# Create a "sorted key" ---------------------------------------------------
# The key puts the two group names in a fixed order, so A|B and B|A
# collapse to the same string. pmin()/pmax() do this in one vectorised
# step instead of sorting .SD row by row.
# To leave only unique combinations of groups and features
dt_join[, key := paste(pmin(group, i.group), pmax(group, i.group), sep = '|')]
dt_join <- dt_join %>%
  setorder(feature, key) %>%
  unique(by = c('key', 'feature')) %>%
  .[, .(
    group_from = i.group,
    group_to = group,
    feature = feature)]
# Rename and key ----------------------------------------------------------
dt_split %>% setnames(old = 'group', new = 'group_from') %>% setkey(group_from, feature)
dt_join %>% setkey(group_from, feature)
# Individual features -----------------------------------------------------
# Features without connections --------------------------------------------
dt_singles <- dt_split[, .(group_from, group_to = group_from, feature)] %>%
.[, N := .N, by = feature] %>%
.[!(N > 1 & group_from == group_to), !c('N')]
# Bind all, add some columns etc. -----------------------------------------
dt_bind <- rbind(dt_singles, dt_join) %>% setorder(group_from, feature, group_to)
dt_bind[, ':='(
group_from_f = paste(group_from, feature, sep = '.'),
group_to_f = paste(group_to, feature, sep = '.'))]
dt_bind[, feature := NULL] # feature can be removed
# Colour
dt_bind[, colour := ifelse(group_from_f == group_to_f, "#FFFFFF00", '#00000050')] # Change first to #FF0000FF to show red blobs
# Prep. sectors -----------------------------------------------------------
sectors_f <- union(dt_bind[, group_from_f], dt_bind[, group_to_f]) %>% sort()
colour_lookup <-
union(dt_bind[, group_from], dt_bind[, group_to]) %>% sort() %>%
structure(seq_along(.) + 1, names = .)
# Strip the trailing ".<feature>" suffix; the dot is escaped so it matches a literal "."
sector_colours <- str_replace_all(sectors_f, '\\.[a-d][0-9]+$', '') %>%
colour_lookup[.]
# Gaps between sectors ----------------------------------------------------
gap_sizes <- c(0.0, 1.0)
gap_degree <-
sapply(table(names(sector_colours)), function(i) c(rep(gap_sizes[1], i-1), gap_sizes[2])) %>%
unlist() %>% unname()
# gap_degree <- rep(0, length(sectors_f)) # Or no gap
# Plot! -------------------------------------------------------------------
# Each "sector" is a separate patient/cell/feature combination
circos.par(gap.degree = gap_degree)
circos.initialize(sectors_f, xlim = c(0, 1))
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = sector_colours, bg.border = NA)
for(i in 1:nrow(dt_bind)) {
row_i <- dt_bind[i, ]
circos.link(
row_i[['group_from_f']], c(0, 1),
row_i[['group_to_f']], c(0, 1),
border = NA, col = row_i[['colour']]
)
}
# "Feature" labels
circos.trackPlotRegion(track.index = 2, ylim = c(0, 1), panel.fun = function(x, y) {
sector.index = get.cell.meta.data("sector.index")
circos.text(0.5, 0.25, sector.index, col = "white", cex = 0.6, facing = "clockwise", niceFacing = TRUE)
}, bg.border = NA)
# "Patient/cell" labels
for(s in names(colour_lookup)) {
sectors <- sectors_f %>% { .[str_detect(., s)] }
highlight.sector(
sector.index = sectors, track.index = 1, col = colour_lookup[s],
text = s, text.vjust = -1, niceFacing = TRUE)
}
circos.clear()
# counts of unique and shared features ------------------------------------
xlims <- dt_split[, .N, by = group_from][, .(x_from = 0, x_to = N)] %>% as.matrix()
links <- dt_join[, .N, by = .(group_from, group_to)]
colours <- dt_split[, unique(group_from)] %>% structure(seq_along(.) + 1, names = .)
library(circlize)
sectors = names(colours)
circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(sectors, xlim = xlims)
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = colours, bg.border = NA)
for(i in 1:nrow(links)) {
link <- links[i, ]
circos.link(link[[1]], c(0, link[[3]]), link[[2]], c(0, link[[3]]), col = '#00000025', border = NA)
}
# "Patient/cell" labels
for(s in sectors) {
highlight.sector(
sector.index = s, track.index = 1, col = colours[s],
text = s, text.vjust = -1, niceFacing = TRUE)
}
circos.clear()
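The self-join and "sorted key" steps in the script above can also be followed on a toy example in base R. This is only a sketch under stated assumptions: the data frame toy and its values are made up for illustration, and merge() stands in for the data.table join.

```r
# Toy long-format data (illustrative only): one row per group/feature.
toy <- data.frame(
  group   = c("pat1-cell1", "pat2-cell1", "pat1-cell2", "pat1-cell1"),
  feature = c("d01", "d01", "d01", "a01"),
  stringsAsFactors = FALSE
)

# Self-join on feature, then drop rows that pair a group with itself.
joined <- merge(toy, toy, by = "feature", suffixes = c("_from", "_to"))
joined <- joined[joined$group_from != joined$group_to, ]

# Order-independent key: A|B and B|A collapse to the same string.
joined$key <- paste(pmin(joined$group_from, joined$group_to),
                    pmax(joined$group_from, joined$group_to), sep = "|")
links <- joined[!duplicated(joined[c("key", "feature")]), ]

# Number of shared features per pair of groups.
table(links$key)
```

Feature d01 occurs in three groups, so it yields three undirected links, while a01 occurs in a single group and produces none.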
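Edit: just adding a link from a removed comment: see this answer for a nice example of labels!
@m-dz has pointed in the right direction. I can provide more details based on your mock data.
Let's start from here: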
patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
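as.data.frame converts dat into a three-column data frame (i.e. an adjacency list, in which links start from the first column and point to the second column):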
dat = as.data.frame(dat, stringsAsFactors = FALSE)
Generate colours for the patients/cells and for the features.
features = unique(dat[[2]])
features_col = structure(rand_color(length(features)), names = features)
patients_col = structure(2:5, names = unique(dat[[1]]))
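If a feature exists in only one patient/cell combination, you may not want to show it but still want to keep its position in the plot. In that case you can set its colour to #FFFFFF00 (a fully transparent white, so it will not cover other links). Here we want each link to have the same colour as its feature sector.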
col = ifelse(dat[[3]], features_col[dat[[2]]], "#FFFFFF00")
col = gsub("FF$", "80", col) # half transparent
features_count = tapply(dat[[3]], dat[[2]], sum)
# set color to white if it only exists in one patient/cell
col[features_count[dat[[2]]] == 1] = "#FFFFFF00"
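As a quick check of how these 8-digit hex colours behave, the following base-R sketch inspects the alpha channel with col2rgb(). The vector cols and its two values are illustrative assumptions, not colours taken from the answer.

```r
# Illustrative colours: an opaque red and the fully transparent white.
cols <- c(shared = "#FF0000FF", unshared = "#FFFFFF00")

# The last two hex digits are the alpha channel (0 = invisible, 255 = opaque).
alpha <- col2rgb(cols, alpha = TRUE)["alpha", ]
alpha

# Replacing a trailing "FF" with "80" makes opaque colours half transparent;
# "#FFFFFF00" ends in "00", so it is left untouched and stays invisible.
half <- gsub("FF$", "80", cols)
half
```

This is why the gsub() call in the answer can be applied to the whole colour vector without making the hidden links visible again.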
And finally the chord diagram:
chordDiagram(dat, col = col, grid.col = c(features_col, patients_col))
You can see that in the feature sectors there are at least two links going to patients/cells.
Prepare the data:
library(circlize)
patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat<-as.data.frame(dat,stringsAsFactors = FALSE)
Get all pairwise combinations of patients and cell types:
df=NULL
for(i in levels(as.factor(dat$feature))){
temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
temp$feature=i
temp$Freq=1
Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
for(j in Freq_0){
temp$Freq[temp$V1==j | temp$V2==j]=0
}
df<-rbind(df,temp)
}
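For reference, the pairing done inside the loop can be seen in isolation. The sketch below assumes the four patient|cell labels from the mock data and uses t(combn(...)), which gives the same rows as the matrix(..., byrow = TRUE, ncol = 2) call above.

```r
# The four patient|cell groups from the mock data.
groups <- c("patient1|cell1", "patient1|cell2",
            "patient2|cell1", "patient2|cell2")

# combn() returns one pair per column; transpose to get one pair per row.
pairs <- t(combn(groups, 2))
pairs

# Same result as the reshaping used in the loop.
same <- identical(pairs, matrix(combn(groups, 2), byrow = TRUE, ncol = 2))
```

Four groups give choose(4, 2) = 6 pairs, so each feature contributes six candidate links before the ones with Freq 0 are blanked out.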
Add colours:
df$color=rainbow(dim(df)[1])
df[which(df$Freq==0),5]="white"
df$Freq=1
chordDiagram(df[,c(-3,-5)], transparency = 0.5,col = df$color)
Different links represent different features; a link is white where 'Freq' is 0.
Here I changed the colour 'white' to 'black', which is more visible.
If you want to leave out the 'feature' attribute…
First prepare the data:
library(circlize)
patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat<-as.data.frame(dat,stringsAsFactors = FALSE)
df=NULL
for(i in levels(as.factor(dat$feature))){
temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
temp$feature=i
temp$Freq=1
Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
for(j in Freq_0){
temp$Freq[temp$V1==j | temp$V2==j]=0
}
df<-rbind(df,temp)
}
Process the data:
library(dplyr)
df1<-subset(df,df$Freq==1)
df0<-subset(df,df$Freq==0)
df1_mod<-summarise(group_by(df1,V1,V2),Freq=n())
df0_mod<-summarise(group_by(df0,V1,V2),Freq=n())
Add colours:
df1_mod$color<-rainbow(nrow(df1_mod)) # one colour per row instead of a hard-coded 5
df0_mod$color<-"white"
df_res<-rbind(df0_mod,df1_mod)
Plot it:
chordDiagram(df_res, transparency = 0.5,col = df_res$color)
These plots show that there are many zeros in 'Freq'.