Re-shaping data for plotting a network
I have a dataset that I would like to reshape so that I can plot it as a network (following the work done here). The initial data frame looks like this:
authors <- c('Author A', 'Author B', 'Author C',
'Author A', 'Author D', 'Author C')
affiliation <- c('University 1', 'University 2', 'University 1',
'University 1', 'Institute 3', 'University 1')
manuscript <- c('Manuscript A', 'Manuscript A', 'Manuscript A',
'Manuscript B', 'Manuscript B', 'Manuscript B')
df <- data.frame(authors, affiliation, manuscript)
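I would like to reshape this so that, for each manuscript, I get every combination of authors together with the primary author's affiliation (I hope the way I am asking this makes sense). That would result in the following data frame: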
df_network <- data.frame('primary_author'= c('Author A', 'Author A',
'Author B', 'Author B',
'Author C', 'Author C',
'Author A','Author A',
'Author D', 'Author D',
'Author C', 'Author C'),
'connection'= c('Author B', 'Author C',
'Author A', 'Author C',
'Author A', 'Author B',
'Author D', 'Author C',
'Author A', 'Author C',
'Author A', 'Author D'),
'primary_affiliation' = c('University 1', 'University 1',
'University 2', 'University 2',
'University 1', 'University 1',
'University 1', 'University 1',
'Institute 3', 'Institute 3',
'University 1', 'University 1'),
'manuscript' = c('Manuscript A', 'Manuscript A',
'Manuscript A', 'Manuscript A',
'Manuscript A', 'Manuscript A',
'Manuscript B', 'Manuscript B',
'Manuscript B', 'Manuscript B',
'Manuscript B', 'Manuscript B'))
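Of course, I could reshape the data by hand, but that is very tedious, especially as the list grows long. I have done this (manually) before, and the results work well once the data is in the shape of df_network. Any hints or tips anyone can offer would be greatly appreciated.
Try this: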
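library(dplyr)

# Self-join on manuscript so each author is paired with every author of the
# same manuscript, then drop the rows that pair an author with themselves.
# Note: dplyr >= 1.1.0 warns that this join is many-to-many; passing
# relationship = "many-to-many" to left_join() silences the warning.
df %>%
  left_join(df, by = "manuscript") %>%
  filter(authors.x != authors.y) %>%
  select(primary_author = authors.x,
         connection = authors.y,
         primary_affiliation = affiliation.x,
         manuscript)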
Output:
   primary_author connection primary_affiliation   manuscript
1        Author A   Author B        University 1 Manuscript A
2        Author A   Author C        University 1 Manuscript A
3        Author B   Author A        University 2 Manuscript A
4        Author B   Author C        University 2 Manuscript A
5        Author C   Author A        University 1 Manuscript A
6        Author C   Author B        University 1 Manuscript A
7        Author A   Author D        University 1 Manuscript B
8        Author A   Author C        University 1 Manuscript B
9        Author D   Author A         Institute 3 Manuscript B
10       Author D   Author C         Institute 3 Manuscript B
11       Author C   Author A        University 1 Manuscript B
12       Author C   Author D        University 1 Manuscript B
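The self-join idea above does not depend on dplyr. As a minimal sketch in base R, assuming the same example data as in the question (the names `pairs`, `edges_base` and `n_authors` are only illustrative and do not appear in the original post), the same three steps are: merge the data frame with itself on manuscript, drop the self-pairs, and rename the columns. The row order may differ from the dplyr result because merge() orders its output differently.

```r
# Rebuild the example data from the question.
authors <- c('Author A', 'Author B', 'Author C',
             'Author A', 'Author D', 'Author C')
affiliation <- c('University 1', 'University 2', 'University 1',
                 'University 1', 'Institute 3', 'University 1')
manuscript <- c('Manuscript A', 'Manuscript A', 'Manuscript A',
                'Manuscript B', 'Manuscript B', 'Manuscript B')
df <- data.frame(authors, affiliation, manuscript)

# Self-join on manuscript: every author is paired with every author
# of the same manuscript (including themselves).
pairs <- merge(df, df, by = 'manuscript')

# Drop the self-pairs so only author-to-co-author rows remain.
pairs <- pairs[pairs$authors.x != pairs$authors.y, ]

# Keep and rename the columns of interest (edges_base is a hypothetical name).
edges_base <- data.frame(
  primary_author      = pairs$authors.x,
  connection          = pairs$authors.y,
  primary_affiliation = pairs$affiliation.x,
  manuscript          = pairs$manuscript
)

# Sanity check: a manuscript with n authors should yield n * (n - 1) rows.
n_authors <- table(df$manuscript)
stopifnot(nrow(edges_base) == sum(n_authors * (n_authors - 1)))
```

The final check is a quick way to confirm that no pair was lost or duplicated when the author list grows: each of the n authors on a manuscript should appear as primary author once for each of the other n - 1 authors.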
You can also use data.table to accomplish this:
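library('data.table')

df <- data.table(authors, affiliation, manuscript)

# Self-merge on manuscript; allow.cartesian = TRUE permits the many-to-many
# match. Then drop the self-pairs and rename the columns.
df <- merge(df, df, by = 'manuscript', allow.cartesian = TRUE)[
  authors.x != authors.y,
  .(primary_author = authors.x,
    connection = authors.y,
    primary_affiliation = affiliation.x,
    manuscript)]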