PubMed: fetch article details into a DataFrame

Here is the code:

import pandas as pd
from pymed import PubMed

pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")


## PUT YOUR SEARCH TERM HERE ##
search_term = 'Charlie Brown'
results = pubmed.query(search_term, max_results=100000)
articleList = []
articleInfo = []

for article in results:
    # Each result is either a PubMedArticle or a PubMedBookArticle;
    # convert it to a dictionary with its toDict() method
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding the article details fetched from the PubMed API
for article in articleList:
    # article['pubmed_id'] sometimes holds several newline-separated IDs;
    # the first one is the article's own PubMed ID
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append the article info as a dictionary
    articleInfo.append({'pubmed_id': pubmedId,
                        'publication_date': article['publication_date'],
                        'authors': article['authors']})

df = pd.json_normalize(articleInfo)

Running this code gives a DataFrame with three columns: pubmed_id, publication_date and authors.
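The authors column stays nested because each of its cells holds a Python list of author dictionaries, and json_normalize on its own (without record_path) only flattens nested dicts, not lists. The offline sketch below reproduces the record-building step so the structure can be inspected without querying PubMed. The records are hypothetical placeholders (made-up IDs and names) shaped like pymed's toDict() output, limited to the keys the code above uses.

```python
import pandas as pd

# Hypothetical placeholder records shaped like pymed's toDict() output,
# limited to the keys the question's code uses (values are made up)
articleList = [
    {"pubmed_id": "11111111\n22222222",  # several newline-separated IDs
     "publication_date": "2020-01-01",
     "authors": [{"lastname": "Doe", "firstname": "Jane"}]},
    {"pubmed_id": "33333333",
     "publication_date": "2021-06-15",
     "authors": [{"lastname": "Roe", "firstname": "Richard"},
                 {"lastname": "Poe", "firstname": "Anna"}]},
]

articleInfo = [
    {
        # partition('\n')[0] is the text before the first newline,
        # or the whole string when there is no newline
        "pubmed_id": article["pubmed_id"].partition("\n")[0],
        "publication_date": article["publication_date"],
        "authors": article["authors"],
    }
    for article in articleList
]

# json_normalize flattens nested dicts but leaves list values untouched,
# so each cell of 'authors' is still a Python list of author dicts
df = pd.json_normalize(articleInfo)
print(df.loc[1, "authors"])
```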

Is there a way to unnest the authors column while keeping the other two columns? Thanks a lot.

If you want to unnest it, you have to define a strategy. For example, you can write each author as 'lastname, firstname' and join the authors with ';':

# New column to easily identify how many authors there are in the paper
df['n_authors'] = df['authors'].map(len)

# Unnest authors into a single string using the above-mentioned strategy
df['authors'] = df['authors'].map(lambda authors: ';'.join([f"{author['lastname']}, {author['firstname']}" for author in authors]))

Output:

   pubmed_id publication_date                                            authors  n_authors  
0   35435469       2022-04-19  Easwaran, Raju;Khan, Moin;Sancheti, Parag;Shya...         41  
1   34480858       2021-09-05  Flaxman, Amy;Marchevsky, Natalie G;Jenkin, Dan...         38  
2   30857579       2019-03-13                                     Brown, Charlie          1  
3   28640023       2017-06-24  Thornton, Kevin C;Schwarz, Jennifer J;Gross, A...         12  
4   24195874       2013-11-08  Bicket, Mark C;Gupta, Anita;Brown, Charlie H;C...          4  
5   21741796       2011-07-12  Bird, Jonathan H;Carmont, Michael R;Dhillon, M...          7  
6   21324873       2011-02-18  Cohen, Steven P;Brown, Charlie;Kurihara, Conni...          6  
7   20228712       2010-03-17  Cohen, Steven P;Kapoor, Shruti G;Nguyen, Cuong...          8  
8   20109957       2010-01-30  Cohen, Steven P;Brown, Charlie;Kurihara, Conni...          6  
9   18248779       2008-02-06  Whitaker, Iain S;Duggan, Eileen M;Alloway, Rit...         10  
10  16917639       2006-08-19  Drayton, William;Brown, Charlie;Hillhouse, Karin          3  
11  16282488       2005-11-12  Mao, Hanwen;Lafont, Bernard A P;Igarashi, Tats...          9  
12  14581571       2003-10-29  Moniuszko, Marcin;Brown, Charlie;Pal, Ranajit;...          7  
13  12163382       2002-08-07  Williams, Kenneth;Schwartz, Annette;Corey, Sar...         10
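Joining the names into one string is one strategy. If "unnest" should instead mean one row per author, pandas' json_normalize can do it directly on a list shaped like articleInfo from the question: record_path names the list to expand, and meta lists the top-level fields to repeat on every row. A minimal sketch, assuming the same record shape; the records here are hypothetical placeholders rather than real PubMed results:

```python
import pandas as pd

# Hypothetical placeholder records shaped like articleInfo in the question
articleInfo = [
    {"pubmed_id": "11111111", "publication_date": "2020-01-01",
     "authors": [{"lastname": "Doe", "firstname": "Jane"}]},
    {"pubmed_id": "22222222", "publication_date": "2021-06-15",
     "authors": [{"lastname": "Roe", "firstname": "Richard"},
                 {"lastname": "Poe", "firstname": "Anna"}]},
    {"pubmed_id": "33333333", "publication_date": "2022-03-01",
     "authors": []},  # an article with no listed authors
]

# record_path="authors": one output row per author dict; its keys become columns
# meta=[...]: copy these top-level fields onto every row of that article
authors_df = pd.json_normalize(
    articleInfo,
    record_path="authors",
    meta=["pubmed_id", "publication_date"],
)
print(authors_df)
```

Each author becomes its own row, with lastname and firstname as separate columns and pubmed_id and publication_date repeated. Note that an article whose authors list is empty produces no rows at all here, whereas the joined-string approach keeps it with an empty string and n_authors of 0.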