Web scraping multiple Google Scholar pages in Python
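I want to scrape multiple Google Scholar user profiles: publications, journals, citations, etc. I have already written Python code to scrape a user profile given its URL. Now, suppose I have 100 names and the corresponding URLs in an Excel file like this.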
name link
Autor https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en
Dorn https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en
Hanson https://scholar.google.com/citations?user=nMtHiQsAAAAJ&hl=en
Borjas https://scholar.google.com/citations?user=Patm-BEAAAAJ&hl=en
....
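My question is whether I can read the 'link' column of this file and write a for loop over the URLs, so that I can scrape each of these profiles and append the results to the same file. It seems a bit far-fetched to me, but I hope there is a way to do this. Thanks in advance!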
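I hope this is not too advanced for you.
1 Create a class for your page:
class Page:
    def __init__(self, name=None, link=None):
        self.name = name
        self.link = link
2 Create the pages list:
pages = []
3 Find the row locator, for example:
# Selenium 4 syntax; find_elements_by_css_selector was removed in Selenium 4.3
from selenium.webdriver.common.by import By
rows = driver.find_elements(By.CSS_SELECTOR, "your_selector")
The number of elements in rows must match the number of rows in your table. For example, if your table has 20 entries, rows will contain 20 elements.
4 Get the values from each row:
for row in rows:
    name = row.find_element(By.CSS_SELECTOR, "here is a unique selector for each data field for name").text
    link = row.find_element(By.CSS_SELECTOR, "here is a unique selector for each data field for link").text
5 Create a Page object (still inside the loop):
    page = Page(name=name, link=link)
6 Append each row to the list (still inside the loop):
    pages.append(page)
Result
A list of Page objects, where the first row can be accessed with pages[0], the second with pages[1], and so on.
P.S. If you have trouble with the selectors, treat that as a separate question. I think I have explained the concept well enough for you to get started.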
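The steps above can be condensed into a small, browser-free sketch. It assumes the name/link pairs have already been extracted (the hard-coded sample_rows list stands in for the Selenium rows) and uses a dataclass in place of the hand-written class. Both are illustrative choices, not part of the original answer.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One scraped table row: an author name and a profile link."""
    name: str
    link: str


# Hypothetical stand-in for the (name, link) values read from each row.
sample_rows = [
    ("Autor", "https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en"),
    ("Dorn", "https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en"),
]

# Build one Page object per row, preserving row order.
pages = [Page(name=name, link=link) for name, link in sample_rows]

print(pages[0].name)  # name from the first row
print(len(pages))     # number of rows collected
```

Keeping the row data in small objects like this makes it easy to loop over pages later and pass each link to the scraping function.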
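You can use pandas.read_csv() to read your CSV file and pick out a specific column. For example:
import pandas as pd

df = pd.read_csv('data.csv')
arr = []
link_col = df['link']
for i in link_col:
    arr.append(i)
print(arr)
This extracts only the link column and appends each value to your list. For more information, see the pandas documentation.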
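If pandas is not available, the same column can be read with the standard library, and each result can be stored next to its link. The sketch below is only an illustration under stated assumptions: the CSV text is inline sample data with name and link columns, and scrape_profile is a hypothetical placeholder for your own scraping function.

```python
import csv
import io

# Inline sample data standing in for the real file (assumed columns: name, link).
csv_text = """name,link
Autor,https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en
Dorn,https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en
"""


def scrape_profile(link):
    """Hypothetical placeholder: replace the body with your real scraping code."""
    return f"scraped:{link.split('user=')[1].split('&')[0]}"


rows = list(csv.DictReader(io.StringIO(csv_text)))

# Loop over the 'link' column and store each result alongside its row.
for row in rows:
    row["result"] = scrape_profile(row["link"])

# Write the enriched rows back out. For a real file, swap the StringIO
# for open("data.csv", "w", newline="").
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["name", "link", "result"])
writer.writeheader()
writer.writerows(rows)

print(output.getvalue())
```

Collecting all results first and writing the file once at the end avoids reading and writing the same file at the same time.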
To read data from an Excel file, you can use the pandas read_excel() method, as shown below:
import pandas as pd

# https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors") # sheet_name is optional in this case
print(authors_df["author_link"])
'''
0 https://scholar.google.com/citations?hl=en&use...
1 https://scholar.google.com/citations?hl=en&use...
2 https://scholar.google.com/citations?hl=en&use...
3 https://scholar.google.com/citations?hl=en&use...
4 https://scholar.google.com/citations?hl=en&use...
Name: author_link, dtype: object
'''
print(authors_df)
'''
author_name author_link
0 Masatoshi Nei https://scholar.google.com/citations?hl=en&use...
1 Ulrich K. Laemmli https://scholar.google.com/citations?hl=en&use...
2 Gene Myers https://scholar.google.com/citations?hl=en&use...
3 Sudhir Kumar https://scholar.google.com/citations?hl=en&use...
4 Irving Weissman https://scholar.google.com/citations?hl=en&use...
'''
To scrape multiple authors, you can use a for loop to iterate over authors_df["author_link"] and extract the required data with the beautifulsoup, lxml, and requests libraries.
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml
import pandas as pd

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582",
}

# https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors")  # sheet_name is optional in this case

# to_list() returns a list of author links so we can iterate over them
for author_link in authors_df["author_link"].to_list():
    html = requests.get(author_link, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")

    print(f"Currently extracting: {soup.select_one('#gsc_prf_in').text}")

    author_email = soup.select_one("#gsc_prf_ivh").text
    author_image = f'https://scholar.google.com{soup.select_one("#gsc_prf_pup-img")["src"]}'

    print(author_image, f"Author email: {author_email}", sep="\n")

    # iterating over container with all needed data by accessing the right CSS selector
    # have a look at SelectorGadget Chrome extension to easily grab CSS selectors
    for article in soup.select("#gsc_a_b .gsc_a_t"):
        article_title = article.select_one(".gsc_a_at").text
        article_link = f'https://scholar.google.com{article.select_one(".gsc_a_at")["href"]}'
        article_authors = article.select_one(".gsc_a_at+ .gs_gray").text
        article_publication = article.select_one(".gs_gray+ .gs_gray").text

        print(article_title, article_link, article_authors, article_publication, sep="\n")

    print("-" * 15)
# part of the output:
'''
Currently extracting: Masatoshi Nei
https://scholar.google.comhttps://scholar.googleusercontent.com/citations?view_op=view_photo&user=VxOmZDgAAAAJ&citpid=3
Author email: Verified email at temple.edu
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u5HHmVD_uO8C
N Saitou, M Nei
Molecular biology and evolution 4 (4), 406-425, 1987
... other results
---------------
Currently extracting: Irving Weissman
https://scholar.google.com/citations/images/avatar_scholar_128.png
Author email: Verified email at stanford.edu
Stem cells, cancer, and cancer stem cells
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=Y66bJgUAAAAJ&citation_for_view=Y66bJgUAAAAJ:u5HHmVD_uO8C
T Reya, SJ Morrison, MF Clarke, IL Weissman
nature 414 (6859), 105-111, 2001
'''
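In the sample output above, the first image URL has the https://scholar.google.com prefix prepended to an address that was already absolute. One way to handle both relative and absolute src values is urllib.parse.urljoin from the standard library. The sketch below uses the two src values visible in the output as inputs; BASE_URL is a name introduced here for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://scholar.google.com"  # assumed base for relative paths

relative_src = "/citations/images/avatar_scholar_128.png"
absolute_src = "https://scholar.googleusercontent.com/citations?view_op=view_photo&user=VxOmZDgAAAAJ&citpid=3"

# urljoin prepends the base only when the second argument is relative;
# an absolute URL is returned unchanged.
default_avatar = urljoin(BASE_URL, relative_src)
custom_photo = urljoin(BASE_URL, absolute_src)

print(default_avatar)
print(custom_photo)
```

In the scraping loop this would mean passing the src attribute through urljoin instead of concatenating it into an f-string.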
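Alternatively, you can achieve the same thing with the Google Scholar Author API from SerpApi. It's a paid API with a free plan.
Essentially, you only need to pick the data you want from the returned dictionary, without having to figure out which selectors to use, how to bypass blocks from Google, or how to scale the number of requests.
Example code to integrate: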
import re
import pandas as pd
from serpapi import GoogleSearch

authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors")  # sheet_name is optional in this case

for author in authors_df["author_link"].to_list():
    params = {
        "api_key": "YOUR_API_KEY",
        "engine": "google_scholar_author",
        "hl": "en",
        # using basic regular expression to grab user ID from the passed URL;
        # [^&]+ stops at the next query parameter, so "&hl=en" is not captured
        "author_id": re.search(r"user=([^&]+)", author).group(1)  # -> VxOmZDgAAAAJ, unique author ID from the URL
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    print(f"Extracting data from: {results['author']['name']}\n"
          f"Author info: {results['author']}\n\n"
          f"Author articles:\n{results['articles']}\n")
# part of the output:
'''
Extracting data from: Masatoshi Nei
Author info: {'name': 'Masatoshi Nei', 'affiliations': 'Laura Carnell Professor of Biology, Temple University', 'email': 'Verified email at temple.edu', 'interests': [{'title': 'Evolution', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:evolution', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution'}, {'title': 'Evolutionary biology', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:evolutionary_biology', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology'}, {'title': 'Molecular evolution', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:molecular_evolution', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution'}, {'title': 'Population genetics', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:population_genetics', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics'}, {'title': 'Phylogenetics', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:phylogenetics', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics'}], 'thumbnail': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=VxOmZDgAAAAJ&citpid=3'}
Author articles:
[{'title': 'The neighbor-joining method: a new method for reconstructing phylogenetic trees.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u5HHmVD_uO8C', 'citation_id': 'VxOmZDgAAAAJ:u5HHmVD_uO8C', 'authors': 'N Saitou, M Nei', 'publication': 'Molecular biology and evolution 4 (4), 406-425, 1987', 'cited_by': {'value': 64841, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7672721046330422437,346314157833338191', 'serpapi_link': 'https://serpapi.com/search.json?cites=7672721046330422437%2C346314157833338191&engine=google_scholar&hl=en', 'cites_id': '7672721046330422437,346314157833338191'}, 'year': '1987'}, {'title': 'MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Tyk-4Ss8FVUC', 'citation_id': 'VxOmZDgAAAAJ:Tyk-4Ss8FVUC', 'authors': 'K Tamura, D Peterson, N Peterson, G Stecher, M Nei, S Kumar', 'publication': 'Molecular biology and evolution 28 (10), 2731-2739, 2011', 'cited_by': {'value': 44316, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5624029996178252455,5910675136328950108,13537318717249213469', 'serpapi_link': 'https://serpapi.com/search.json?cites=5624029996178252455%2C5910675136328950108%2C13537318717249213469&engine=google_scholar&hl=en', 'cites_id': '5624029996178252455,5910675136328950108,13537318717249213469'}, 'year': '2011'}, {'title': 'MEGA6: molecular evolutionary genetics analysis version 6.0', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:qmtmRrLr0tkC', 'citation_id': 'VxOmZDgAAAAJ:qmtmRrLr0tkC', 'authors': 'K Tamura, G Stecher, D Peterson, A Filipski, S Kumar', 'publication': 'Molecular biology and evolution 30 (12), 2725-2729, 2013', 'cited_by': {'value': 
40558, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5258359186493639031', 'serpapi_link': 'https://serpapi.com/search.json?cites=5258359186493639031&engine=google_scholar&hl=en', 'cites_id': '5258359186493639031'}, 'year': '2013'}, {'title': 'MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u-x6o8ySG0sC', 'citation_id': 'VxOmZDgAAAAJ:u-x6o8ySG0sC', 'authors': 'K Tamura, J Dudley, M Nei, S Kumar', 'publication': 'Molecular biology and evolution 24 (8), 1596-1599, 2007', 'cited_by': {'value': 34245, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8480751610153565117', 'serpapi_link': 'https://serpapi.com/search.json?cites=8480751610153565117&engine=google_scholar&hl=en', 'cites_id': '8480751610153565117'}, 'year': '2007'}, {'title': 'Molecular evolutionary genetics', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:d1gkVwhDpl0C', 'citation_id': 'VxOmZDgAAAAJ:d1gkVwhDpl0C', 'authors': 'M Nei', 'publication': 'Columbia university press, 1987', 'cited_by': {'value': 20704, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7660515423132980153', 'serpapi_link': 'https://serpapi.com/search.json?cites=7660515423132980153&engine=google_scholar&hl=en', 'cites_id': '7660515423132980153'}, 'year': '1987'}, {'title': 'MEGA2: molecular evolutionary genetics analysis software', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:IjCSPb-OGe4C', 'citation_id': 'VxOmZDgAAAAJ:IjCSPb-OGe4C', 'authors': 'S Kumar, K Tamura, IB Jakobsen, M Nei', 'publication': 'Bioinformatics 17 (12), 1244-1245, 2001', 'cited_by': {'value': 16078, 'link': 
'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=14171206204658643394,531426008085525562,5869149036159079676,8067244568899724142,12929609819447339488,15783386726452728786', 'serpapi_link': 'https://serpapi.com/search.json?cites=14171206204658643394%2C531426008085525562%2C5869149036159079676%2C8067244568899724142%2C12929609819447339488%2C15783386726452728786&engine=google_scholar&hl=en', 'cites_id': '14171206204658643394,531426008085525562,5869149036159079676,8067244568899724142,12929609819447339488,15783386726452728786'}, 'year': '2001'}, {'title': 'Estimation of average heterozygosity and genetic distance from a small number of individuals', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:2osOgNQ5qMEC', 'citation_id': 'VxOmZDgAAAAJ:2osOgNQ5qMEC', 'authors': 'M Nei', 'publication': 'Genetics 89 (3), 583-590, 1978', 'cited_by': {'value': 14504, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=11038674224870321151', 'serpapi_link': 'https://serpapi.com/search.json?cites=11038674224870321151&engine=google_scholar&hl=en', 'cites_id': '11038674224870321151'}, 'year': '1978'}, {'title': 'MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:9yKSN-GCB0IC', 'citation_id': 'VxOmZDgAAAAJ:9yKSN-GCB0IC', 'authors': 'S Kumar, K Tamura, M Nei', 'publication': 'Briefings in bioinformatics 5 (2), 150-163, 2004', 'cited_by': {'value': 14403, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10013295782066828040,15148316572039251274', 'serpapi_link': 'https://serpapi.com/search.json?cites=10013295782066828040%2C15148316572039251274&engine=google_scholar&hl=en', 'cites_id': '10013295782066828040,15148316572039251274'}, 'year': '2004'}, {'title': 'Mathematical model for studying genetic variation in terms of 
restriction endonucleases', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:qjMakFHDy7sC', 'citation_id': 'VxOmZDgAAAAJ:qjMakFHDy7sC', 'authors': 'M Nei, WH Li', 'publication': 'Proceedings of the National Academy of Sciences 76 (10), 5269-5273, 1979', 'cited_by': {'value': 13619, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5179626164554275201,1942230974501958280', 'serpapi_link': 'https://serpapi.com/search.json?cites=5179626164554275201%2C1942230974501958280&engine=google_scholar&hl=en', 'cites_id': '5179626164554275201,1942230974501958280'}, 'year': '1979'}, {'title': 'Genetic distance between populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:UeHWp8X0CEIC', 'citation_id': 'VxOmZDgAAAAJ:UeHWp8X0CEIC', 'authors': 'M Nei', 'publication': 'The American Naturalist 106 (949), 283-292, 1972', 'cited_by': {'value': 12980, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=4154924214026252226,7115074001272219295', 'serpapi_link': 'https://serpapi.com/search.json?cites=4154924214026252226%2C7115074001272219295&engine=google_scholar&hl=en', 'cites_id': '4154924214026252226,7115074001272219295'}, 'year': '1972'}, {'title': 'Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Y0pCki6q_DkC', 'citation_id': 'VxOmZDgAAAAJ:Y0pCki6q_DkC', 'authors': 'K Tamura, M Nei', 'publication': 'Molecular biology and evolution 10 (3), 512-526, 1993', 'cited_by': {'value': 11093, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13509507708085673250', 'serpapi_link': 'https://serpapi.com/search.json?cites=13509507708085673250&engine=google_scholar&hl=en', 'cites_id': '13509507708085673250'}, 
'year': '1993'}, {'title': 'Analysis of gene diversity in subdivided populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'citation_id': 'VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'authors': 'M Nei', 'publication': 'Proceedings of the national academy of sciences 70 (12), 3321-3323, 1973', 'cited_by': {'value': 10714, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=11712109356391350421', 'serpapi_link': 'https://serpapi.com/search.json?cites=11712109356391350421&engine=google_scholar&hl=en', 'cites_id': '11712109356391350421'}, 'year': '1973'}, {'title': 'Molecular evolution and phylogenetics', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:YsMSGLbcyi4C', 'citation_id': 'VxOmZDgAAAAJ:YsMSGLbcyi4C', 'authors': 'M Nei, S Kumar', 'publication': 'Oxford University Press, USA, 2000', 'cited_by': {'value': 8795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=703217195301701212,1351927951694611906', 'serpapi_link': 'https://serpapi.com/search.json?cites=703217195301701212%2C1351927951694611906&engine=google_scholar&hl=en', 'cites_id': '703217195301701212,1351927951694611906'}, 'year': '2000'}, {'title': 'Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:W7OEmFMy1HYC', 'citation_id': 'VxOmZDgAAAAJ:W7OEmFMy1HYC', 'authors': 'M Nei, T Gojobori', 'publication': 'Molecular biology and evolution 3 (5), 418-426, 1986', 'cited_by': {'value': 5279, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=12106160511321461626', 'serpapi_link': 'https://serpapi.com/search.json?cites=12106160511321461626&engine=google_scholar&hl=en', 'cites_id': '12106160511321461626'}, 'year': '1986'}, {'title': 'Prospects for 
inferring very large phylogenies by using the neighbor-joining method', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:0EnyYjriUFMC', 'citation_id': 'VxOmZDgAAAAJ:0EnyYjriUFMC', 'authors': 'K Tamura, M Nei, S Kumar', 'publication': 'Proceedings of the National Academy of Sciences 101 (30), 11030-11035, 2004', 'cited_by': {'value': 4882, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=9650987578903829104', 'serpapi_link': 'https://serpapi.com/search.json?cites=9650987578903829104&engine=google_scholar&hl=en', 'cites_id': '9650987578903829104'}, 'year': '2004'}, {'title': 'The bottleneck effect and genetic variability in populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:eQOLeE2rZwMC', 'citation_id': 'VxOmZDgAAAAJ:eQOLeE2rZwMC', 'authors': 'M Nei, T Maruyama, R Chakraborty', 'publication': 'Evolution, 1-10, 1975', 'cited_by': {'value': 3906, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13149273985544466189', 'serpapi_link': 'https://serpapi.com/search.json?cites=13149273985544466189&engine=google_scholar&hl=en', 'cites_id': '13149273985544466189'}, 'year': '1975'}, {'title': 'Accuracy of estimated phylogenetic trees from molecular data', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Se3iqnhoufwC', 'citation_id': 'VxOmZDgAAAAJ:Se3iqnhoufwC', 'authors': 'M Nei, F Tajima, Y Tateno', 'publication': 'Journal of molecular evolution 19 (2), 153-170, 1983', 'cited_by': {'value': 2877, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10638180566709737898', 'serpapi_link': 'https://serpapi.com/search.json?cites=10638180566709737898&engine=google_scholar&hl=en', 'cites_id': '10638180566709737898'}, 'year': '1983'}, {'title': 'Molecular population genetics and evolution.', 'link': 
'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:WF5omc3nYNoC', 'citation_id': 'VxOmZDgAAAAJ:WF5omc3nYNoC', 'authors': 'M Nei', 'publication': 'Molecular population genetics and evolution., 1975', 'cited_by': {'value': 2795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7886550802885580479', 'serpapi_link': 'https://serpapi.com/search.json?cites=7886550802885580479&engine=google_scholar&hl=en', 'cites_id': '7886550802885580479'}, 'year': '1975'}, {'title': 'Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:roLk4NBRz8UC', 'citation_id': 'VxOmZDgAAAAJ:roLk4NBRz8UC', 'authors': 'AL Hughes, M Nei', 'publication': 'Nature 335 (6186), 167-170, 1988', 'cited_by': {'value': 2169, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2966744676732650646', 'serpapi_link': 'https://serpapi.com/search.json?cites=2966744676732650646&engine=google_scholar&hl=en', 'cites_id': '2966744676732650646'}, 'year': '1988'}, {'title': 'Sampling variances of heterozygosity and genetic distance', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:UebtZRa9Y70C', 'citation_id': 'VxOmZDgAAAAJ:UebtZRa9Y70C', 'authors': 'M Nei, AK Roychoudhury', 'publication': 'Genetics 76 (2), 379-390, 1974', 'cited_by': {'value': 1918, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5978996318059495400', 'serpapi_link': 'https://serpapi.com/search.json?cites=5978996318059495400&engine=google_scholar&hl=en', 'cites_id': '5978996318059495400'}, 'year': '1974'}]
'''
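The user ID does not always sit at the end of the URL: the links in the question place user= before hl=en. A minimal sketch of extracting it with the standard library's query-string parser, independent of parameter order, is shown below. extract_author_id is a helper name made up for this example, and the second URL is an illustrative reconstruction of the links in the Excel file.

```python
from urllib.parse import parse_qs, urlparse


def extract_author_id(url):
    """Return the value of the 'user' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("user")
    return values[0] if values else None


# 'user' first, as in the question's links
print(extract_author_id("https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en"))
# 'user' last (illustrative URL)
print(extract_author_id("https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ"))
```

Returning None for a missing parameter lets the loop skip malformed links instead of raising an AttributeError.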
Disclaimer, I work for SerpApi.
'year': '1993'}, {'title': 'Analysis of gene diversity in subdivided populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'citation_id': 'VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'authors': 'M Nei', 'publication': 'Proceedings of the national academy of sciences 70 (12), 3321-3323, 1973', 'cited_by': {'value': 10714, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=11712109356391350421', 'serpapi_link': 'https://serpapi.com/search.json?cites=11712109356391350421&engine=google_scholar&hl=en', 'cites_id': '11712109356391350421'}, 'year': '1973'}, {'title': 'Molecular evolution and phylogenetics', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:YsMSGLbcyi4C', 'citation_id': 'VxOmZDgAAAAJ:YsMSGLbcyi4C', 'authors': 'M Nei, S Kumar', 'publication': 'Oxford University Press, USA, 2000', 'cited_by': {'value': 8795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=703217195301701212,1351927951694611906', 'serpapi_link': 'https://serpapi.com/search.json?cites=703217195301701212%2C1351927951694611906&engine=google_scholar&hl=en', 'cites_id': '703217195301701212,1351927951694611906'}, 'year': '2000'}, {'title': 'Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:W7OEmFMy1HYC', 'citation_id': 'VxOmZDgAAAAJ:W7OEmFMy1HYC', 'authors': 'M Nei, T Gojobori', 'publication': 'Molecular biology and evolution 3 (5), 418-426, 1986', 'cited_by': {'value': 5279, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=12106160511321461626', 'serpapi_link': 'https://serpapi.com/search.json?cites=12106160511321461626&engine=google_scholar&hl=en', 'cites_id': '12106160511321461626'}, 'year': '1986'}, {'title': 'Prospects for 
inferring very large phylogenies by using the neighbor-joining method', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:0EnyYjriUFMC', 'citation_id': 'VxOmZDgAAAAJ:0EnyYjriUFMC', 'authors': 'K Tamura, M Nei, S Kumar', 'publication': 'Proceedings of the National Academy of Sciences 101 (30), 11030-11035, 2004', 'cited_by': {'value': 4882, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=9650987578903829104', 'serpapi_link': 'https://serpapi.com/search.json?cites=9650987578903829104&engine=google_scholar&hl=en', 'cites_id': '9650987578903829104'}, 'year': '2004'}, {'title': 'The bottleneck effect and genetic variability in populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:eQOLeE2rZwMC', 'citation_id': 'VxOmZDgAAAAJ:eQOLeE2rZwMC', 'authors': 'M Nei, T Maruyama, R Chakraborty', 'publication': 'Evolution, 1-10, 1975', 'cited_by': {'value': 3906, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13149273985544466189', 'serpapi_link': 'https://serpapi.com/search.json?cites=13149273985544466189&engine=google_scholar&hl=en', 'cites_id': '13149273985544466189'}, 'year': '1975'}, {'title': 'Accuracy of estimated phylogenetic trees from molecular data', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Se3iqnhoufwC', 'citation_id': 'VxOmZDgAAAAJ:Se3iqnhoufwC', 'authors': 'M Nei, F Tajima, Y Tateno', 'publication': 'Journal of molecular evolution 19 (2), 153-170, 1983', 'cited_by': {'value': 2877, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10638180566709737898', 'serpapi_link': 'https://serpapi.com/search.json?cites=10638180566709737898&engine=google_scholar&hl=en', 'cites_id': '10638180566709737898'}, 'year': '1983'}, {'title': 'Molecular population genetics and evolution.', 'link': 
'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:WF5omc3nYNoC', 'citation_id': 'VxOmZDgAAAAJ:WF5omc3nYNoC', 'authors': 'M Nei', 'publication': 'Molecular population genetics and evolution., 1975', 'cited_by': {'value': 2795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7886550802885580479', 'serpapi_link': 'https://serpapi.com/search.json?cites=7886550802885580479&engine=google_scholar&hl=en', 'cites_id': '7886550802885580479'}, 'year': '1975'}, {'title': 'Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:roLk4NBRz8UC', 'citation_id': 'VxOmZDgAAAAJ:roLk4NBRz8UC', 'authors': 'AL Hughes, M Nei', 'publication': 'Nature 335 (6186), 167-170, 1988', 'cited_by': {'value': 2169, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2966744676732650646', 'serpapi_link': 'https://serpapi.com/search.json?cites=2966744676732650646&engine=google_scholar&hl=en', 'cites_id': '2966744676732650646'}, 'year': '1988'}, {'title': 'Sampling variances of heterozygosity and genetic distance', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:UebtZRa9Y70C', 'citation_id': 'VxOmZDgAAAAJ:UebtZRa9Y70C', 'authors': 'M Nei, AK Roychoudhury', 'publication': 'Genetics 76 (2), 379-390, 1974', 'cited_by': {'value': 1918, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5978996318059495400', 'serpapi_link': 'https://serpapi.com/search.json?cites=5978996318059495400&engine=google_scholar&hl=en', 'cites_id': '5978996318059495400'}, 'year': '1974'}]
'''
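Since the goal is to collect every profile's results in one file, the nested output above can be flattened into one row per article. The following is only a minimal sketch: the `flatten_article` helper and the `author` column are hypothetical names that do not appear in the original code, and the two sample entries are abbreviated from the printed output above. It uses only the standard library, so it runs without extra dependencies.

```python
import csv
import io

# Two entries abbreviated from the printed output above.
articles = [
    {
        "title": "Genetic distance between populations",
        "authors": "M Nei",
        "publication": "The American Naturalist 106 (949), 283-292, 1972",
        "cited_by": {"value": 12980, "cites_id": "4154924214026252226,7115074001272219295"},
        "year": "1972",
    },
    {
        "title": "Molecular evolutionary genetics",
        "authors": "M Nei",
        "publication": "Columbia university press, 1987",
        "cited_by": {"value": 20704, "cites_id": "7660515423132980153"},
        "year": "1987",
    },
]


def flatten_article(article, author_name):
    """Turn one nested article dict into a flat row (hypothetical helper)."""
    # "cited_by" is a nested dict in the output; fall back to {} if it is missing.
    cited_by = article.get("cited_by") or {}
    return {
        "author": author_name,  # assumed extra column to tell profiles apart
        "title": article.get("title"),
        "authors": article.get("authors"),
        "publication": article.get("publication"),
        "year": article.get("year"),
        "cited_by": cited_by.get("value"),
    }


# "M Nei" is used here only as an illustrative label for the profile.
rows = [flatten_article(article, "M Nei") for article in articles]

# Write to an in-memory buffer; swap in open("results.csv", "w", newline="")
# to write a real file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If you would rather stay with pandas, the same `rows` list can be passed to `pd.DataFrame(rows)` and saved with `to_excel()`. Calling the helper inside the loop over author links and extending one shared list keeps all profiles in a single table.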
Disclaimer: I work for SerpApi.