Calculating the academic "g-index" (a variant of the h-index) in R?
This data frame shows two researchers and the number of citations for each of their papers:
researcher citations
<chr> <dbl>
1 Berger 8
2 Berger 11
3 Berger 26
4 Berger 25
5 Berger 10
6 Meyer 45
7 Meyer 12
8 Meyer 12
9 Meyer 8
10 Meyer 21
How can I calculate the "g-index" for each researcher in R?
This is the Wikipedia definition of the g-index:
The index is calculated based on the distribution of citations received by a given researcher's publications, such that given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the unique largest number such that the top g articles received together at least g² citations. Hence, a g-index of 10 indicates that the top 10 publications of an author have been cited at least 100 times (10²), a g-index of 20 indicates that the top 20 publications of an author have been cited 400 times (20²).
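To make the definition concrete, here is a small sketch that walks through it for Berger's citation counts from the question. The variable names (`ranked`, `check`, `g_index`) are illustrative only and not part of any package.

```r
# Citation counts for Berger, taken from the question's data
citations <- c(8, 11, 26, 25, 10)

# Rank papers by citations, most cited first
ranked <- sort(citations, decreasing = TRUE)

# For each candidate g: total citations of the top g papers vs. the threshold g^2
g <- seq_along(ranked)
check <- data.frame(
  g = g,
  top_g_total = cumsum(ranked),
  g_squared = g^2
)
check$meets <- check$top_g_total >= check$g_squared

# The g-index is the largest g whose running total reaches g^2 (0 if none does)
g_index <- max(check$g[check$meets], 0)
```

Printing `check` shows the running totals next to the thresholds. Note that under this reading of the definition the index cannot exceed the number of papers.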
Data frame:
structure(list(researcher = c("Berger", "Berger", "Berger", "Berger",
"Berger", "Meyer", "Meyer", "Meyer", "Meyer", "Meyer"), citations = c(8,
11, 26, 25, 10, 45, 12, 12, 8, 21)), row.names = c(NA, -10L), groups = structure(list(
researcher = c("Berger", "Meyer"), .rows = structure(list(
1:5, 6:10), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
With data.table:
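library(data.table)
# dt is assumed to hold the data above as a data.table (e.g. via as.data.table())
setorder(dt, researcher, -citations)  # most-cited papers first within each researcher
# g-index: largest g whose top g papers have at least g^2 citations in total
dtg <- dt[, .(gscore = max(seq_len(.N) * (cumsum(citations) >= seq_len(.N)^2))), by = "researcher"]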
dtg
#> researcher gscore
#> 1: Berger 5
#> 2: Meyer 5
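For readers who do not use data.table, the same idea can be sketched in base R. This is only one possible approach, not the answer's method: `g_index()` is a hypothetical helper written for this example, and `df` is a plain rebuild of the question's data.

```r
# Rebuild the question's data as a plain (ungrouped) data frame
df <- data.frame(
  researcher = rep(c("Berger", "Meyer"), each = 5),
  citations = c(8, 11, 26, 25, 10, 45, 12, 12, 8, 21)
)

# Hypothetical helper: g-index of one researcher's citation vector
g_index <- function(citations) {
  ranked <- sort(citations, decreasing = TRUE)
  g <- seq_along(ranked)
  # Largest g with cumulative citations >= g^2; 0 if no g qualifies
  max(g[cumsum(ranked) >= g^2], 0)
}

# Apply the helper within each researcher
gscores <- tapply(df$citations, df$researcher, g_index)
```

`gscores` is a named vector with one entry per researcher. Sorting happens inside the helper, so the input rows do not need to be pre-ordered.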