Call two different functions on the same object in a pipe (%>%)
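I would like to know whether there is a way to call both html_name() and html_text() (from the rvest package) and store the two different results within the same pipe (magrittr::%>%).
Here is an example: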
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text')
At this point I would like to get both the tag names from html_name()
[1] "fullname" "ecnumber" "name" "text"
AND the tag contents, without having to create a separate object by rewriting the whole pipe just to change the last line to html_text()
[1] "Serine/threonine-protein kinase PSK1"
[2] "2.7.11.1"
[3] "PSK1"
[4] "Serine/threonine-protein kinase involved ... ...
The desired output could look something like this; whether it is a vector or a data.frame does not matter.
[1] fullname: "Serine/threonine-protein kinase PSK1"
[2] ecnumber: "2.7.11.1"
[3] Name: "PSK1"
[4] Text: "Serine/threonine-protein kinase involved ... ...
It may be a bit of a hack, but you can use an anonymous function wrapped in parentheses inside the pipe:
library("magrittr")
library("httr")
library("xml2")
library("rvest")
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
(function(x) list(name = html_name(x), text = html_text(x)))
#$name
#[1] "fullname" "ecnumber" "name" "text"
#
#$text
#[1] "Serine/threonine-protein kinase PSK1"
#[2] "2.7.11.1"
#[3] "PSK1"
#[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."
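Alternatively, you could do something more elegant with the purrr package, but I don't see why you would load a whole package just for this.
Edit
As @MrFlick pointed out in the comments, the dot (.) placeholder can do the same thing, provided it is properly wrapped in curly braces.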
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
{list(name = html_name(.), text = html_text(.))}
This is arguably the more idiomatic magrittr way, and it is in fact documented in help("%>%").
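The brace form can also produce the "name: text" shape asked for in the question. The following is a minimal sketch that runs offline: the small XML string is a hypothetical stand-in for the UniProt response (not the real document), and setNames() is used as one possible way to pair each tag name with its text.

```r
library(magrittr)
library(xml2)
library(rvest)

# Hypothetical stand-in for the UniProt XML, so the sketch needs no network access.
doc <- read_xml(
  "<entry>
     <recommendedname>
       <fullname>Serine/threonine-protein kinase PSK1</fullname>
       <ecnumber>2.7.11.1</ecnumber>
     </recommendedname>
     <name type='primary'>PSK1</name>
   </entry>"
)

# Inside the braces, `.` is the node set, so it can be used more than once.
res <- doc %>%
  html_nodes(xpath = '//recommendedname/* | //name[@type="primary"]') %>%
  { setNames(html_text(.), html_name(.)) }

print(res)
```

The result is a named character vector, so a single value can be looked up by tag name, for example res[["ecnumber"]].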
You could write a custom function that takes your html_nodes object and does whatever you need with it:
html_name_text <- function(nodes) {
list(html_name(nodes), html_text(nodes))
}
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
html_name_text()
[[1]]
[1] "fullname" "ecnumber" "name" "text"
[[2]]
[1] "Serine/threonine-protein kinase PSK1"
[2] "2.7.11.1"
[3] "PSK1"
[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."
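If a data.frame is preferred over an unnamed list, the helper can be adapted along these lines. This is only a sketch: html_name_text_df is a hypothetical name, and the inline XML is a made-up stand-in for the UniProt response so that the example runs without a network connection.

```r
library(magrittr)
library(xml2)
library(rvest)

# Hypothetical variant of the helper: one row per node, with named columns.
html_name_text_df <- function(nodes) {
  data.frame(
    name = html_name(nodes),
    text = html_text(nodes),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-in for the real document.
doc <- read_xml(
  "<entry>
     <recommendedname>
       <fullname>Serine/threonine-protein kinase PSK1</fullname>
       <ecnumber>2.7.11.1</ecnumber>
     </recommendedname>
   </entry>"
)

res <- doc %>%
  html_nodes(xpath = "//recommendedname/*") %>%
  html_name_text_df()

print(res)
```

Because the helper is an ordinary function of one argument, it slots into the pipe without braces or placeholders.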
Here is a purrr approach that returns a tibble:
library(tidyverse)
library(rvest)
uniprot_ac <- "P31374"
read_html(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
map(~ list(name = html_name(.), text = html_text(.))) %>%
bind_rows
#> # A tibble: 4 x 2
#> name text
#> <chr> <chr>
#> 1 fullname Serine/threonine-protein kinase PSK1
#> 2 ecnumber 2.7.11.1
#> 3 name PSK1
#> 4 text Serine/threonine-protein kinase involved in the control of suga~
Created on 2019-03-26 by the reprex package (v0.2.1)
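One option is to use braces after the pipe, store the current result in a temporary object (if needed), and then compute the different results you want: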
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>% {
list(name = html_name(.), text = html_text(.))
}
FYI, sometimes you need to pass a temporary object, as in this example:
iris %>%
select(Sepal.Length, Sepal.Width) %>% {
temp <- .
bind_rows(temp %>% filter(Sepal.Length > 5),
temp %>% filter(Sepal.Width <= 3))
} %>%
dim()
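In this case, simply replacing temp with . would not work, because magrittr treats a pipeline that starts with . as the definition of a function rather than evaluating it.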
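To make the difference concrete, here is a small sketch (the variable names f, with_temp and with_dot are just illustrative). It shows that a pipeline starting with . yields a function, and that passing . as an ordinary argument inside the braces is one way to avoid the temporary object.

```r
library(magrittr)
library(dplyr)

# A pipeline whose left-hand side is `.` is not evaluated;
# magrittr returns a function that can be applied later.
f <- . %>% filter(Sepal.Length > 5)
print(is.function(f))

# Version with a temporary object, as in the example above.
with_temp <- iris %>%
  select(Sepal.Length, Sepal.Width) %>% {
    temp <- .
    bind_rows(temp %>% filter(Sepal.Length > 5),
              temp %>% filter(Sepal.Width <= 3))
  }

# Passing `.` as a plain argument (not as the start of a pipe) also works.
with_dot <- iris %>%
  select(Sepal.Length, Sepal.Width) %>% {
    bind_rows(filter(., Sepal.Length > 5),
              filter(., Sepal.Width <= 3))
  }

print(identical(with_temp, with_dot))
```

The temporary object is only needed when the value has to start a nested pipeline of its own.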
Without any extra package, and without too much juggling of braces and dots, you can do:
nodes %>% lapply(list(html_name, html_text), function(x,y) x(y), .)
# [[1]]
# [1] "fullname" "ecnumber" "name" "text"
#
# [[2]]
# [1] "Serine/threonine-protein kinase PSK1"
# [2] "2.7.11.1"
# [3] "PSK1"
# [4] "Serine/threonine-protein kinase involved in the control of sugar
Or the following, which is slightly more compact but uses braces:
nodes %>% {lapply(list(html_name, html_text), do.call, list(.))}
I would, however, use purrr: loop over the functions and pass each of them, together with . as the argument, to exec:
library(purrr)
nodes %>% map(list(html_name, html_text), exec, .)
(same output)
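A small variation on the base R version: if the list of functions is named, lapply() carries those names over to the result, which can then be converted to a data.frame. This is a sketch only; funs is a hypothetical name, and the nodes object here is built from a made-up XML string rather than from the UniProt request.

```r
library(magrittr)
library(xml2)
library(rvest)

# Made-up stand-in for the real `nodes` object, so the sketch runs offline.
doc <- read_xml(
  "<entry>
     <recommendedname>
       <fullname>Serine/threonine-protein kinase PSK1</fullname>
       <ecnumber>2.7.11.1</ecnumber>
     </recommendedname>
   </entry>"
)
nodes <- html_nodes(doc, xpath = "//recommendedname/*")

# Naming the functions names the elements of the result.
funs <- list(name = html_name, text = html_text)

# `.` appears as a direct argument, so magrittr does not also insert
# `nodes` as the first argument of lapply().
res <- nodes %>% lapply(funs, function(f, x) f(x), .)

df <- as.data.frame(res, stringsAsFactors = FALSE)
print(df)
```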
Data
library("magrittr")
library("httr")
library("xml2")
library("rvest")
nodes <- GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text')