Web scraping with rvest: error "no applicable method for 'xml_find_first' applied to an object of class "character""

I am trying to scrape job titles from a web page with the rvest package, but I get this error:

Error in UseMethod("xml_find_first") : 
  no applicable method for 'xml_find_first' applied to an object of class "character" 

Any suggestions? Am I missing part of the code? My code is below:

library(dplyr)
library(rvest)
library(stringr)

url <- "https://www.cvmarket.lt/darbo-skelbimai"
# save the url
html <- read_html(url) # read the url 

get_links <- function(html) {
  html %>%
    html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "limited-lines", " " ))]') %>%
    html_attr(name = "href")
}
# now we call the function and save it
links <- get_links(html)
links

links <- paste0("https://www.cvmarket.lt", links)

link <- links[1]
html <- read_html(link)


# position title
get_title <- function(html) {
  html %>%
    html_node(xpath = '//*[(@id = "main-job-title")]') %>%
    html_text() %>%
    unlist()
}
#test
get_title(link)

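I would change your functions to accept a URL as the input argument. Use CSS selectors, which are faster, and make them more specific: an id gives a quicker match, and a combination of CSS classes avoids duplicate results. You can use url_absolute inside get_links to complete the URLs. This also fixes the current error: you were passing a URL (a character string) to get_title, which expects parsed HTML. The revised get_title takes the link and calls read_html on it itself.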

library(dplyr)
library(rvest)
library(stringr)

# collect the job links from the listing page as absolute URLs
get_links <- function(url) {
  read_html(url) %>%
    html_nodes('.main-column > .f_job_title') %>%
    html_attr(name = "href") %>%
    url_absolute(url)
}

# position title
get_title <- function(link) {
  read_html(link) %>%
    html_node('#main-job-title') %>%
    html_text() 
}


url <- "https://www.cvmarket.lt/darbo-skelbimai"
links <- get_links(url)
link <- links[1]

#test
get_title(link)
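
To see why the original call failed without hitting the site, here is a minimal offline sketch. It assumes a made-up HTML snippet (page_source) standing in for a job page, and a hypothetical relative path "/job/123"; neither comes from the real site. It first passes the raw string to html_node, which fails because a character string is not a parsed document, and then parses it first, which works. The last line illustrates how url_absolute (from xml2) completes a relative link.

```r
library(rvest)

# Hypothetical stand-in for a job page, so the sketch runs offline.
page_source <- '<html><body><h1 id="main-job-title">Data Analyst</h1></body></html>'

# Passing the raw string: a character vector is not a parsed document,
# so the call fails. We capture the message instead of stopping.
err <- tryCatch(
  html_node(page_source, "#main-job-title"),
  error = function(e) conditionMessage(e)
)

# Parsing first gives html_node() a document it can search.
title <- read_html(page_source) %>%
  html_node("#main-job-title") %>%
  html_text()

# url_absolute() resolves a relative href against the page it came from.
# "/job/123" is a made-up path used only for illustration.
full_link <- xml2::url_absolute("/job/123", "https://www.cvmarket.lt/darbo-skelbimai")

print(err)
print(title)
print(full_link)
```

The same reasoning applies to the real page: whatever reaches html_node or html_nodes should be the result of read_html, not the URL string itself.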