Web scraping: how to include a quote character in an HTML node selector
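I am using the rvest package to scrape information from a website. Some of the information I need belongs to the class iinfo" (the trailing double quote is part of the class name). Unfortunately, if I use this string in the function html_nodes(), I get the following error: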
Error in parse_simple_selector(stream) :
Expected selector, got <STRING '' at 7>
Here is a reprex:
library(rvest)
library(xml2)
webpage <- read_html(x = paste0("https://www.gstsvs.ch/fr/trouver-un-medecin-veterinaire.html?tx_datapool_pi1%5Bhauptgebiet%5D=3&tx_datapool_pi1%5Bmapsearch%5D=cercare&tx_datapool_pi1%5BdoSearch%5D=1&tx_datapool_pi1%5Bpointer2303%5D=",
0))
webpage_address <- webpage %>%
  html_nodes('.iinfo"') %>%
  html_text() %>%
  gsub(pattern = "\r|\t|\n",
       replacement = " ")
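That class refers to the addresses listed inside each box on the website. You can find it by inspecting the page structure in the browser and navigating to one of the boxes. If you do, when you hover the mouse over the address section, you will see a label reading div.iinfo\" appear.
Thank you very much for your help!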
Here you go. Use an XPath expression instead of a CSS selector:
webpage_address <- webpage %>%
  html_nodes(xpath = "//*[@class='iinfo\"']") %>%
  html_text(trim = TRUE)
Result:
> webpage_address
[1] "Anne-Françoise HenchozEnvers 412400 Le Locle, NE"
[2] "Téléphone: 032 931 10 10Urgences: 032 931 10 10Fax: 032 931 36 10afhenchoz(at)bluewin.chafhenchoz.com"
[3] "Ursi Dommann ScheuberHauptstrasse 156222 Gunzwil, LU"
[4] "Téléphone: 041 930 14 44tiergesundheit(at)bluewin.ch"
[5] "Dr. Med. Vet. Anne KramerBaggwilgraben 33267 Seedorf, BE"
[6] "Téléphone: 079 154 70 15anne(at)alpakavet.chwww.alpakavet.ch"
[7] "Dr. med. vet. Andrea FeistAdelbodenstrasse 103714 Frutigen, BE"
[8] "Téléphone: 033 671 15 60Urgences: 033 671 15 60Fax: 033 671 86 60alpinvet(at)bluewin.chwww.alpinvet.ch"
[9] "Dr. med. vet. Peter KürsteinerAlpsteinstr. 289240 Uzwil, SG"
[10] "Téléphone: 071 951 85 44"
[11] "Kathrin Urscheler-Hollenstein, Eveline Muhl-ZollingerSchaffhauserstrasse 2458222 Beringen, SH"
[12] "Téléphone: 052 685 20 20Fax: 052 685 34 20praxis(at)tieraerzte-team.chwww.tieraerzte-team.ch"
[13] "Dr. med. vet. Erwin VincenzVia Santeri 127130 Ilanz, GR"
[14] "Téléphone: 081/925 23 23Urgences: 081/925 23 23Fax: 081/925 23 25info(at)anima-veterinari.ch"
[15] "Dr. Zlatko MarinovicMühlerain 3853855072 oeschgen, AG"
[16] "Téléphone: 49628715060Urgences: 49628715060Fax: 49628712439z.marin(at)sunrise.ch"
[17] "Manser ChläusSchwalbenweg 73186 Düdingen, FR"
[18] "Téléphone: 026 493 10 60animans.tierarzt(at)gmail.com"
[19] "W.A.GeesBrünigstrasse 38aHauptstrasse 100, 3855 Brienz3860 Meiringen, BE"
[20] "Téléphone: 033 / 971 60 42Urgences: 033 / 971 60 42Fax: 033 / 971 01 50info(at)tierarzt-meiringen.chanisano.ch"
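To see why the XPath version works without depending on the live site, here is a minimal sketch on an inline HTML string. The markup, the "Sample address" text, and the variable names are assumptions made up for illustration; only the class value iinfo" comes from the question. The quote character is just an ordinary character inside an XPath string literal delimited by single quotes, so the CSS selector parser is never involved.

```r
library(rvest)

# Hypothetical stand-in for the real page: one element whose class
# attribute value is literally iinfo" (with a trailing double quote).
doc <- read_html(
  "<html><body><div class='iinfo\"'>Sample address</div><div class='other'>ignored</div></body></html>"
)

# Exact match on the whole class attribute, as in the answer above.
exact <- doc %>%
  html_nodes(xpath = "//*[@class='iinfo\"']") %>%
  html_text(trim = TRUE)

# Looser alternative: match any class attribute containing "iinfo".
# Note that this would also pick up other classes that merely include that substring.
loose <- doc %>%
  html_nodes(xpath = "//*[contains(@class, 'iinfo')]") %>%
  html_text(trim = TRUE)

print(exact)
print(loose)
```

The exact match is the safer choice when the class attribute holds only that one value; the contains() form may be handy if the element carries additional classes.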