Using a list of URLs in R, how to web scrape images, download the files and group the images back to the original URL?
I have a vector of URLs:
library(rvest)
URLs <-c("https://www.espn.com/f1/story/_/id/31287940/norris-made-step-says-mclaren",
"https://www.espn.com/f1/story/_/id/31287893/vettel-calls-fia-not-very-professional-imola-penalty",
"https://www.espn.com/f1/story/_/id/31284743/alonso-promoted-points-finish-raikkonen-penalty")
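I want to loop through these and build a list of image links for every picture on each page, with the landing page as the name of the list element. I would then like to download the pictures so that each file carries both the image URL and the landing-page URL.
My current code only works for one page at a time.
Reprex for a single URL: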
url <-c("https://www.espn.com/f1/story/_/id/31287940/norris-made-step-says-mclaren")
webpage <- html_session(url)
link.titles <- webpage %>% html_nodes("img")
img.url <- link.titles[2] %>% html_attr("src")
### Issue #1 i could not figure out the loop with html_attr to spit out all of the urls in a list###
download.file(img.url, "test.jpg", mode = "wb")
###Issue #2 because of this I cannot loop through a list and download the names###
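For Issue #1, it may help that `html_attr()` already works on a whole node set, so no loop over the individual images is needed; only the pages have to be looped over. Below is a minimal sketch under that assumption. The helper name `extract_img_urls` is hypothetical, and a small inline HTML string stands in for a real article so the example runs offline.

```r
library(rvest)

# Hypothetical helper: return every <img src> found in one parsed page.
extract_img_urls <- function(page) {
  srcs <- page %>% html_nodes("img") %>% html_attr("src")
  srcs[!is.na(srcs)]  # drop <img> tags that have no src attribute
}

# Inline HTML stands in for a downloaded article (assumption for the demo).
demo_page <- read_html(
  '<html><body><img src="a.png"><img alt="no src"><img src="b.jpg"></body></html>'
)
demo_imgs <- extract_img_urls(demo_page)

# For the real pages, loop over the landing pages and name each list
# element by its landing-page URL:
# img_list <- lapply(URLs, function(u) extract_img_urls(read_html(u)))
# names(img_list) <- URLs
```

Whether `src` is the right attribute depends on the site; some pages keep the real image link in other attributes such as `data-srcset`.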
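Once I can download the list of pictures, the URL problem can be solved easily by naming each file with the landing page and the image URL separated by a delimiter. I could create an ID for each URL to keep the file names short. Naming convention: "LP1-ESPN.com.jpg".
The goal is to quickly look through the photos for each link, delete the irrelevant ones, count the photos per link, and then join that count (taken after the manual deletion) and the links back to the original dataset, which has other metrics used for analysis. That is why I want the naming convention above: I can then load the remaining names and links from the folder into R without manipulating the jpg files.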
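The naming and counting workflow described above could be sketched roughly as follows. Everything named here is an assumption for illustration: the `landing` lookup table, the `make_name` helper, and the choice to put an image number (rather than the site name) after the delimiter so that file names stay unique. Empty files in a temporary folder stand in for the images that survive the manual clean-up.

```r
# Hypothetical lookup table: one short ID per landing page.
landing <- data.frame(
  id  = c("LP1", "LP2"),
  url = c("https://www.espn.com/f1/story/_/id/31287940/norris-made-step-says-mclaren",
          "https://www.espn.com/f1/story/_/id/31287893/vettel-calls-fia-not-very-professional-imola-penalty"),
  stringsAsFactors = FALSE
)

# Build "<landing id>-<image number>.jpg"; "-" is the delimiter.
make_name <- function(id, img_no) sprintf("%s-%03d.jpg", id, img_no)

# Simulate the folder after manual clean-up with empty placeholder files.
img_dir <- file.path(tempdir(), "kept_images")
dir.create(img_dir, showWarnings = FALSE)
kept <- c(make_name("LP1", 1), make_name("LP1", 2), make_name("LP2", 1))
file.create(file.path(img_dir, kept))

# Recover the landing-page ID from each remaining file name and count per page.
files  <- list.files(img_dir, pattern = "\\.jpg$")
ids    <- sub("-.*$", "", files)
counts <- as.data.frame(table(id = ids), responseName = "n_images",
                        stringsAsFactors = FALSE)

# Attach the counts to the landing pages; the same key could then be used
# to join onto the original dataset with the other metrics.
result <- merge(landing, counts, by = "id", all.x = TRUE)
```

Landing pages whose images were all deleted would show `NA` in `n_images` because of `all.x = TRUE`; those could be set to 0 before analysis.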
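Edit: I have been able to get a list of all my URLs and the image links within them. I cannot download them with this loop; many are missing.
The following code works, but only downloads about 10% of the images. I know html_session might solve this, but there are about 2,500 image links and I cannot figure out how to loop through sessions. Maybe a while loop?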
tryCatch(lapply(1:length(total_urls.2$V1), function(x)
download.file(new_df[[x]],paste0(total_urls.2[x,3],"_", total_urls.2[x,4],".jpeg"),method = "auto" ,mode = "wb", cacheOK = FALSE)), error = function(e) NULL)
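Here is the data.frame in question. It is 2,380 rows long, but is truncated to 50 rows here. I would like to know how to download all of the image links; right now I have only about 19 pictures in the folder.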
dput(total_urls.2[1:50,])
structure(list(V1 = c("https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/i/columnists/edmondson_laurence_m.jpg&h=80&w=80&scale=crop",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0418/r842329_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/i/columnists/edmondson_laurence_m.jpg&h=80&w=80&scale=crop",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0418/r842215_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/icons/in_15.png", "https://a.espncdn.com/icons/in_15.png",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0417/r841759_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0331/r834376_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0417/r841759_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0331/r834376_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/i/columnists/edmondson_laurence_m.jpg&h=80&w=80&scale=crop",
"https://a.espncdn.com/combiner/i?img=/i/columnists/saunders_nate_m.jpg&h=80&w=80&scale=crop",
"https://a.espncdn.com/icons/in_15.png", "https://a.espncdn.com/icons/in_15.png",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/i/columnists/edmondson_laurence_m.jpg&h=80&w=80&scale=crop",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0329/r833584_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0415/r840887_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/icons/in_15.png", "https://a.espncdn.com/icons/in_15.png",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0415/r840882_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0415/r840887_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/icons/in_15.png", "https://a.espncdn.com/icons/in_15.png",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0413/r839738_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0415/r840882_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://a3.espncdn.com/combiner/i?img=%2Fredesign%2Fassets%2Fimg%2Ficons%2FESPN%2Dicon%2Dnascar.png&w=80&h=80&scale=crop&cquality=40&location=origin",
"https://a.espncdn.com/combiner/i?img=/photo/2021/0329/r833489_1296x1296_1-1.jpg&w=130&h=130&scale=crop&location=center",
"https://ca-times.brightspotcdn.com/dims4/default/331b349/2147483647/strip/true/crop/3937x2625+0+0/resize/840x560!/quality/90/?url=https%3A%2F%2Fcalifornia-times-brightspot.s3.amazonaws.com%2Fb3%2F61%2F0c10c4344ce0a69870cf8864cf98%2Fitaly-emilia-romagna-f1-gp-auto-racing-58053.jpg",
"//cdn.cnn.com/cnnnext/dam/assets/200809102318-verstappen-celeb-exlarge-169.jpg",
"//cdn.cnn.com/cnnnext/dam/assets/200809102318-verstappen-celeb-exlarge-169.jpg",
"//cdn.cnn.com/cnnnext/dam/assets/200809102318-verstappen-celeb-large-169.jpg",
"//cdn.cnn.com/cnnnext/dam/assets/210309145427-mick-and-michael-schumacher-large-169.jpg",
"//cdn.cnn.com/cnnnext/dam/assets/210305121558-stephanie-travers-lewis-hamilton-large-169.jpg"
), URLs = c("https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next1",
"https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next2",
"https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next3",
"https://www.espn.com/f1/story/_/id/31287940/norris-made-step-says-mclaren1",
"https://www.espn.com/f1/story/_/id/31287893/vettel-calls-fia-not-very-professional-imola-penalty1",
"https://www.espn.com/f1/story/_/id/31284743/alonso-promoted-points-finish-raikkonen-penalty1",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash1",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash2",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash3",
"https://www.espn.com/f1/story/_/id/31283343/russell-asked-bottas-trying-kill-us-both1",
"http://www.espn.com/espn/wire?section=rpm&id=312812861", "http://www.espn.com/espn/wire?section=rpm&id=312812862",
"http://www.espn.com/espn/wire?section=rpm&id=312812863", "https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20221",
"https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20222",
"https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20223",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards1",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards2",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards3",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull1",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull2",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull3",
"http://www.espn.com/espn/wire?section=rpm&id=312742341", "http://www.espn.com/espn/wire?section=rpm&id=312742342",
"http://www.espn.com/espn/wire?section=rpm&id=312742343", "https://www.espn.com/f1/story/_/id/31270294/has-mercedes-regained-advantage-already-ferrari-spring-surprise1",
"https://www.espn.com/f1/story/_/id/31270294/has-mercedes-regained-advantage-already-ferrari-spring-surprise2",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision1",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision2",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision3",
"http://www.espn.com/espn/wire?section=rpm&id=312676841", "http://www.espn.com/espn/wire?section=rpm&id=312676842",
"http://www.espn.com/espn/wire?section=rpm&id=312676843", "https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite1",
"https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite2",
"https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite3",
"http://www.espn.com/espn/wire?section=rpm&id=312634771", "http://www.espn.com/espn/wire?section=rpm&id=312634772",
"http://www.espn.com/espn/wire?section=rpm&id=312634773", "https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters1",
"https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters2",
"https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters3",
"https://www.espn.com/f1/story/_/id/31263278/verstappen-calls-clarity-messy-track-limits-rules1",
"https://www.espn.com/f1/story/_/id/31263278/verstappen-calls-clarity-messy-track-limits-rules2",
"https://www.latimes.com/sports/story/2021-04-18/max-verstappen-lewis-hamilton-emilia-romagna-grand-prix1",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html1",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html2",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html4",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html6",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html8"
), Article_URL = c("IMESP1", "IMESP2", "IMESP3", "IMESP4", "IMESP5",
"IMESP6", "IMESP7", "IMESP8", "IMESP9", "IMESP10", "IMESP11",
"IMESP12", "IMESP13", "IMESP14", "IMESP15", "IMESP16", "IMESP17",
"IMESP18", "IMESP19", "IMESP20", "IMESP21", "IMESP22", "IMESP23",
"IMESP24", "IMESP25", "IMESP26", "IMESP27", "IMESP28", "IMESP29",
"IMESP30", "IMESP31", "IMESP32", "IMESP33", "IMESP34", "IMESP35",
"IMESP36", "IMESP37", "IMESP38", "IMESP39", "IMESP40", "IMESP41",
"IMESP42", "IMESP43", "IMESP44", "IMESP45", "IMESP62", "IMESP63",
"IMESP65", "IMESP67", "IMESP69"), img_URL = c("img21", "img22",
"img23", "img24", "img25", "img26", "img27", "img28", "img29",
"img210", "img211", "img212", "img213", "img214", "img215", "img216",
"img217", "img218", "img219", "img220", "img221", "img222", "img223",
"img224", "img225", "img226", "img227", "img228", "img229", "img230",
"img231", "img232", "img233", "img234", "img235", "img236", "img237",
"img238", "img239", "img240", "img241", "img242", "img243", "img244",
"img245", "img262", "img263", "img265", "img267", "img269")), row.names = c("https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next1",
"https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next2",
"https://www.espn.com/f1/story/_/id/31289705/the-blame-game-analysis-bottas-russell-clash-happens-next3",
"https://www.espn.com/f1/story/_/id/31287940/norris-made-step-says-mclaren1",
"https://www.espn.com/f1/story/_/id/31287893/vettel-calls-fia-not-very-professional-imola-penalty1",
"https://www.espn.com/f1/story/_/id/31284743/alonso-promoted-points-finish-raikkonen-penalty1",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash1",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash2",
"https://www.espn.com/f1/story/_/id/31284241/russell-lost-sight-bigger-picture-bottas-clash3",
"https://www.espn.com/f1/story/_/id/31283343/russell-asked-bottas-trying-kill-us-both1",
"http://www.espn.com/espn/wire?section=rpm&id=312812861", "http://www.espn.com/espn/wire?section=rpm&id=312812862",
"http://www.espn.com/espn/wire?section=rpm&id=312812863", "https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20221",
"https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20222",
"https://www.espn.com/f1/story/_/id/31281017/f1-confirms-race-miami-20223",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards1",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards2",
"https://www.espn.com/f1/story/_/id/31281017/f1-hold-miami-grand-prix-2022-onwards3",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull1",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull2",
"https://www.espn.com/f1/story/_/id/31275927/how-hamilton-mercedes-turned-tables-red-bull3",
"http://www.espn.com/espn/wire?section=rpm&id=312742341", "http://www.espn.com/espn/wire?section=rpm&id=312742342",
"http://www.espn.com/espn/wire?section=rpm&id=312742343", "https://www.espn.com/f1/story/_/id/31270294/has-mercedes-regained-advantage-already-ferrari-spring-surprise1",
"https://www.espn.com/f1/story/_/id/31270294/has-mercedes-regained-advantage-already-ferrari-spring-surprise2",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision1",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision2",
"https://www.espn.com/f1/story/_/id/31270374/aston-martin-wants-aero-rules-revision3",
"http://www.espn.com/espn/wire?section=rpm&id=312676841", "http://www.espn.com/espn/wire?section=rpm&id=312676842",
"http://www.espn.com/espn/wire?section=rpm&id=312676843", "https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite1",
"https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite2",
"https://www.espn.com/f1/story/_/id/31263611/hamilton-says-f1-rivalry-vettel-remains-favourite3",
"http://www.espn.com/espn/wire?section=rpm&id=312634771", "http://www.espn.com/espn/wire?section=rpm&id=312634772",
"http://www.espn.com/espn/wire?section=rpm&id=312634773", "https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters1",
"https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters2",
"https://www.espn.com/f1/story/_/id/31263349/mercedes-gone-hunted-hunters3",
"https://www.espn.com/f1/story/_/id/31263278/verstappen-calls-clarity-messy-track-limits-rules1",
"https://www.espn.com/f1/story/_/id/31263278/verstappen-calls-clarity-messy-track-limits-rules2",
"https://www.latimes.com/sports/story/2021-04-18/max-verstappen-lewis-hamilton-emilia-romagna-grand-prix1",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html1",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html2",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html4",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html6",
"https://edition.cnn.com/2021/04/18/motorsport/max-verstappen-lewis-hamilton-imola-gp-spt-intl/index.html8"
), class = "data.frame")
The images are in a different location. You can try the following code:
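library(rvest)

lapply(seq_along(URLs), function(i) {
  # Collect the candidate image URLs from the <picture><source> tags
  srcset <- URLs[i] %>%
    read_html() %>%
    html_nodes("picture source") %>%
    html_attr("data-srcset")
  # Keep the first URL of the first non-missing srcset (NA if none is found)
  img <- trimws(unlist(strsplit(srcset[!is.na(srcset)], ",")))[1]
  # mode = "wb" keeps the binary image intact on Windows; the page index
  # gives a unique file name without the ":" that Sys.time() would add
  if (!is.na(img)) download.file(img, paste0("photo_", i, ".jpeg"), mode = "wb")
})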
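For the data frame of image links in the question's edit, a likely reason so few files arrive is that `tryCatch()` wraps the whole `lapply()`, so the first failed `download.file()` call ends the entire loop. Protocol-relative links such as `//cdn.cnn.com/...` also have no scheme and probably fail. The sketch below moves the error handling inside the loop. The names `normalise_url`, `download_all` and the `downloader` argument are hypothetical, and a stand-in downloader is used so the example runs offline.

```r
# Hypothetical helper: give protocol-relative links ("//host/path") a scheme.
normalise_url <- function(u) ifelse(startsWith(u, "//"), paste0("https:", u), u)

# Hypothetical helper: try every link and catch errors per file, so that one
# failure does not abort the rest. Returns a log of what succeeded.
download_all <- function(urls, dest, downloader = download.file) {
  ok <- vapply(seq_along(urls), function(i) {
    tryCatch({
      downloader(normalise_url(urls[i]), dest[i], mode = "wb")
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  data.frame(url = urls, file = dest, ok = ok, stringsAsFactors = FALSE)
}

# Offline demonstration: a stand-in downloader that fails on one link.
fake_downloader <- function(url, destfile, mode) {
  if (grepl("broken", url)) stop("simulated HTTP error")
  invisible(0L)
}
demo_urls <- c("https://a.espncdn.com/icons/in_15.png",
               "https://example.com/broken.jpg",
               "//cdn.cnn.com/cnnnext/dam/assets/200809102318-verstappen-celeb-large-169.jpg")
demo_log <- download_all(demo_urls,
                         file.path(tempdir(), paste0("img", 1:3, ".jpeg")),
                         downloader = fake_downloader)

# With the question's data frame this might be called as (assuming V1 holds
# the image links):
# log <- download_all(total_urls.2$V1,
#                     paste0(total_urls.2$Article_URL, "_",
#                            total_urls.2$img_URL, ".jpeg"))
```

The returned log makes it possible to see which rows failed and to retry only those, for example with `download_all(log$url[!log$ok], log$file[!log$ok])`.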