URL semantic analysis in R
I have a dataset containing various URLs:
https://www.thetrainline.com/buytickets/combinedmatrix.aspx?Command=TimeTable
https://wwf-fb.zyngawithfriends.com/wwf-fb.a84485c126e67ea2787c.html
http://www.thetrainline.com/destinations/trains-to-london
I want to run a semantic analysis on the URLs, that is, on the keywords that appear in each URL after the /.
Please help.
Thanks
URLs1 <- c('http://www.thetrainline.com/destinations/trains-to-london',
           'https://wwf-fb.zyngawithfriends.com/wwf-fb.a84485c126e67ea2787c.html',
           'https://www.thetrainline.com/buytickets/combinedmatrix.aspx?Command=TimeTable')
# Strip the scheme and host: the first three "/"-terminated pieces are replaced by a single "/"
gsub('^(?:[^/]*/){3}', '/', URLs1)
## [1] "/destinations/trains-to-london" "/wwf-fb.a84485c126e67ea2787c.html"
## [3] "/buytickets/combinedmatrix.aspx?Command=TimeTable"
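Once the host is stripped, the remaining path still has to be broken into individual keywords before any analysis can happen. The following is only a rough base R sketch of one way to do that; `url_keywords` is a hypothetical helper name, and splitting on every non-alphanumeric character is an assumption that may be too aggressive for some URLs.

```r
# Hypothetical helper (not from the answers above): split the part of each URL
# after the host into lowercase keyword tokens.
url_keywords <- function(urls) {
  # Drop the scheme and host, keeping everything from the first path slash on.
  paths <- sub("^[A-Za-z][A-Za-z0-9+.-]*://[^/]*", "", urls)
  # Split on any run of characters that are not lowercase letters or digits.
  tokens <- strsplit(tolower(paths), "[^a-z0-9]+")
  # Remove the empty strings produced by leading separators.
  lapply(tokens, function(x) x[nzchar(x)])
}

URLs1 <- c('http://www.thetrainline.com/destinations/trains-to-london',
           'https://wwf-fb.zyngawithfriends.com/wwf-fb.a84485c126e67ea2787c.html',
           'https://www.thetrainline.com/buytickets/combinedmatrix.aspx?Command=TimeTable')

kw <- url_keywords(URLs1)

# Keyword frequencies across all URLs, most common first.
freq <- sort(table(unlist(kw)), decreasing = TRUE)
```

Here `kw[[1]]` holds the tokens "destinations", "trains", "to" and "london". Hash-like tokens such as the one in the second URL would probably need to be filtered out before doing anything semantic with the result.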
The urltools package does this much faster, and more thoroughly, than doing it by hand:
library(urltools)
URLs <- c("https://www.thetrainline.com/buytickets/combinedmatrix.aspx?Command=TimeTable",
"https://wwf-fb.zyngawithfriends.com/wwf-fb.a84485c126e67ea2787c.html",
"https:/test.com/thing.php?a=1&b=2",
"http://www.thetrainline.com/destinations/trains-to-london")
url_parse(URLs)
##   scheme                      domain port                             path         parameter fragment
## 1  https        www.thetrainline.com        buytickets/combinedmatrix.aspx command=timetable
## 2  https wwf-fb.zyngawithfriends.com      wwf-fb.a84485c126e67ea2787c.html
## 3  https          test.com/thing.php                                                 a=1&b=2
## 4   http        www.thetrainline.com         destinations/trains-to-london
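To get from the parsed components to the keywords the question asks about, the path column can be tokenized. This is a minimal sketch, assuming the urltools package is installed and that `url_parse` returns a data frame with `domain` and `path` columns as in the output above; `tokenize` is a hypothetical helper, and the choice of separators is an assumption.

```r
library(urltools)

URLs <- c("https://www.thetrainline.com/buytickets/combinedmatrix.aspx?Command=TimeTable",
          "https://wwf-fb.zyngawithfriends.com/wwf-fb.a84485c126e67ea2787c.html",
          "http://www.thetrainline.com/destinations/trains-to-london")

parsed <- url_parse(URLs)

# Hypothetical helper: split one string into lowercase alphanumeric tokens.
tokenize <- function(x) {
  # A missing component yields no tokens.
  if (is.na(x)) return(character(0))
  parts <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  # Drop empty strings left by leading separators.
  parts[nzchar(parts)]
}

# One character vector of keywords per URL, labelled by its domain.
path_keywords <- lapply(as.character(parsed$path), tokenize)
names(path_keywords) <- parsed$domain
```

Working from the `path` column keeps the query string out of the keywords; the `parameter` column could be tokenized the same way if those values matter for the analysis.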