General URL format

Question

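I am trying to extract URLs from a large text. I searched for the general format of a URL until I found https://en.wikipedia.org/wiki/URL, and I wrote the code below, but I don't know why it does not find any URLs: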

    Pattern p = Pattern.compile("(http|https|ftp|mailto|file|data|irc|rtsp)(\\:)(^\\w{1})([a-zA-Z0-9/%+.-]*$)\\.(com|net|org|jo)\\/(.+)", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(text);
    if (m.matches())
        System.out.println(text);
    else
        System.out.println("no matches");

Answer 1

This regular expression works for me:

(http|https|ftp|mailto|file|data|irc|rtsp)(\:)(\/\/)([a-zA-Z0-9\/%+.-\/]*)\.(com|net|org|jo)\/(\w*\/)*(\w+)

If you want the last part of the URL, which is group 7 here, you have to capture the last group.

Hope this helps.
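As a rough sketch of how the pattern above might be used from Java: the backslashes are doubled for the string literal, and Matcher.find() is used instead of matches(), because matches() only succeeds when the entire input is one URL, which is never the case for a large text. The class name UrlExtractor, the method extractUrls, and the sample text (including files.example.com) are made up for illustration; this is one possible way to wire it up, not a complete URL parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class UrlExtractor {

    // The regex from the answer, with each backslash doubled for a Java string literal.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "(http|https|ftp|mailto|file|data|irc|rtsp)(\\:)(\\/\\/)"
            + "([a-zA-Z0-9\\/%+.-\\/]*)\\.(com|net|org|jo)\\/(\\w*\\/)*(\\w+)",
            Pattern.CASE_INSENSITIVE);

    // Collects every substring of the text that the pattern matches.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {          // find() scans for the next match inside the text
            urls.add(m.group());    // group() with no argument is the whole match
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input assumed for this sketch.
        String text = "See https://en.wikipedia.org/wiki/URL for details, "
                + "or ftp://files.example.com/pub/readme today.";

        for (String url : extractUrls(text)) {
            System.out.println(url);
        }

        // Group 7 is the last path segment of each matched URL.
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            System.out.println("last part: " + m.group(7));
        }
    }
}
```

Note that this pattern only recognizes the four listed top-level domains and requires at least one path segment after the host, so a bare https://en.wikipedia.org would not be picked up.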

java

regex

url

netbeans-8