Why is matcher.find() not giving any result? Why does it freeze?

I am building an email scraper. For one particular URL, matcher.find() never returns a boolean result; as far as I can tell, it freezes. For some other URLs the code works fine.

Here is my code:

private Matcher matcher;
private Pattern pattern = null;
private final String emailPattern = "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})";

public void scrape() {
   pattern = Pattern.compile(emailPattern);

   Document documentTwo = null;

   try {
      documentTwo = Jsoup.connect("https://www.mercurynews.com/2020/03/21/how-can-i-get-tested-for-covid-19-in-the-bay-area/")
              .ignoreHttpErrors(true)
              .userAgent(RandomUserAgent.getRandomUserAgent())
              .header("Content-Language", "en-US")
              .get();
   } catch (IOException ex) {
      return; // nothing to scan if the page could not be fetched
   }

   String pageBody = documentTwo.toString();

   matcher = pattern.matcher(pageBody);

   while (matcher.find()) {
      // this will never execute for the above web address
   }
}

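To check, I added System.out.println(matcher.find()); above the while loop, and it got stuck there without printing any value. So what am I doing wrong here? I have tried many different email regex patterns, but the pattern above is the one that works.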

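Your regex is the problem. In it, ([\.\w])+ is immediately followed by [\w]+, and both can match the same word characters. On a long run of such characters with no @ after it, the engine most likely has to try a very large number of ways to split the run before giving up, which on a big page looks like a freeze. Given below is the code with a working regex: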

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) {
        Document documentTwo = null;
        try {
            documentTwo = Jsoup
                    .connect(
                            "https://www.mercurynews.com/2020/03/21/how-can-i-get-tested-for-covid-19-in-the-bay-area/")
                    .header("Content-Language", "en-US").get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // no document to scan, so stop here instead of hitting a NullPointerException
        }

        String pageBody = documentTwo.toString();
        Pattern pattern = Pattern.compile(
                "([a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}\\@[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}(\\.[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25})+)");
        Matcher matcher = pattern.matcher(pageBody);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Output:

lkrieger@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
fkelliher@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
fkelliher@bayareanewsgroup.com
fkelliher@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
fkelliher@bayareanewsgroup.com
fkelliher@bayareanewsgroup.com
lkrieger@bayareanewsgroup.com
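
The output above lists the same addresses several times, and the original problem was a match that never returned. Below is a minimal sketch of one way to handle both in a scraper, assuming it is acceptable to give up on a page after a fixed time budget. It collects matches into a LinkedHashSet so each address appears once, and it wraps the page text in a CharSequence whose charAt() throws once a deadline has passed. The names GuardedEmailScan, DeadlineCharSequence and findDistinct, the sample strings, and the time budgets are made up for illustration; they are not part of java.util.regex or jsoup.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuardedEmailScan {

    /** Hypothetical wrapper: charAt() fails once the deadline has passed. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The regex engine reads its input through charAt(), so this
            // check keeps running while the engine backtracks.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new IllegalStateException("regex matching exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns each distinct match in first-seen order; throws IllegalStateException on timeout. */
    static Set<String> findDistinct(Pattern pattern, String text, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        Matcher matcher = pattern.matcher(new DeadlineCharSequence(text, deadline));
        Set<String> found = new LinkedHashSet<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same pattern as in the answer, written without the redundant escapes.
        Pattern working = Pattern.compile(
                "[a-zA-Z0-9+._%\\-]{1,256}@[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}(\\.[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25})+");
        String sample = "a@x.com, b@y.org and a@x.com again";
        System.out.println(findDistinct(working, sample, 1000)); // prints [a@x.com, b@y.org]

        // The pattern from the question, run on a long run of word characters with no '@'.
        Pattern slow = Pattern.compile("([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})");
        String longRun = new String(new char[20_000]).replace('\0', 'a');
        try {
            findDistinct(slow, longRun, 200);
        } catch (IllegalStateException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Reading the clock on every charAt() call adds overhead, so a real scraper might check only every few thousand calls. The guard only limits the damage; a pattern without overlapping quantifiers, like the one in the answer, is still the actual fix.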