HtmlUnit WebClient resets thread interruption status

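I have a bunch of parser classes that subclass a PriceParser class and implement a getAllPrices() method to fetch data from various websites. getAllPrices() is called by PriceParser.getPrices(), which also does some other things that are not relevant to this post. Here is an example of such an implementation: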

@Override
public List<Price> getAllPrices() throws ParserException,
        InterruptedException {

    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.NoOpLog");

    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit")
            .setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient")
            .setLevel(Level.OFF);

    List<Price> prices = new ArrayList<Price>();

    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);

    HtmlPage page;
    try {
        page = webClient.getPage(URL);

        if(Thread.currentThread().isInterrupted()){

            System.out.println("INTERRUPTED BEFORE CLOSE");
        }

        //my parsing code here that fills prices list. Includes calls to webClient.waitForBackgroundJavaScript in some places


        webClient.closeAllWindows();

        if(Thread.currentThread().isInterrupted()){

            System.out.println("INTERRUPTED AFTER CLOSE");
        }

    } catch (InterruptedException e) {
        throw e;
    } catch (Exception e) {
        throw new ParserException(e);
    }

    return prices;
}

These parsers are run concurrently with an ExecutorService:

public List<Price> getPrices(List<PriceParser> priceParsers) throws InterruptedException {

    ExecutorService executorService = Executors
            .newFixedThreadPool(priceParsers.size());

    Set<Callable<List<Price>>> callables = new HashSet<Callable<List<Price>>>();

    List<Price> allPrices = new ArrayList<Price>();

    // The loop variable is final so the anonymous Callable can capture it
    for (final PriceParser priceParser : priceParsers) {

        callables.add(new Callable<List<Price>>() {
            public List<Price> call() throws Exception {

                // Runs on a pool thread; getPrices() calls getAllPrices()
                return priceParser.getPrices();
            }
        });
    }

    List<Future<List<Price>>> futures;

    try {
        futures = executorService.invokeAll(callables);

        for (Future<List<Price>> future : futures) {

            allPrices.addAll(future.get());
        }
    } catch (InterruptedException e) {

        throw e;

    } catch (ExecutionException e) {

        logger.error("MULTI-THREADING EXECUTION ERROR ", e);
        throw new RuntimeException("MULTI-THREADING EXECUTION ERROR ", e);

    } finally {

        executorService.shutdownNow();
    }

    return allPrices;
}

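The two if(Thread.currentThread().isInterrupted()){} blocks in the first method were added to check the following problem I observed: when the executor service is interrupted (this can happen when the thread is terminated by pressing the cancel button in the GUI application), the first interruption check I inserted successfully prints "INTERRUPTED BEFORE CLOSE".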

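But the second check prints nothing. So it seems that one of the calls I make on webClient (that is, the waitForBackgroundJavaScript calls and the final webClient.closeAllWindows() call) somehow clears the thread's interrupted status. Can someone explain why this happens?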
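For context, here is a minimal sketch of how the interrupt reaches a parser thread in the first place. It is not the code from the question: the class name, the sleeping task (a stand-in for a slow parser), and the helper thread (a stand-in for the thread calling getPrices) are all made up for illustration. When the thread blocked in invokeAll is interrupted, invokeAll cancels the unfinished tasks, and the shutdownNow() in the finally block interrupts the workers as well, so either path should deliver the interrupt to the worker.

```java
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names come from the question.
final class InvokeAllInterruptDemo {

    /** Returns true if the worker observed an interrupt after the caller was interrupted. */
    static boolean workerSawInterrupt() {
        final ExecutorService pool = Executors.newFixedThreadPool(1);
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(1);
        final AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        // Stand-in for a slow parser: blocks until it is interrupted.
        final Callable<Void> slowTask = () -> {
            started.countDown();
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
            } finally {
                finished.countDown();
            }
            return null;
        };

        // Stand-in for the thread that calls getPrices(...).
        final Thread caller = new Thread(() -> {
            try {
                pool.invokeAll(Collections.singletonList(slowTask));
            } catch (InterruptedException e) {
                // invokeAll cancels unfinished tasks before this is thrown
            } finally {
                pool.shutdownNow(); // also interrupts any worker still running
            }
        });

        try {
            caller.start();
            started.await();      // make sure the task is running
            caller.interrupt();   // e.g. the cancel button was pressed
            finished.await(5, TimeUnit.SECONDS);
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("worker interrupted: " + workerSawInterrupt());
    }
}
```

So the first check printing "INTERRUPTED BEFORE CLOSE" is consistent with the cancellation being delivered correctly; whatever happens to the flag happens later, inside the parser.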

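It seems the problem is in my parsing code's calls to webClient.waitForBackgroundJavaScript. Internally, this goes all the way down to the waitForJobs method of HtmlUnit's JavaScriptJobManagerImpl class. That method contains the following snippet, which basically swallows every InterruptedException. Because Object.wait() clears the thread's interrupted status when it throws InterruptedException, no caller can tell afterwards whether an interruption took place during the call: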

                try {
                    synchronized (this) {
                        wait(end - now);
                    }

                    // maybe a change triggers the wakup; we have to recalculate the
                    // wait time
                    now = System.currentTimeMillis();
                }
                catch (final InterruptedException e) {
                    LOG.error("InterruptedException while in waitForJobs", e);
                }

Instead of catching and logging the exception, the method should have rethrown it.
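The effect can be reproduced without HtmlUnit. The following is only a sketch with made-up names (InterruptFlagDemo, flagAfterWait), not HtmlUnit code: it sets the interrupt flag, waits on a monitor the way the quoted snippet does, and reports what a caller would see afterwards. It also shows the usual alternative when rethrowing is not an option, which is to restore the flag with Thread.currentThread().interrupt() in the catch block.

```java
// Hypothetical demo class; it only mimics the catch-and-log pattern quoted above.
final class InterruptFlagDemo {

    /**
     * Waits on a monitor while an interrupt is pending and returns the
     * interrupted status the caller would observe afterwards.
     */
    static boolean flagAfterWait(boolean restoreFlag) {
        final Object lock = new Object();

        // Simulate the cancel request arriving before (or during) the wait.
        Thread.currentThread().interrupt();

        try {
            synchronized (lock) {
                // Throws InterruptedException right away because the flag is set,
                // and clears the flag while doing so.
                lock.wait(1000);
            }
        } catch (InterruptedException e) {
            if (restoreFlag) {
                // Re-assert the interrupt so callers can still see it.
                Thread.currentThread().interrupt();
            }
            // With restoreFlag == false this is the "swallow" case.
        }

        // Thread.interrupted() reads AND clears the flag, so the demo
        // does not leak interrupt state into the next call.
        return Thread.interrupted();
    }

    public static void main(String[] args) {
        System.out.println("swallowed: " + flagAfterWait(false)); // prints "swallowed: false"
        System.out.println("restored:  " + flagAfterWait(true));  // prints "restored:  true"
    }
}
```

The swallowed case matches what I observed: the check after the waitForBackgroundJavaScript calls sees no interrupt, even though one was delivered.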