Requesting a whole page as a String via Retrofit returns page content that is not fully parsed

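I want to fetch and parse an entire page hosted on Azure from an Android application. When I try to parse the HTML page, the response body contains only a partially parsed page. It looks as if the parser does not go deep enough into some tags.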

Code of the Android function:

private void retrofit2_1() {

    // Adds HTTP Basic authentication to every request sent through this client.
    Interceptor interceptor = new Interceptor() {
        @Override
        public okhttp3.Response intercept(Chain chain) throws IOException {
            okhttp3.Request original = chain.request();

            String login = "login";
            String pass = "pass";

            // Produces "Basic <base64(login:pass)>".
            String authToken = Credentials.basic(login, pass);

            // Copy the original request and attach the Authorization header.
            okhttp3.Request request = original.newBuilder()
                    .header("Authorization", authToken)
                    .method(original.method(), original.body())
                    .build();

            return chain.proceed(request);
        }
    };

    OkHttpClient okHttpClient = new OkHttpClient.Builder().addInterceptor(interceptor).build();

    // ScalarsConverterFactory returns the raw response body as a String;
    // it does not parse the HTML. "website" stands in for the real (redacted) URL.
    Retrofit retrofit = new Retrofit.Builder()
            .addConverterFactory(ScalarsConverterFactory.create())
            .baseUrl("website")
            .client(okHttpClient)
            .build();

    ScalarService scalarService = retrofit.create(ScalarService.class);
    Call<String> stringCall = scalarService.getStringResponse("website");
    stringCall.enqueue(new Callback<String>() {
        @Override
        public void onResponse(Call<String> call, retrofit2.Response<String> response) {
            if (response.isSuccessful()) {
                String responseString = response.body();
                System.out.println(responseString);
            } else {
                // Report HTTP errors (e.g. 401 for wrong credentials) instead of ignoring them.
                System.out.println("HTTP " + response.code());
            }
        }

        @Override
        public void onFailure(Call<String> call, Throwable t) {
            System.out.println(t.getMessage());
        }
    });
}

interface ScalarService {
    // @Url supplies the request URL at call time; it is resolved against the base URL.
    @GET
    Call<String> getStringResponse(@Url String url);
}
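For reference, the Authorization value that the interceptor attaches is plain HTTP Basic authentication. The stdlib-only sketch below builds the same kind of string; it assumes OkHttp's default of encoding `login:pass` as ISO-8859-1 before Base64, and `BasicAuthHeader` is a hypothetical name, not part of OkHttp.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper mirroring what Credentials.basic(login, pass) produces
// (assumption: ISO-8859-1 encoding of "login:pass", OkHttp's default).
class BasicAuthHeader {
    static String basic(String login, String pass) {
        String raw = login + ":" + pass;
        String encoded = Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.ISO_8859_1));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials from the question.
        System.out.println(basic("login", "pass"));
    }
}
```

Running `main` prints `Basic bG9naW46cGFzcw==` for the placeholder credentials.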

Here is the response body. In it you can see the tag that was the last thing to be parsed, so the parser does get inside it.

<!DOCTYPE html>

<html lang="en">
<head>

    <base href="/" />

    <title>Loading...</title>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="">
    <meta name="author" content="">

    <link href="/Content/npm.css/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet">
    <link href="/Content/site.css/sb-admin.css" rel="stylesheet" />
    <link href="/Content/npm.css/font-awesome/css/font-awesome.min.css" rel="stylesheet" type="text/css">
    <link href="/Content/site.css/custom.css" rel="stylesheet">


    <script src="/Scripts/npm.js/core-js/client/shim.min.js"></script>
    <script src="/Scripts/npm.js/zone.js/dist/zone.min.js"></script>
    <script src="/Scripts/npm.js/systemjs/dist/system.js"></script>
    <script src="/Scripts/site.js/systemjs.config.js"></script>

    <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
    <!--[if lt IE 9]>
        <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
        <script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
    <![endif]-->


</head>
<body>
    <div id="wrapper">
        <!-- Navigation -->
        <nav class="navbar navbar-inverse navbar-fixed-top" role="navigation">
            <!-- Brand and toggle get grouped for better mobile display -->
            <div class="navbar-header">
                <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
                    <span class="sr-only">Toggle navigation</span>
                    <span class="icon-bar"></span>
                    <span class="icon-bar"></span>
                    <span class="icon-bar"></span>
                </button>
                <a class="navbar-brand" href="/">Title</a>

            </div>
            <!-- Top Menu Items -->
            <ul class="nav navbar-right top-nav" >
                <li class="dropdown" userinfo>
                </li>
            </ul>
            <!-- Sidebar Menu Items - These collapse to the responsive navigation menu on small screens -->
            <div id="cl" class="collapse navbar-collapse navbar-ex1-collapse">
                <sidebar></sidebar>
            </div>
            <!-- /.navbar-collapse -->
        </nav>

        <div id="page-wrapper">
            <div class="container-fluid">
                <div class="row">
                    <div class="col-md-12">
                        <pagecontent></pagecontent>
                    </div>
                </div>
            </div>
        </div>

    </div>
    <script>
        System.import("bootstrap");
        System.import("jsplumb");
        System.import("app");
    </script>
</body>
</html>
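The body above ends with `</html>`, which suggests the download itself is complete; the `<sidebar></sidebar>` and `<pagecontent></pagecontent>` elements are simply empty in the HTML the server sends. A rough way to check this programmatically is sketched below with string and regex heuristics only; `HtmlShellCheck` and its methods are hypothetical helpers, and a real HTML parser would be more robust.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical heuristics: was the HTML cut off, or is it a complete page whose
// placeholder elements are meant to be filled in later by JavaScript?
class HtmlShellCheck {
    // True if the text, ignoring trailing whitespace, ends with a closing </html> tag.
    static boolean endsWithClosingHtml(String html) {
        return html.trim().toLowerCase(Locale.ROOT).endsWith("</html>");
    }

    // True if <tag ...></tag> occurs with only whitespace between the two tags.
    static boolean hasEmptyElement(String html, String tag) {
        String t = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + t + "(\\s[^>]*)?>\\s*</" + t + "\\s*>",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        // Abbreviated version of the response body above.
        String body = "<html><body><div id=\"page-wrapper\"><pagecontent></pagecontent></div>"
                + "<script>System.import(\"app\");</script></body></html>\n";
        System.out.println("ends with </html>: " + endsWithClosingHtml(body));
        System.out.println("empty <pagecontent>: " + hasEmptyElement(body, "pagecontent"));
        System.out.println("bootstraps app via SystemJS: " + body.contains("System.import("));
    }
}
```

On the full body shown above, all three checks come out true, which is more consistent with content that is rendered later by script than with a truncated download.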

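I expect the page in the response body to be fully parsed. Perhaps I have to change some parameter that controls how deep the parser goes, but I have not found any in the documentation. The one thing I did find is that an exception should be thrown if the content is too large, but I do not get any such exception.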

1. One possibility is the OkHttp3 configuration: try increasing the timeouts. Here is the documentation. By default, the read timeout is 10 seconds.

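2. Another possibility is that you are getting the page source before the page has fully loaded. At the very least, <title>Loading...</title> looks really suspicious. The empty <sidebar></sidebar> and <pagecontent></pagecontent> elements and the System.import("app") call point the same way: they look like placeholders that a client-side script fills in after the page loads. In that case, reading the body as a buffered stream may help; see the ResponseBody source code for details.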
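Reading the whole body through a buffered stream could look like the sketch below. It assumes you change the service method to return `Call<ResponseBody>` (which Retrofit supports without any converter) and pass `response.body().byteStream()` to the hypothetical `readAll` helper; the demo uses an in-memory stream instead. This only changes how the bytes are read: it cannot add content that a browser would only produce by running the page's scripts.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: drains a stream completely through a buffer.
class ReadFully {
    // Reads until end-of-stream and returns everything as one String.
    // Reading char chunks (rather than readLine) keeps the original line breaks.
    // Closing the reader also closes the underlying stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.body().byteStream() from a Call<ResponseBody>.
        String html = "<!DOCTYPE html>\n<html lang=\"en\">\n</html>\n";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        String body = readAll(in, StandardCharsets.UTF_8);
        System.out.println(body.equals(html));
    }
}
```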

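P.S. In general, it would help if you could provide more details, for example which parser you are using. Retrofit's ScalarsConverterFactory itself does not parse HTML; it returns the raw response body as a String. Jsoup, for instance, has a download size limit, which can be raised.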
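To illustrate how a download limit can produce a partial page without any exception, the sketch below copies at most `maxBytes` bytes and silently drops the rest. `CappedRead` is a hypothetical stand-in, not Jsoup's code; in Jsoup itself the limit is set with `Connection.maxBodySize(int)`, where `0` means unlimited.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a client-side download cap.
class CappedRead {
    // Copies at most maxBytes bytes from the stream and silently ignores the rest.
    // A maxBytes of 0 or less means "no limit".
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (maxBytes <= 0) {
                out.write(buf, 0, n);
                continue;
            }
            int room = maxBytes - out.size();
            out.write(buf, 0, Math.min(n, room));
            if (out.size() >= maxBytes) {
                break; // limit reached: stop reading; no exception is thrown
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><p>Hello</p></body></html>";
        byte[] capped = readCapped(
                new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)), 20);
        System.out.println(new String(capped, StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `<html><body><p>Hello` and reports no error, even though the closing tags were never read.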