Copy a file's contents while ignoring characters between < and > in Java

Question

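I want to write a program that reads an HTML file and copies its contents while skipping the HTML tags, without using replaceAll. In addition, the tag stripping must be done in a separate method. The file looks like this: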

 <html>
 <head>
 <title>My web page</title>
 </head>
 <body>
 <p>There are many pictures of my cat here,
 as well as my <b>very cool</b> blog page,
 which contains <font color="red">awesome
 stuff about my trip to Vegas.</p>


 Here's my cat now:<img src="cat.jpg">
 </body>
 </html>

I want my program to display the following:

 My web page


 There are many pictures of my cat here,
 as well as my very cool blog page,
 which contains awesome
 stuff about my trip to Vegas.


 Here's my cat now:

Answer 1

public static void main(String[] args) {
    String html = " <html>\n"
            + " <head>\n"
            + " <title>My web page</title>\n"
            + " </head>\n"
            + " <body>\n"
            + " <p>There are many pictures of my cat here,\n"
            + " as well as my <b>very cool</b> blog page,\n"
            + " which contains <font color=\"red\">awesome\n"
            + " stuff about my trip to Vegas.</p>\n"
            + "\n"
            + "\n"
            + " Here's my cat now:<img src=\"cat.jpg\">\n"
            + " </body>\n"
            + " </html>";

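    // True while the scan is between a '<' and its closing '>'.
    boolean inTag = false;
    StringBuilder finalString = new StringBuilder();

    int length = html.length();
    for (int i = 0; i < length; i++) {
        char c = html.charAt(i);

        if ('<' == c) {
            // A tag starts here: stop copying.
            inTag = true;
        } else if ('>' == c) {
            // The tag ends here: resume copying with the next character.
            inTag = false;
        } else if (!inTag) {
            // Plain text outside any tag: keep it.
            finalString.append(c);
        }
    }

    // Print the text with every <...> span removed.
    System.out.print(finalString);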

}
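
The question also asks for the input to come from a file and for the stripping to live in its own method. A minimal sketch of that arrangement follows, using the same inTag flag as above. The class name TagStripper, the method name stripTags, and the default file name page.html are all assumptions made for illustration, and the file is assumed to be UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TagStripper {

    // Hypothetical helper: returns the input with every character from a '<'
    // through the next '>' dropped. A stray '>' outside a tag is dropped too,
    // which matches the behaviour of the loop in the answer above.
    static String stripTags(String html) {
        StringBuilder text = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "page.html" is an assumed default; pass a real path as the first argument.
        Path input = Paths.get(args.length > 0 ? args[0] : "page.html");
        // Assumes the file is UTF-8 encoded.
        String html = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        System.out.print(stripTags(html));
    }
}
```

After compiling with javac, this could be run as `java TagStripper page.html`. Lines that contained only tags still leave their line breaks behind, so the output keeps blank lines where those tags were.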

strip-tags