How to get the text count from a String

I have the following string:

   Salary and Benefits <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
Job Security <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
Career Growth <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barnone"></span>
Work Environment <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
CEO Rating <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>

I need to display the counts (the number of "read-barfull" spans on each line) in the following format:

Salary and Benefits 5
Job Security 5
Career Growth 4
Work Environment 5
CEO Rating 5 

Please help me produce this format. Thanks in advance.

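If the "token" string you want to count is static (or at least predefined), you can do something like the following, which uses Apache commons-lang (in commons-lang 3 the class is org.apache.commons.lang3.StringUtils):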

String str = "Salary and Benefits <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>";
String spanText = "<span class=\"read-barfull\"></span>";
int count = StringUtils.countMatches(str, spanText);
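
If adding commons-lang is not an option, the same count can probably be obtained with plain Java. The following is only a minimal sketch: the class name SpanCounter and the helper countOccurrences are hypothetical names chosen for illustration, not part of the answer above, and the helper counts non-overlapping matches only.

```java
class SpanCounter {
    // Hypothetical helper (not from commons-lang): counts non-overlapping
    // occurrences of token inside text using String.indexOf.
    static int countOccurrences(String text, String token) {
        if (text == null || token == null || token.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            count++;
            from += token.length(); // continue searching after this match
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "Career Growth <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>"
                + "<span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>"
                + "<span class=\"read-barnone\"></span>";
        String spanText = "<span class=\"read-barfull\"></span>";
        System.out.println(countOccurrences(str, spanText)); // prints 4
    }
}
```

Like the commons-lang call, this matches the exact token text, so a span with extra attributes or different whitespace would not be counted.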

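Split your logic into the following steps:

  1. Create a List<String> with one entry per line.
  2. Iterate over the list and, using a string buffer or split, search for the word and increment a counter. (Use some simple logic to separate "read-barfull" from the "key" string, i.e. "Salary and Benefits".)
  3. Get the count value from that.
  4. Put the results into a Map<String,Integer>, and that's it.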
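
The steps above could be sketched roughly as follows. This is one possible reading of them, not the answerer's own code: the class name RatingCounter, the method countPerLine and the constant FULL are hypothetical, and the sketch assumes the key is everything before the first '<' on a line.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RatingCounter {
    // The exact token to count (assumed to appear verbatim in each line).
    static final String FULL = "<span class=\"read-barfull\"></span>";

    // Hypothetical helper: maps each line's label to its read-barfull count.
    static Map<String, Integer> countPerLine(List<String> lines) {
        // LinkedHashMap keeps the keys in the order the lines were read.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Key: the text before the first tag, e.g. "Salary and Benefits".
            int firstTag = line.indexOf('<');
            String key = (firstTag == -1 ? line : line.substring(0, firstTag)).trim();
            // Splitting with limit -1 keeps trailing empty pieces,
            // so the number of pieces is always matches + 1.
            int count = line.split(Pattern.quote(FULL), -1).length - 1;
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String none = "<span class=\"read-barnone\"></span>";
        List<String> lines = Arrays.asList(
                "Salary and Benefits " + FULL + FULL + FULL + FULL + FULL,
                "Career Growth " + FULL + FULL + FULL + FULL + none);
        for (Map.Entry<String, Integer> e : countPerLine(lines).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // Salary and Benefits 5
        // Career Growth 4
    }
}
```

A LinkedHashMap is used here only so that the output order matches the input order; a plain HashMap would work if order does not matter.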

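Here is how to do it with Jsoup (since your question is tagged with it). The general idea is:

  • read the HTML line by line,
  • get the text represented by that line of HTML,
  • select all <span class="read-barfull"></span> elements (whether or not they are empty, though you can change that if needed); a simple select("span.read-barfull") does this for us,
  • print the number of selected span elements (size() is useful here).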

Code (needs imports for java.util.Scanner, org.jsoup.Jsoup and org.jsoup.nodes.Document):

String html = "Salary and Benefits <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "Job Security <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "Career Growth <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barnone\"></span>\r\n" + 
        "Work Environment <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "CEO Rating <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>";

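Scanner sc = new Scanner(html);
while (sc.hasNextLine()) {
    // Parse each line as its own small HTML document
    Document doc = Jsoup.parse(sc.nextLine());
    // doc.text() is the visible label; select(...).size() is the number of matching spans
    System.out.println(doc.text() + " " + doc.select("span.read-barfull").size());
}
sc.close();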

Output:

Salary and Benefits 5
Job Security 5
Career Growth 4
Work Environment 5
CEO Rating 5