Read a file char by char and find the input string in it

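I have a file that I need to read character by character and find an input string in it. I need to return how many times the input string occurs in the file, but I must read the file one character at a time.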

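I came up with the code below, but I am having trouble figuring out how to find the string in the file while reading it char by char. I iterate over a for loop first, with a while loop inside it, but if a char doesn't match I need to start over from the beginning of the for loop, and I don't know how to do that.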

  public static void main(String[] args) throws IOException {
    String input = "hello world"; // "hello";
    handleFile(new File("some_file"), input);
  }

  private static int handleFile(File file, String input) throws IOException {
    int count = 0;
    try (BufferedReader br =
        new BufferedReader(new InputStreamReader(new FileInputStream(file),
            Charset.forName("UTF-8")))) {
      char[] arr = input.toCharArray();
      int r;

      // confused here: what logic should I have?
      for (char a : arr) {
        while ((r = br.read()) != -1) {
          char ch = (char) r;
          if (ch == a) {
            break;
          }
        }
      }
    }

    return count;
  }

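So, conceptually, you need to maintain an offset that counts how many characters have matched so far, and each time a mismatch occurs you reset that offset back to 0. The offset is used to determine whether the character at that position in the input matches the next character from the file.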

A simple implementation might look something like...

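String value = "Thistestistestatesttest";
String input = "test";

int offset = 0;   // number of characters of input matched so far
int matches = 0;  // number of complete occurrences found
for (char next : value.toCharArray()) {
    if (next == input.charAt(offset)) {
        offset++;
        if (offset == input.length()) {
            // The whole input matched: count it and start over
            matches++;
            offset = 0;
        }
    } else {
        // Mismatch: start over, but the current character may itself begin a new match
        offset = (next == input.charAt(0)) ? 1 : 0;
    }
}
System.out.println("Found " + matches);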

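Note that I've deliberately used a String as the source, so you can test it, get a better understanding of how it works, and implement your own solution based on the logic.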
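As a rough sketch of how the same offset logic could be applied to a character stream, the loop can read from a `Reader` instead of a `String`. The class name `StreamMatchCounter` and the method name `countOccurrences` are made up for illustration and are not from the question. On a mismatch, this sketch re-checks the current character against the first character of the input, so that "ttest" still counts one "test". It counts non-overlapping occurrences only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the name is an assumption for this example.
class StreamMatchCounter {

    // Counts non-overlapping occurrences of input, reading one char at a time.
    static int countOccurrences(Reader source, String input) throws IOException {
        if (input.isEmpty()) {
            return 0; // nothing to search for
        }
        int offset = 0;  // characters of input matched so far
        int matches = 0; // complete occurrences found
        int r;
        while ((r = source.read()) != -1) {
            char next = (char) r;
            if (next == input.charAt(offset)) {
                offset++;
                if (offset == input.length()) {
                    matches++;
                    offset = 0;
                }
            } else {
                // Start over; the mismatching char may itself begin a new match.
                offset = (next == input.charAt(0)) ? 1 : 0;
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file so the sketch runs anywhere.
        try (Reader reader = new BufferedReader(new StringReader("Thistestistestatesttest"))) {
            System.out.println("Found " + countOccurrences(reader, "test"));
        }
    }
}
```

Run as written, this prints `Found 4`. For the file in the question, the reader could be obtained with something like `Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)` in place of the `StringReader`.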

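Now, if you take the time to desk check the problem, it might look something like the table below. Here "offset value" is `input.charAt(offset)`, the character that `Next` is compared against.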

+======+========+==============+=======+
| Next | offset | offset value | match |
+======+========+==============+=======+
| T    |      0 | t            | false |
+------+--------+--------------+-------+
| h    |      0 | t            | false |
+------+--------+--------------+-------+
| i    |      0 | t            | false |
+------+--------+--------------+-------+
| s    |      0 | t            | false |
+------+--------+--------------+-------+
| t    |      0 | t            | true  |
+------+--------+--------------+-------+
| e    |      1 | e            | true  |
+------+--------+--------------+-------+
| s    |      2 | s            | true  |
+------+--------+--------------+-------+
| t    |      3 | t            | true  |
+------+--------+--------------+-------+
| i    |      0 | t            | false |
+------+--------+--------------+-------+
| s    |      0 | t            | false |
+------+--------+--------------+-------+
| t    |      0 | t            | true  |
+------+--------+--------------+-------+
| e    |      1 | e            | true  |
+------+--------+--------------+-------+
| s    |      2 | s            | true  |
+------+--------+--------------+-------+
| t    |      3 | t            | true  |
+------+--------+--------------+-------+
| a    |      0 | t            | false |
+------+--------+--------------+-------+
| t    |      0 | t            | true  |
+------+--------+--------------+-------+
| e    |      1 | e            | true  |
+------+--------+--------------+-------+
| s    |      2 | s            | true  |
+------+--------+--------------+-------+
| t    |      3 | t            | true  |
+------+--------+--------------+-------+
| t    |      0 | t            | true  |
+------+--------+--------------+-------+
| e    |      1 | e            | true  |
+------+--------+--------------+-------+
| s    |      2 | s            | true  |
+------+--------+--------------+-------+
| t    |      3 | t            | true  |
+------+--------+--------------+-------+
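The desk check above only exercises a case where a match never fails partway through. A single offset can miss an occurrence when the input has a prefix that repeats inside it, for example "aab" in "aaab". One possible way around that, swapped in here as a different technique from the simple loop, is the fallback table from the Knuth-Morris-Pratt algorithm, which says how much of the match can be kept after a mismatch. The class and method names below are hypothetical, and the sketch still counts non-overlapping occurrences.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the name is an assumption for this example.
class KmpStreamCounter {

    // fallback[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFallback(String pattern) {
        int[] fallback = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fallback[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fallback[i] = k;
        }
        return fallback;
    }

    // Counts non-overlapping occurrences of pattern, reading one char at a time.
    static int countOccurrences(Reader source, String pattern) throws IOException {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fallback = buildFallback(pattern);
        int offset = 0;
        int matches = 0;
        int r;
        while ((r = source.read()) != -1) {
            char next = (char) r;
            // On a mismatch, fall back to the longest prefix that still fits.
            while (offset > 0 && next != pattern.charAt(offset)) {
                offset = fallback[offset - 1];
            }
            if (next == pattern.charAt(offset)) {
                offset++;
            }
            if (offset == pattern.length()) {
                matches++;
                offset = 0; // start fresh, so occurrences do not overlap
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countOccurrences(new StringReader("aaab"), "aab"));
    }
}
```

With "aaab" and "aab", the third `a` mismatches at offset 2, the table drops the offset to 1 instead of 0, and the match completes on the following `b`.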