How do I process a 1 GB data file to read its words and find the longest word?

I wrote the following code, but it fails on a 1 GB file.

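import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class TestFiles {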

    public static void main(String[] args) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        String minWord = "";
        String maxWord = "";
        List<String> words = new ArrayList<>();
        try {
            // backslashes must be escaped in a Java string literal
            File myObj = new File("C:\\Users\\Downloads\\java.txt");
            Scanner myReader = new Scanner(myObj);
            while (myReader.hasNextLine()) {
                String data = myReader.nextLine();
                String[] dataArray = data.split(" ");
                List<String> list = Arrays.asList(dataArray);
                for (String s : list) {
                    if (s.length() < minLength) {
                        minLength = s.length();
                        minWord = s;
                    } else if (s.length() > maxLength) {
                        maxLength = s.length();
                        maxWord = s;
                    }
                }
            }
            myReader.close();
        } catch (Exception e) {
            // TODO: handle exception
        }
        System.out.println("min length " + minLength + " - max length " + maxLength);
        System.out.println("min length word " + minWord + " - max length word " + maxWord);
    }
}

Can anyone help? How do I fix this?

// inside the for loop: two independent ifs instead of if / else if
int len = s.length();
if (len < minLength) {
    minLength = len;
    minWord = s;
} 
if (len > maxLength) {
    maxLength = len;
    maxWord = s;
}

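Your test fails if the longest string is the very first word of the first line. Because minLength starts at Integer.MAX_VALUE, the first word always takes the if branch, so the else if branch that updates maxLength is skipped for it.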
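To see the edge case concretely, here is a minimal sketch that runs both versions of the loop on a line whose first word is the longest. The class and method names (MinMaxDemo, maxWithElseIf, maxWithTwoIfs) are hypothetical, and List.of assumes Java 9 or later.

```java
import java.util.List;

public class MinMaxDemo {

    // Original logic: with else if, a word that sets a new minimum
    // is never compared against the current maximum.
    static int maxWithElseIf(List<String> words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        for (String s : words) {
            if (s.length() < minLength) {
                minLength = s.length();
            } else if (s.length() > maxLength) {
                maxLength = s.length();
            }
        }
        return maxLength;
    }

    // Fixed logic: the two checks are independent.
    static int maxWithTwoIfs(List<String> words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        for (String s : words) {
            int len = s.length();
            if (len < minLength) {
                minLength = len;
            }
            if (len > maxLength) {
                maxLength = len;
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
        List<String> words = List.of("extraordinary", "a", "bb");
        System.out.println("else if: " + maxWithElseIf(words)); // prints "else if: 2"
        System.out.println("two ifs: " + maxWithTwoIfs(words)); // prints "two ifs: 13"
    }
}
```

The else if version reports 2 because "extraordinary" (13 characters) only ever updated the minimum.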

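By the way, I think you should break the big test into smaller ones: first find the shortest and longest string in a single line, then across multiple lines, and only then in data read from a file.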
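One way to follow that advice, sketched under the assumption that you are free to restructure the code: move the logic into a method that takes a Scanner, so the same code can be fed a String for the single-line and multi-line tests and a File only at the end. The names WordStats and minMaxWords are hypothetical.

```java
import java.util.Arrays;
import java.util.Scanner;

public class WordStats {

    // Hypothetical helper: returns {shortestWord, longestWord} for any Scanner source.
    // Using null instead of Integer.MAX_VALUE / MIN_VALUE sentinels means an empty
    // input returns {null, null} rather than misleading lengths.
    static String[] minMaxWords(Scanner in) {
        String minWord = null;
        String maxWord = null;
        while (in.hasNext()) {
            String word = in.next();
            if (minWord == null || word.length() < minWord.length()) {
                minWord = word;
            }
            if (maxWord == null || word.length() > maxWord.length()) {
                maxWord = word;
            }
        }
        return new String[] {minWord, maxWord};
    }

    public static void main(String[] args) {
        // Step 1: a single line
        System.out.println(Arrays.toString(minMaxWords(new Scanner("a bbb cc")))); // [a, bbb]
        // Step 2: several lines, with the longest word first (the edge case above)
        System.out.println(Arrays.toString(minMaxWords(new Scanner("extraordinary a\nbb ccc")))); // [a, extraordinary]
        // Step 3: once these pass, call minMaxWords(new Scanner(new File(path))) on the real file
    }
}
```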

The problem becomes obvious once the whole 1 GB of words is squeezed into a single line!*

The solution: instead of processing the input line by line, process it word by word. That is efficient enough! ;)

Voilà:

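import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TestFiles {

  public static void main(String[] args) {
    int minLength = Integer.MAX_VALUE;
    int maxLength = Integer.MIN_VALUE;
    String minWord = "";
    String maxWord = "";
    File myObj = new File("C:\\Users\\Downloads\\java.txt");
    // try-with-resources closes the Scanner even if an exception is thrown
    try (Scanner myReader = new Scanner(myObj)) {
      // next() returns one whitespace-separated word at a time, so the
      // Scanner only buffers enough input to find the next word, never a whole line
      while (myReader.hasNext()) {
        String word = myReader.next();
        int len = word.length();
        // two independent checks: the same word may be both shortest and longest
        if (len < minLength) {
          minLength = len;
          minWord = word;
        }
        if (len > maxLength) {
          maxLength = len;
          maxWord = word;
        }
      }
    } catch (FileNotFoundException e) {
      System.err.println("File not found: " + e.getMessage());
    }
    System.out.println("min length " + minLength + " - max length " + maxLength);
    System.out.println("min length word " + minWord + " - max length word " + maxWord);
  }
}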

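*When a huge number of words sit on one line, these are the calls that cause trouble, because each of them has to hold the entire line (or all of its words) in memory at once:

  • myReader.hasNextLine()
  • String data = myReader.nextLine();
  • String[] dataArray = data.split(" ");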
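If you want to avoid line-based reading entirely without relying on Scanner, one alternative (a swapped-in technique, not what either answer uses) is to read characters from a BufferedReader and build only the current word. This is a sketch under the assumption that "word" means a run of non-whitespace characters, matching Scanner's default delimiter; the names StreamingLongestWord and longestWord are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingLongestWord {

    // Hypothetical helper: reads one character at a time and never builds a line,
    // so memory use is the reader's buffer plus the current and longest words.
    static String longestWord(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder current = new StringBuilder();
        String longest = "";
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                // a word just ended: keep it if it beats the longest so far
                if (current.length() > longest.length()) {
                    longest = current.toString();
                }
                current.setLength(0);
            } else {
                current.append((char) c);
            }
        }
        // the last word may not be followed by whitespace
        if (current.length() > longest.length()) {
            longest = current.toString();
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(longestWord(new StringReader("one huge line with extraordinary words"))); // extraordinary
        // For the real file, pass Files.newBufferedReader(path) inside try-with-resources.
    }
}
```

Tracking the shortest word works the same way with a second comparison at each word boundary.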

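@huy's answer is also correct: the else if version "works in most cases" but gives the wrong result in the edge case where the very first word is the longest.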