How to find the longest words in a text file containing any characters?

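My task is simply to retrieve the longest words from a text document. How can I adapt my code so that it works for any language, for example Russian or Arabic? Words containing the digits 0-9 should be ignored, and any punctuation in a word should be removed before the word is stored.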

For example: 53-летний Ленин?

e.g. العَامَّةzy عَلَى المَ

My code:

public Collection<String> getLongestWords() {

    String longestWord = "";
    String current;
    Scanner scan = new Scanner(new File("file.txt"));


    while (scan.hasNext()) {
        current = scan.next();
        if (current.length() > longestWord.length()) {
            longestWord = current;

        }
        return longestWord;

    }

}

Note: I have never worked with Unicode before :/

I believe you can go with this (it finds and returns the longest word in the text file):

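import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;

public class hello {
    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;

        // Assumption: file.txt is saved as UTF-8. Naming the charset explicitly
        // keeps Cyrillic and Arabic text from being decoded with the platform default.
        // try-with-resources closes the Scanner once the loop is done.
        try (Scanner scan = new Scanner(new File("file.txt"), "UTF-8")) {
            // Scanner splits on whitespace by default, so each token is one candidate word.
            while (scan.hasNext()) {
                current = scan.next();
                if (current.length() > longestWord.length()) {
                    longestWord = current;
                }
            }
        }
        System.out.println(longestWord);
        return longestWord;
    }
}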

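To remove punctuation, add

    longestWord = longestWord.replaceAll("[^\\p{L}\\p{M}]", "");

before you return! The character class keeps Unicode letters (\p{L}) and combining marks (\p{M}) instead of only a-zA-Z, so Russian and Arabic words are not stripped down to nothing. The result has to be assigned back, because Java strings are immutable and replaceAll returns a new string.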
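To make the punctuation step concrete, here is a minimal sketch that applies a Unicode-aware pattern to the sample words from the question. The class name `WordCleaner` and the method `stripPunctuation` are hypothetical names chosen for illustration; only `Pattern` and the `\p{L}` / `\p{M}` property classes come from the standard library.

```java
import java.util.regex.Pattern;

final class WordCleaner {
    // Matches every character that is neither a Unicode letter (\p{L})
    // nor a combining mark (\p{M}), e.g. punctuation, digits and symbols.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}\\p{M}]");

    // Returns the token with all non-letter characters removed.
    static String stripPunctuation(String token) {
        return NON_LETTERS.matcher(token).replaceAll("");
    }
}
```

With this, `WordCleaner.stripPunctuation("Ленин?")` gives `Ленин`, and an Arabic word with vowel marks such as `عَلَى` is left unchanged because the marks fall under `\p{M}`. Note that the pattern removes digits as well, so a check for digits has to run on the raw token before stripping.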

If you don't want to consider words with digits:

if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {
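In Java, `\d` matches only the ASCII digits 0-9 unless the pattern is compiled with `UNICODE_CHARACTER_CLASS`, which is what the question asks for. If digits from other scripts (for example Arabic-Indic digits) should also disqualify a word, a possible alternative is to test code points with `Character.isDigit`. This is a sketch under that assumption; `DigitCheck` and `containsDigit` are hypothetical names.

```java
final class DigitCheck {
    // True if any code point in the token is a decimal digit in any script
    // (Unicode category Nd), not only ASCII 0-9.
    static boolean containsDigit(String token) {
        return token.codePoints().anyMatch(Character::isDigit);
    }
}
```

The condition in the loop would then read `!DigitCheck.containsDigit(current)` in place of the `matches` call.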

Everything together:

import java.util.Scanner;
import java.io.*;

public class hello {
    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;

        // Assumption: file.txt is saved as UTF-8.
        try (Scanner scan = new Scanner(new File("file.txt"), "UTF-8")) {
            while (scan.hasNext()) {
                current = scan.next();
                // Skip any token that contains a digit 0-9.
                if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {
                    longestWord = current;
                }
            }
        }
        // Keep only Unicode letters and combining marks; assign the result back,
        // since replaceAll returns a new string.
        longestWord = longestWord.replaceAll("[^\\p{L}\\p{M}]", "");
        System.out.println(longestWord);
        return longestWord;
    }
}
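The question's method is declared to return a `Collection<String>`, so here is a hedged sketch of a variant that collects every word of maximal length instead of a single one. It differs from the code above in two ways that are my assumptions, not part of the answer: punctuation is stripped from each token before lengths are compared, and length is counted in code points. The names `LongestWords`, `fromScanner` and `fromFile` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

final class LongestWords {

    // Collects all distinct words of maximal length, in order of first appearance.
    static Collection<String> fromScanner(Scanner scan) {
        Set<String> longest = new LinkedHashSet<>();
        int maxLength = 0;
        while (scan.hasNext()) {
            String token = scan.next();
            // Ignore tokens that contain a digit in any script.
            if (token.codePoints().anyMatch(Character::isDigit)) {
                continue;
            }
            // Keep only Unicode letters and combining marks.
            String word = token.replaceAll("[^\\p{L}\\p{M}]", "");
            int length = word.codePointCount(0, word.length());
            if (length == 0 || length < maxLength) {
                continue;
            }
            if (length > maxLength) {
                // A strictly longer word replaces everything found so far.
                longest.clear();
                maxLength = length;
            }
            longest.add(word);
        }
        return longest;
    }

    // Assumption: the file is encoded as UTF-8.
    static Collection<String> fromFile(Path file) throws IOException {
        try (Scanner scan = new Scanner(file, StandardCharsets.UTF_8.name())) {
            return fromScanner(scan);
        }
    }
}
```

For the sample input `53-летний Ленин?` this yields a collection containing only `Ленин`: the first token is skipped because of its digits, and the question mark is stripped from the second. Combining marks count toward the length here, so a fully vocalized Arabic word measures longer than the same word without vowel marks; whether that is desirable depends on the task.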