Filtering unique words without changing case in Java 8

Question

I want to use Java 8 to filter a list down to its unique elements, where uniqueness is decided case-insensitively.

For example:

1) Input: Goodbye bye Bye world world WorlD

Output: Goodbye bye world

2) Input: Sam went to to to his business

Output: Sam went to his business

I tried the following code. I used distinct() to keep only unique elements, and map(x->x.toLowerCase()) so that distinct() would compare the elements in lower case.

    System.out.println("Enter the no of lines u will input:: ");
    Scanner sc = new Scanner(System.in);
    Integer noOfLines = sc.nextInt();
    sc.nextLine(); // consume the rest of the line after the number
    List<String> listForInput;
    List<List<String>> allInputs = new ArrayList<>();
    for(int i =0; i<noOfLines; i++)
    {
        String receivedLine = sc.nextLine();

        String[] splittedInput = receivedLine.split(" ");

        List<String> list =  Stream.of(splittedInput)
                .map(x->x.toLowerCase()) // every word is lower-cased here
                .distinct()
                .collect(Collectors.toList());

        list.forEach(x-> System.out.print(x+" "));
    }

But in the output, all the elements come out in lower case. Is there a better way to do this with Java 8, or am I doing something wrong here?

Answer 1

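You are using .map(x->x.toLowerCase()), which converts every element to lower case.

Instead, you can use a TreeSet to track uniqueness and removeIf to remove the duplicates from the list: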

List<String> list = new ArrayList<>(Arrays.asList(splittedInput));
TreeSet<String> unique = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
list.removeIf(e -> !unique.add(e)); // add() returns false for a case-insensitive duplicate, so that element is removed
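
As a self-contained illustration of the TreeSet and removeIf idea, here is a minimal sketch applied to the two inputs from the question. The class name TreeSetDedup and the helper dedupIgnoreCase are hypothetical names chosen for this example; they are not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDedup {

    // Hypothetical helper: drops case-insensitive duplicates, keeping the first occurrence.
    static List<String> dedupIgnoreCase(String line) {
        // Arrays.asList returns a fixed-size list, so copy it before calling removeIf.
        List<String> words = new ArrayList<>(Arrays.asList(line.split(" ")));
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        // add() returns false when the set already holds an equal word (ignoring case).
        words.removeIf(w -> !seen.add(w));
        return words;
    }

    public static void main(String[] args) {
        // prints "Goodbye bye world"
        System.out.println(String.join(" ", dedupIgnoreCase("Goodbye bye Bye world world WorlD")));
        // prints "Sam went to his business"
        System.out.println(String.join(" ", dedupIgnoreCase("Sam went to to to his business")));
    }
}
```

The list keeps the original spelling of each word because only the set's comparator ignores case; the words themselves are never modified.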

Answer 2

You can do this with Java 8 streams.

Try the following solution:

System.out.println("Enter the no of lines u will input:: ");
Scanner sc = new Scanner(System.in);
Integer noOfLines = sc.nextInt();
sc.nextLine();
List<List<String>> allInputs = new ArrayList<>();
for (int i = 0; i < noOfLines; i++) {
    String receivedLine = sc.nextLine();

    // Key each word by its lower-case form; the merge function keeps the first spelling seen,
    // and LinkedHashMap preserves the order in which the words first appeared.
    List<String> list = Stream.of(Pattern.compile("\\s").splitAsStream(receivedLine)
            .collect(Collectors.collectingAndThen(
                    Collectors.toMap(String::toLowerCase, Function.identity(), (l, r) -> l, LinkedHashMap::new),
                    m -> String.join(" ", m.values())))
            .split(" ")).collect(Collectors.toList());

    list.forEach(x -> System.out.print(x + " "));

}
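
The part of the solution above that does the work is the toMap collector keyed by the lower-cased word and backed by a LinkedHashMap. A minimal sketch of that step on its own, without joining the values and splitting them again, might look like the following. The class name ToMapDedup, the helper dedupIgnoreCase, and the use of trim() with "\\s+" are assumptions made for this example, not part of the answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ToMapDedup {

    // Assumption: words are separated by one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: keeps the first spelling of each word, compared in lower case.
    static List<String> dedupIgnoreCase(String line) {
        Map<String, String> firstSpellingByKey = WHITESPACE.splitAsStream(line.trim())
                .collect(Collectors.toMap(
                        String::toLowerCase,      // key used to decide uniqueness
                        Function.identity(),      // value is the word as typed
                        (first, later) -> first,  // on a duplicate key, keep the earlier word
                        LinkedHashMap::new));     // preserves first-appearance order
        return new ArrayList<>(firstSpellingByKey.values());
    }

    public static void main(String[] args) {
        // prints "Goodbye bye world"
        System.out.println(String.join(" ", dedupIgnoreCase("Goodbye bye Bye world world WorlD")));
    }
}
```

Without the fourth argument, toMap is not guaranteed to return an ordered map, so the LinkedHashMap supplier is what keeps the words in their original order.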

Answer 3

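You can use a LinkedHashSet as shown below. Note that this lower-cases the whole line before splitting, so the output preserves the order of the words but not their original case: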

for (int i = 0; i < noOfLines; i++) {
    String receivedLine = sc.nextLine();

    String[] splittedInput = receivedLine.toLowerCase().split(" ");

    Set<String> list = new LinkedHashSet<>(Arrays.asList(splittedInput));

    list.forEach(x -> System.out.print(x + " "));
}

Answer 4

Here is another approach. Although the use case is slightly different, the proposed solution is the same as this answer by Stuart Marks.

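Essentially, you want to apply a stateful filter: you want to discard certain elements based on elements that have already been seen. This is more or less what distinct() does; however, distinct() is restricted to the equals method. The following method provides a Predicate that maintains state: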

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

The desired result can then be achieved with:

Arrays.stream(receivedLine.split(" "))
    .filter(distinctByKey(String::toLowerCase))
    .collect(Collectors.toList());
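
Putting the two snippets above together, a runnable sketch could look like this. The class name DistinctByKeyDemo and the wrapper method dedupIgnoreCase are hypothetical names introduced for the example; distinctByKey is the method shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {

    // Returns a stateful predicate that accepts an element only the first time its key is seen.
    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical wrapper: splits on single spaces and filters by lower-cased key.
    static List<String> dedupIgnoreCase(String line) {
        return Arrays.stream(line.split(" "))
                .filter(distinctByKey(String::toLowerCase))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints "Goodbye bye world"
        System.out.println(String.join(" ", dedupIgnoreCase("Goodbye bye Bye world world WorlD")));
        // prints "Sam went to his business"
        System.out.println(String.join(" ", dedupIgnoreCase("Sam went to to to his business")));
    }
}
```

Each call to distinctByKey creates a fresh set, so the predicate should be created once per stream rather than shared. On a sequential stream the first occurrence of each word is the one kept; on a parallel stream, which of the duplicates survives should not be relied upon.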

java

distinct

case-insensitive

java-stream