接受一个字符串并识别连续字母数的代码

Code that takes a string and recognizes the number of consecutive letters

为了完成我的工作,我需要一个代码,从用户那里获取一个单词,然后识别连续字母的数量,并以打印字母和重复次数的方式输出。

示例 1 输入:

hhhttrew

示例 1 输出:

h3t2rew

示例 2 输入:

uuuuuuhhhaaajqqq

示例 2 输出:

u6h3a3jq3
String text = sc.nextLine();
            int len = text.length();

            int repeat = 0;

            char[] chars = new char[len];

            // To convert string to char
            for (int h = 0; h < len; h++)
            {
                chars[h] = text.charAt(h);
            }

            String finaly = "";

            for (char ignored : chars)
            {
                for (int j = 0 ; j <len ; j++ )
                {
                    if (chars[j] == chars[j+1])
                    {
                        finaly = String.valueOf(chars[j]);

                        repeat++;

                        finaly = String.valueOf(repeat);
                    }
                    else
                    {
                        j++;
                    }
                }
            }
            System.out.println(finaly);

我会采取不同的方法。以下算法将 运行 in O(n) 因此 渐近 (!!!) 最优。弄清楚这一点,因为这似乎引起了一些混乱。有更小常数的更有效的解决方案。

基本上这个想法是遍历所有字符并将出现的次数存储在Map中。如果遇到一个尚未在 Map 中的值,我们将其添加到 Map 中,给它计数 1。如果我们在获取该值之前已经遇到该值并将其加 1。最后我们只需要按照插入的顺序迭代一次map,得到key和value

重要:使用LinkedHashMap对于此实现至关重要,因为必须保留插入顺序。 TreeMapHashMap 会给你一个字符串,它的值未定义顺序。有关详细信息,请参阅 this thread

此算法的 运行 时间为 O(n),因为 LinkedHashMap 上的所有操作都需要 O(1),其余时间我们只是循环遍历 n个字符。

import java.util.*;

public class Application {
    public static void main(String[] args) {
        var result1 = compress("hhhttrew");
        var result2 = compress("uuuuuuhhhaaajqqq");
        System.out.println(result1);
        System.out.println(result2);
    }

    public static String compress(String uncompressed) {
        var charArr = uncompressed.toCharArray();
        // LinkedHashMap is required here to preserve the order of insertion
        var occurrences = new LinkedHashMap<Character, Integer>();
        // iterate over all characters and count all occurrences 
        for (var character : charArr) {
            if (occurrences.containsKey(character)) occurrences.put(character, occurrences.get(character) + 1);
            else occurrences.put(character, 1);
        }
        // Create the target String by iterating over the map in order of insertion and concatenate key and value. Don't add the number of occurrence when the character occurs only once
        var sb = new StringBuilder();
        for (var entry : occurrences.entrySet()) {
            sb.append(entry.getKey());
            if(entry.getValue() != 1) sb.append(entry.getValue());
        }
        return sb.toString();
    }

}

预期输出:

h3t2rew
u6h3a3jq3

这是一种方法。你只需要一个循环。内部循环完成工作。外循环只提供测试用例。

  • 分配第一个字符
  • 并将该字符的计数设置为 1
  • 然后迭代直到相邻字符不同
  • 如果 > 1 则追加计数并追加不同的字符
  • 将下一个 运行 的计数设置为 0。
String[] data = { "uuuuuuhhhaaajqqq", 
        "hhhttrew","abbcccddddeeeeeffffffggggggg" };

for (String s : data) {
    String result = "" + s.charAt(0);
    int count = 1;
    for (int i = 1; i < s.length(); i++) {
        if (s.charAt(i - 1) != s.charAt(i)) {
            result += count <= 1 ? "" : count;
            result += s.charAt(i);
            count = 0;
        }
        count++;
        if (i == s.length() - 1) {
            result += count <= 1 ? "" : count;
        }
    }
    System.out.printf("%-15s <-- %s%n", result, s);
}

打印

u6h3a3jq3       <-- uuuuuuhhhaaajqqq
h3t2rew         <-- hhhttrew
ab2c3d4e5f6g7   <-- abbcccddddeeeeeffffffggggggg

在评论(现已删除)中,您询问了如何逆转该过程。这是一种方法。

  • 分配一个StringBuilder来保存结果。
  • 初始化countcurrentChar
  • 随着字符串的处理,
    • 保存一个字符到currentChar
    • 然后当下一个字符是数字时,构建计数
  • 如果计数仍然为 0,则下一个字符是一个数字,因此将计数加一并将 currentChar 复制到缓冲区
  • 否则,使用计算的长度。
String[] encoded =
        { "u6h3a3jq3", "h3t2rew", "ab2c3d4e5f6g7" };

for (String s : encoded) {
    
    StringBuilder sb = new StringBuilder();
    int count = 0;
    char currentChar = '[=12=]';
    for (int i = 0; i < s.length();) {
        if (Character.isLetter(s.charAt(i))) {
            currentChar = s.charAt(i++);
        }
        while (i < s.length()
                && Character.isDigit(s.charAt(i))) {
            count = count * 10 + s.charAt(i++) - '0';
        }
        count = count == 0 ? 1 : count;
        sb.append(Character.toString(currentChar)
                .repeat(count));
        count = 0;
    }
    System.out.println(s + " --> " + sb);
}

打印

u6h3a3jq3 --> uuuuuuhhhaaajqqq
h3t2rew --> hhhttrew
ab2c3d4e5f6g7 --> abbcccddddeeeeeffffffggggggg

只要有一个循环就可以大大改进您的解决方案。而且,这不使用任何复杂的数据结构,因此非常高效。

public static String consecutiveLetters(String text) {
        if (text == null || text.length() == 0) return "";

        // We start with he first letter
        char currentChar = text.charAt(0);
        int position = 1;
        int letterCount = 1;
        StringBuilder outputBuilder = new StringBuilder();

        while (position < text.length()) {
            // get the letter at the current position
            char charAtCurrentPos = text.charAt(position);

            // if it is different than what we had previously, we store the letter and the number of times it appeared. Reset the counter for the new letter
            if (charAtCurrentPos != currentChar) {
                outputBuilder.append(currentChar);
                currentChar = charAtCurrentPos;
                if (letterCount > 1) {
                    outputBuilder.append(letterCount);
                }
                letterCount = 1;
            } else {
                letterCount++;
            }
            position++;
        }

        // Add the last character as well
        outputBuilder.append(currentChar);
        if (letterCount > 1) {
            outputBuilder.append(letterCount);
        }

        return outputBuilder.toString();
    }

您可以获得matches of the regular expression

(.)*

这会生成一个由连续字符组成的字符串数组。例如,

"uuuuuuhhhaaajqqq" => ["uuuuuu", "hhh", "aaa", "j", "qqq"]

然后将每个元素转换为所需的形式是一件简单的事情:

["uuuuuu", "hhh", "aaa", "j", "qqq"] => ["u6", "h3", "a3", "j", "q3"]

然后join后一个数组的元素(连接字符串为空)形成所需的字符串:

"u6h3a3jq3"

正则表达式可以分解如下:

(      # begin capture group 1
  .    # match any character
)      # end capture group 1
*    # match the contents of capture group 1 zero one or more times

作为提取上述正则表达式匹配项的替代方法,可以 split the string on matches of the the regular expression

(?<=(.))(?!)

这会生成与之前相同的数组:

"uuuuuuhhhaaajqqq" => ["uuuuuu", "hhh", "aaa", "j", "qqq"]

这个正则表达式有以下元素。

(?<=   # begin positive lookbehind
  (    # begin capture group 1
    .  # match any character
  )    # end capture group 1
)      # end positive lookbehind
(?!    # begin a negative lookahead
     # match the contents of capture group 1
)      # end negative lookahead

这个表达式的匹配是zero-width,意思是它们匹配相邻字符之间的位置。例如,如果字符串是 "abbc",它将匹配 'a''b' 之间以及 'b''c' 之间的位置,将字符串分成三部分.