What's the most efficient way of filtering a string with numbers at the end (e.g. foo12)?

Question

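This is a quiz I came up with myself, and it closely resembles a real-life problem I'm facing.

Say I have a list of strings (let's call it stringlist), and some of them have a two-digit number attached at the end. For example, "foo", "foo01", "foo24".

I want to group the strings that share the same leading part and differ only in the two digits at the end.

So "foo", "foo01", and "foo24" would all be under the group "foo".

However, I can't just check for any string starting with "foo", because we can also have "food", "food08", "food42".

There are no duplicates.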

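There can be digits in the middle. For example, "foo543food43" is under the group "foo543food".

There can even be more than two digits at the end. For example, "foo1234" is under the group "foo12".

The most obvious solution I can think of is to keep a list of digits.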

numbers = ["0", "1", "2", ... "9"]

Then I would do:

grouplist = [] //Of the form: [[group_name1, word_index1, word_index2, ...], [group_name2, ...]]
for(word_index=0; word_index < len(stringlist); word_index++) //loop through stringlist
    word = stringlist[word_index]
    group_name = word //Default: no two-digit suffix, so the word is its own group
    last = len(word)-1 //Index of the last character
    if(last >= 1)
        for(number1 in numbers)
            if(word[last] == number1) //Found a number at the end
                for(number2 in numbers)
                    if(word[last-1] == number2) //Found another number one before the end
                        group_name = word.substring(0, last-1) //Drop the last two characters
                        break //If you found the second number, stop looping through numbers
                break //If you found the first number, stop looping through numbers
    found = false
    for(group_element in grouplist)
        if(group_element[0] == group_name) //Does that group name exist already? If so, add the index to the end.
            group_element.append(word_index)
            found = true
            break
    if(!found) //If not, add the group name and the index.
        grouplist.append([group_name, word_index])

Now this looks like a mess. Can you think of a cleaner approach? Any data structure, including that of the final result, can be whatever you want.

Answer 1

I would create a map that maps each group name to a list of all the strings in that group.

Here is how I would do it in Java:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public Map<String, List<String>> createGroupMap(List<String> listOfAllStrings) {
  Map<String, List<String>> result = new HashMap<>();
  for (String s : listOfAllStrings) {
    addToMap(result, s);
  }
  return result;
}

private void addToMap(Map<String, List<String>> map, String s) {
  String group = getGroupName(s);
  if (!map.containsKey(group))
    map.put(group, new ArrayList<String>());
  map.get(group).add(s);
}

private String getGroupName(String s) {
  // Strips all trailing digits; use "\\d{2}$" to strip exactly two.
  return s.replaceFirst("\\d+$", "");
}

Maybe you can gain some speed by avoiding the RegExp in getGroupName(..), but you would need to profile it to make sure an implementation without RegExp is actually faster.
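For comparison, here is a minimal sketch of what a RegExp-free version might look like. It is not part of the answer above: the class name SuffixGrouper and its methods are made up for illustration, and it assumes the rule from the question (strip exactly two trailing ASCII digits, so "foo1234" goes to "foo12"). Whether it is faster than the RegExp would still have to be measured.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
public class SuffixGrouper {

    // Returns s without its last two characters if both are ASCII digits,
    // otherwise returns s unchanged. No regular expression is involved.
    static String groupName(String s) {
        int n = s.length();
        if (n >= 2 && isAsciiDigit(s.charAt(n - 1)) && isAsciiDigit(s.charAt(n - 2))) {
            return s.substring(0, n - 2);
        }
        return s;
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Maps each group name to the strings that belong to it, in input order.
    static Map<String, List<String>> group(List<String> strings) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String s : strings) {
            groups.computeIfAbsent(groupName(s), k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "foo", "foo01", "foo24", "food", "food08", "foo1234", "foo543food43");
        System.out.println(group(input));
    }
}
```

Each string is inspected only at its last two characters, so the cost per string does not depend on how many digits appear earlier in it.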

Answer 2

You can split the string into two parts like this.

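#include <cctype>
#include <string>
#include <utility>
using namespace std;

// Splits s into its prefix and the value of up to two trailing digits.
pair<string, int> divide(string s) {
    int r = 0;
    // Check for emptiness first: calling back() on an empty string is undefined.
    if (!s.empty() && isdigit(static_cast<unsigned char>(s.back()))) {
        r = s.back() - '0';
        s.pop_back();
        if (!s.empty() && isdigit(static_cast<unsigned char>(s.back()))) {
            r += 10 * (s.back() - '0');
            s.pop_back();
        }
    }
    return {s, r};
}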
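One thing to note about this split is that it returns 0 both for "foo" and for "foo00", so the two cannot be told apart afterwards. The following is a rough Java sketch of the same idea, not the answerer's code: the class name SplitGrouping is made up, and using -1 to mean "no trailing digits" is an assumption added here to avoid that ambiguity. Like the function above, it strips up to two trailing digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, named for illustration only.
public class SplitGrouping {

    // Splits off up to two trailing ASCII digits.
    // The entry's key is the prefix; its value is the numeric suffix,
    // or -1 (an assumed sentinel) when the string has no trailing digits.
    static Map.Entry<String, Integer> divide(String s) {
        int end = s.length();
        int digits = 0;
        while (digits < 2 && end > 0
                && s.charAt(end - 1) >= '0' && s.charAt(end - 1) <= '9') {
            end--;
            digits++;
        }
        int suffix = (digits == 0) ? -1 : Integer.parseInt(s.substring(end));
        return Map.entry(s.substring(0, end), suffix);
    }

    // Groups the numeric suffixes by prefix, with prefixes in sorted order.
    static Map<String, List<Integer>> group(List<String> strings) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String s : strings) {
            Map.Entry<String, Integer> parts = divide(s);
            groups.computeIfAbsent(parts.getKey(), k -> new ArrayList<>())
                  .add(parts.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(List.of("foo", "foo01", "foo24", "food", "food08")));
    }
}
```

Storing the suffix as a number keeps each group compact, at the cost of losing leading zeros ("foo01" is recorded as 1), so keep the original strings instead if they need to be recovered exactly.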

language-agnostic

algorithm

data-structures