Reimplement an algorithm to create a refine list

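I'm trying to reimplement an algorithm that creates a refined keyword list. I don't have the original source code, only the tool's .exe, so all I have are inputs and the expected output.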

The problem is that my function's output doesn't match the original tool's output. This is the code I'm using:

using System;
using System.Collections.Generic;
using System.IO;

string[] inputLines = File.ReadAllLines("Input.txt");
Dictionary<string, int> keywordsCount = new Dictionary<string, int>();
List<string> refineList = new List<string>();

//Get Keywords Count
foreach (string fileName in inputLines)
{
    string[] fileNameSplitted = fileName.Split('_');
    for (int i = 0; i < fileNameSplitted.Length; i++)
    {
        string currentKeyWord = fileNameSplitted[i];
        if (!string.Equals(currentKeyWord, "SFX", StringComparison.OrdinalIgnoreCase))
        {
            if (keywordsCount.ContainsKey(currentKeyWord))
            {
                keywordsCount[currentKeyWord] += 1;
            }
            else
            {
                keywordsCount.Add(currentKeyWord, 1);
            }
            }
        }
    }
}

//Get final keywords
foreach (KeyValuePair<string, int> keyword in keywordsCount)
{
    if (keyword.Value > 2 && keyword.Key.Length > 2)
    {
        refineList.Add(keyword.Key);
    }
}

Input file:

SFX_AMB_BIRDSONG
SFX_AMB_BIRDSONG_MISC
SFX_AMB_BIRDSONG_SEAGULL
SFX_AMB_BIRDSONG_SEAGULL_BUSY
SFX_AMB_BIRDSONG_VULTURE
SFX_AMB_CAVES_DRIP
SFX_AMB_CAVES_DRIP_AUTO
SFX_AMB_CAVES_LOOP
SFX_AMB_DESERT_CICADAS
SFX_AMB_EARTHQUAKE
SFX_AMB_EARTHQUAKE_SHORT
SFX_AMB_EARTHQUAKE_STREAMED
SFX_AMB_FIRE_BURNING
SFX_AMB_FIRE_CAMP_FIRE
SFX_AMB_FIRE_JET
SFX_AMB_FIRE_LAVA
SFX_AMB_FIRE_LAVA_DEEP
SFX_AMB_FIRE_LAVA_JET1
SFX_AMB_FIRE_LAVA_JET2
SFX_AMB_FIRE_LAVA_JET3
SFX_AMB_FIRE_LAVA_JET_STOP
SFX_AMB_UNDW_BUBBLE_RELEASE
SFX_AMB_UNDW_BUBBLE_RELEASE_AUTO
SFX_AMB_WATER_BEACH1
SFX_AMB_WATER_BEACH2
SFX_AMB_WATER_BEACH3
SFX_AMB_WATER_CANALS
SFX_AMB_WATER_FALL_HUGE
SFX_AMB_WATER_FALL_NORMAL
SFX_AMB_WATER_FALL_NORMAL2
SFX_AMB_WATER_FALL_NORMAL3
SFX_AMB_WATER_FOUNTAIN
SFX_CS_LUX_PORTAL_LIGHTNING
SFX_CS_LUX_PORTAL_LIGHTNING1
SFX_CS_LUX_PORTAL_LIGHTNING2
SFX_CS_LUX_PRIEST_COWER
SFX_CS_LUX_PRIEST_MEDAL
SFX_CS_LUX_PRIEST_MEDITATE
SFX_CS_LUX_PRIEST_SCREAM
SFX_CS_LUX_PRIEST_SNIFF1
SFX_CS_LUX_PRIEST_SNIFF2
SFX_CS_LUX_PRIEST_SPIRITS
SFX_CS_LUX_PRIEST_SPIRITS2
SFX_CS_LUX_PRIEST_SPIRITS3
SFX_CS_LUX_PRIEST_SURPRISE
SFX_MON_BM05_TOO_WALK1
SFX_MON_BM05_TOO_WALK2
SFX_MON_BM06_SQU_WALK1
SFX_MON_BM06_SQU_WALK2
SFX_MON_BR06_HAL_ATTACK1
SFX_MON_BR06_HAL_ATTACK2
SFX_MON_BR06_HAL_DIE
SFX_MON_BR06_HAL_HIT
SFX_MON_BR06_HAL_IDLE
SFX_MON_BR06_HAL_IDLE_EATING
SFX_MON_BR06_HAL_LAND1
SFX_MON_BR06_HAL_LAND2
SFX_MON_BR06_HAL_SCRAPE
SFX_MON_BR06_HAL_SLAM
SFX_MON_BR06_HAL_SURPRISE
SFX_MON_BR06_HAL_WALK1
SFX_MON_BR06_HAL_WALK2
SFX_MON_BU01_MUM_ATTACK1
SFX_MON_BU01_MUM_ATTACK2
SFX_MON_BU01_MUM_DIE
SFX_MON_BU01_MUM_HIT
SFX_MON_BU01_MUM_IDLE_RETRIEVE
SFX_MON_BU01_MUM_IDLE_RETRIEVE_GROW
SFX_MON_BU01_MUM_SURPRISE
SFX_MON_BU01_MUM_WALK1
SFX_MON_BU01_MUM_WALK2
SFX_WATER_SPLASH_BIG
SFX_WATER_SPLASH_BIG1
SFX_WATER_SPLASH_BIG2
SFX_WATER_SPLASH_BIG3
SFX_WATER_SPLASH_MED1
SFX_WATER_SPLASH_MED2
SFX_WATER_SPLASH_MED3
SFX_WATER_SPLASH_MEDIUM
SFX_WATER_SPLASH_OUT
SFX_WATER_SPLASH_OUT1
SFX_WATER_SPLASH_OUT2
SFX_WATER_SPLASH_SMALL

And the expected output (from the original tool):

AMB
MON
WATER
LUX
BR06
HAL
SPLASH
PRIEST
FIRE
BU01
MUM
LAVA
BIRDSONG
WALK1
WALK2
JET
IDLE
EARTHQUAKE
FALL
SURPRISE
BIG
CAVES

What should I change so that my method matches the original output?

Thanks in advance!

-------- EDIT: I have some new findings:

-> It's a method of roughly 100-130 lines.

-> It uses the Visual Basic functions InStr, Len, Right and Left.

-> It discards the word "SFX" and every word shorter than 3 characters.

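-> It uses a combo box as a temporary list: it puts every word that appears more than once into it, then takes some of those words out and shows them in the combo box the user sees.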

-> For the first test case I posted, this is the list of discarded words:

UNDW
BM05
BM06
SEAGULL
DRIP
BUBBLE
PORTAL
TOO
SQU
OUT
AUTO
RELEASE
NORMAL
LIGHTNING
SPIRITS
ATTACK1
ATTACK2
DIE
HIT
RETRIEVE
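The edit above says the tool relies on the VB functions InStr, Len, Right and Left. When porting to C#, their 1-based, clamping behavior is easy to get wrong, so here is a minimal sketch of C# stand-ins covering only the binary (ordinal) comparison case. `VbStrings` is a hypothetical helper name; alternatively, C# code can reference Microsoft.VisualBasic and call its `Strings` class directly.

```csharp
using System;

// Hypothetical C# stand-ins for the VB string functions (ordinal comparison only).
public static class VbStrings
{
    // 1-based position of `find` in `text`, searching from 1-based `start`; 0 when not found.
    public static int InStr(int start, string text, string find)
    {
        if (start < 1) throw new ArgumentOutOfRangeException(nameof(start));
        if (start > text.Length) return 0;
        int index = text.IndexOf(find, start - 1, StringComparison.Ordinal);
        return index < 0 ? 0 : index + 1;
    }

    // Length of the string, treating null as empty.
    public static int Len(string text) => text?.Length ?? 0;

    // Leftmost `length` characters (the whole string if it is shorter).
    public static string Left(string text, int length) =>
        text.Substring(0, Math.Min(length, text.Length));

    // Rightmost `length` characters (the whole string if it is shorter).
    public static string Right(string text, int length) =>
        text.Substring(text.Length - Math.Min(length, text.Length));
}
```

With these, the VB idiom `Right(w, Len(w) - InStr(1, w, "_"))` drops everything up to and including the first underscore, e.g. `"SFX_AMB_FIRE"` becomes `"AMB_FIRE"`.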

How about treating it as one block of text, splitting at line ends or underscores, and keeping the distinct pieces:

var distinctWords = File.ReadAllText(path)
  .Split(new[]{'\r','\n','_'},StringSplitOptions.RemoveEmptyEntries)
  .Distinct();

Hang on... maybe it's only words at least 3 characters long that appear three or more times:

var frequentWords = File.ReadAllText(path)
  .Split(new[]{'\r','\n','_'},StringSplitOptions.RemoveEmptyEntries)
  .GroupBy(w => w)
  .Where(g => g.Key.Length > 2 && g.Count() > 2)
  .Select(g => g.Key);

If you have a fixed list of words to exclude, you can append e.g. .Except(new[]{ "SFX", "..." }) at the end.
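Put together, the suggestion above can be sketched as a self-contained method that takes the lines as an array instead of reading `path`, and applies the exclusion list at the end. `FrequencyFilter` and `FrequentWords` are illustrative names. This is a plain frequency filter, so it will not reproduce the original tool's ordering or its dropping of words such as PORTAL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencyFilter
{
    // Words of 3+ characters that occur 3+ times across all lines, minus the excluded ones.
    // GroupBy keeps keys in first-appearance order, and Except preserves that order.
    public static List<string> FrequentWords(IEnumerable<string> lines, IEnumerable<string> excluded)
    {
        return lines
            .SelectMany(line => line.Split(new[] { '_' }, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(word => word)
            .Where(g => g.Key.Length > 2 && g.Count() > 2)
            .Select(g => g.Key)
            .Except(excluded)
            .ToList();
    }
}
```

For the five CAVES and UNDW lines of the sample input, this keeps AMB and CAVES and drops DRIP, AUTO, UNDW, BUBBLE and RELEASE, each of which appears only twice.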

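You can do it with plain LINQ: use GroupBy and convert the result to a dictionary. On that dictionary you can add further conditions, such as a minimum number of occurrences. You don't have to juggle several if-else conditions, and it stays readable: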

using System.IO;
using System.Linq;

string[] inputLines = File.ReadAllLines("Input.txt");

var output = inputLines
    .SelectMany(s =>
        s.Split('_')
            .Where(w => w != "SFX")
        )
    .GroupBy(g => g)
    .ToDictionary(s => s.Key, s => s.Count())
    .Where(w => w.Key.Length > 2 && w.Value > 2);

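I gave it a try. I couldn't figure out the ordering, and the performance isn't great, but it produces the desired selection of words for the given example.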

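"SFX" could probably be excluded because it is (a) contained in all input items, or (b) the first part of every input item, but I've kept it as a hard-coded string to exclude, together with "PORTAL". I really don't know why "PORTAL" is excluded from the output.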
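If you would rather not hard-code "SFX", both guesses (a) and (b) can be computed from the input. This is a minimal sketch; `SharedTokens`, `InEveryItem` and `SharedFirstToken` are hypothetical names, and nothing here explains "PORTAL".

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SharedTokens
{
    // (a) Tokens that occur in every input item.
    public static HashSet<string> InEveryItem(IReadOnlyList<string> items)
    {
        var common = new HashSet<string>();
        if (items.Count == 0) return common;
        common.UnionWith(items[0].Split('_'));
        foreach (string item in items.Skip(1))
            common.IntersectWith(item.Split('_'));
        return common;
    }

    // (b) The first token if every item starts with the same one, otherwise null.
    public static string SharedFirstToken(IReadOnlyList<string> items)
    {
        List<string> firsts = items.Select(item => item.Split('_')[0]).Distinct().ToList();
        return firsts.Count == 1 ? firsts[0] : null;
    }
}
```

The result can then be passed to `Except` in place of the literal "SFX".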

Here, Input is a string[] containing the sample input from the question.

using System;
using System.Linq;
using System.Text.RegularExpressions;

var excludedWords = new[] { "SFX", "PORTAL" };

var feasibleWords = Input
    .SelectMany(str => str.Split('_'))
    .Where(word =>
        word.Length > 2 &&
        !excludedWords.Contains(word));

var repeatedWords = feasibleWords
    .GroupBy(word => word)
    .Where(gr => gr.Count() > 2)
    .ToDictionary(
        gr => gr.Key,
        gr => gr.Count());

var serialWords = feasibleWords
    .Except(repeatedWords.Keys)
    .GroupBy(word => Regex.Replace(word, @"[\d]", string.Empty))
    .Where(gr => 
        gr.Contains(gr.Key) && 
        gr.Count() > 3)
    .ToDictionary(
        gr => gr.Key,
        gr => gr.Count());

var output = repeatedWords.Concat(serialWords)
    .OrderByDescending(kvp => kvp.Value) // Doesn't add much value, but oh well
    .Select(kvp => kvp.Key);

Console.Write(string.Join(Environment.NewLine, output));

Prints:

AMB
MON
WATER
LUX
BR06
HAL
SPLASH
FIRE
PRIEST
BU01
MUM
LAVA
BIRDSONG
FALL
WALK1
WALK2
IDLE
JET
BIG
CAVES
EARTHQUAKE
SURPRISE

Finally got it!!

I finally figured it out. It took OllyDbg, Numega SmartCheck, a VB decompiler and a lot of patience.

Here is the code. I wrote it in VB.NET because it is so similar to VB6:

'Clear comboboxes
Combo2.Items.Clear()
Combo3.Items.Clear()

'Start refining
Dim listboxItemsCount As Integer = listbox_SfxItems.Items.Count - 1
'Check up to six word positions per item (an item with fewer words reuses its last word)
For numberOfIterations As Integer = 0 To 5
    'Iterate listbox items
    For sfxItemIndex As Integer = 0 To listboxItemsCount
        'Iterate listbox items to find matches
        For sfxItemIndexSub As Integer = 0 To listboxItemsCount
            'Skip the item we are checking in the outer loop
            If sfxItemIndex = sfxItemIndexSub Then
                Continue For
            End If
            'Get item from listbox
            Dim currentSfx As String = listbox_SfxItems.Items(sfxItemIndex)
            Dim wordToCheck As String = currentSfx
            'Split words
            If numberOfIterations > 0 Then
                For wordIndex = 1 To numberOfIterations
                    If InStr(1, wordToCheck, "_", CompareMethod.Binary) Then
                        Dim wordLength As Integer = Len(wordToCheck) - InStr(1, wordToCheck, "_", CompareMethod.Binary)
                        wordToCheck = Microsoft.VisualBasic.Right(wordToCheck, wordLength)
                    End If
                Next
            End If
            If InStr(1, wordToCheck, "_", CompareMethod.Binary) Then
                Dim wordLength As Integer = InStr(1, wordToCheck, "_", CompareMethod.Binary) - 1
                wordToCheck = Microsoft.VisualBasic.Left(wordToCheck, wordLength)
            End If
            'Find matches
            If StrComp("SFX", wordToCheck) <> 0 Then
                If Len(wordToCheck) > 2 Then
                    currentSfx = listbox_SfxItems.Items(sfxItemIndexSub)
                    If InStr(1, currentSfx, wordToCheck, CompareMethod.Binary) Then
                        'Look for an existing Combo2 entry whose name contains this word
                        Dim addNewItem As Boolean = True
                        For comboboxIndex As Integer = 0 To Combo2.Items.Count - 1
                            Dim comboWordItem As String = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                            'Check for duplicated
                            If InStr(1, comboWordItem, wordToCheck, CompareMethod.Binary) = 0 Then
                                Continue For
                            End If
                            'Update combo item with the word appearances count
                            currentSfx = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                            If StrComp(currentSfx, wordToCheck) = 0 Then
                                'Get current item data
                                Dim currentItemData As Integer = CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData
                                'Update value
                                currentItemData += 1
                                CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData = currentItemData
                            End If
                            'Don't add items in the combobox and quit loop
                            addNewItem = False
                            Exit For
                        Next
                        'Check if we have to add the new item
                        If addNewItem Then
                            Combo2.Items.Add(New ComboItemData(wordToCheck, 0))
                        End If
                    End If
                End If
            End If
        Next
    Next
Next

'Check final words
Combo3.Items.Add("All")
Combo3.Items.Add("HighLighted")

Dim quitLoop As Boolean = False
Do
    If Combo2.Items.Count > 0 Then
        Dim itemToRemove As Integer = -1
        'Get max value from the remaining words
        Dim maxWordAppearances As Integer = 0
        For itemIndex As Integer = 0 To Combo2.Items.Count - 1
            Dim itemData As Integer = CType(Combo2.Items(itemIndex), ComboItemData).ItemData
            maxWordAppearances = Math.Max(maxWordAppearances, itemData)
        Next
        'Get the item with the max value
        For index As Integer = 0 To Combo2.Items.Count - 1
            Dim itemData As Integer = CType(Combo2.Items(index), ComboItemData).ItemData
            If itemData = maxWordAppearances AndAlso itemToRemove = -1 Then
                itemToRemove = index
            End If
        Next
        'Remove and add items
        Dim itemStringName As String = CType(Combo2.Items(itemToRemove), ComboItemData).Name
        Combo3.Items.Add(itemStringName)
        Combo2.Items.RemoveAt(itemToRemove)
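        'Stop once the word just moved had 5 or fewer appearances
        If maxWordAppearances <= 5 Then
            quitLoop = True
        End If
    Else
        'Nothing left in Combo2: stop instead of looping forever
        quitLoop = True
    End If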
Loop While quitLoop <> True
'Select the first item
Combo3.SelectedIndex = 0

I'm not sure whether it can be optimized, but it behaves like the original and outputs the same words in the same order.

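To test it you need these controls: two combo boxes, where Combo2 is the temporary one and Combo3 is the one the user sees, plus a list box (listbox_SfxItems) containing the items to check.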

The ComboItemData class was taken from this site: https://www.elguille.info/colabora/puntonet/alvaritus_itemdataennet.htm

I renamed Cls_lista to ComboItemData.
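Since the VB.NET routine needs a form with Combo2, Combo3 and listbox_SfxItems, here is a sketch of the same logic ported to C# with plain lists in place of the controls, which makes it easier to run against the sample input. `RefineListPort`, `WordCount`, `TokenAt` and `Build` are hypothetical names; `WordCount` stands in for ComboItemData. The port tries to keep the quirks of the reverse-engineered code: a word's first match only creates an entry with a count of 0, the duplicate check is a substring test, and the word that ends the selection (5 or fewer appearances) is still included. It returns only the refined words, not the fixed "All" and "HighLighted" entries, and stops when the temporary list is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical UI-free port of the VB.NET routine: lists replace the controls.
public static class RefineListPort
{
    // Stands in for a ComboItemData entry in Combo2 (name + appearance counter).
    private sealed class WordCount
    {
        public string Name = "";
        public int Count;
    }

    // Word at 0-based `skip`; items with fewer words yield their last word,
    // mirroring the InStr/Right/Left steps in the VB code.
    public static string TokenAt(string item, int skip)
    {
        string rest = item;
        for (int k = 0; k < skip; k++)
        {
            int underscore = rest.IndexOf('_');
            if (underscore < 0) break;
            rest = rest.Substring(underscore + 1);
        }
        int end = rest.IndexOf('_');
        return end < 0 ? rest : rest.Substring(0, end);
    }

    // Refined words in the order they would be added to Combo3.
    public static List<string> Build(IReadOnlyList<string> items)
    {
        var temp = new List<WordCount>(); // plays the role of Combo2

        for (int position = 0; position <= 5; position++)
        {
            for (int i = 0; i < items.Count; i++)
            {
                string word = TokenAt(items[i], position);
                if (word == "SFX" || word.Length <= 2) continue;

                for (int j = 0; j < items.Count; j++)
                {
                    if (j == i || items[j].IndexOf(word, StringComparison.Ordinal) < 0) continue;

                    // As in the VB loop, the first entry whose name contains the word wins.
                    int found = temp.FindIndex(w => w.Name.IndexOf(word, StringComparison.Ordinal) >= 0);
                    if (found < 0)
                        temp.Add(new WordCount { Name = word, Count = 0 });
                    else if (temp[found].Name == word)
                        temp[found].Count++;
                }
            }
        }

        // Move the most frequent word (first one on ties) until one with <= 5 appearances moves.
        var result = new List<string>();
        while (temp.Count > 0)
        {
            int max = temp.Max(w => w.Count);
            int index = temp.FindIndex(w => w.Count == max);
            result.Add(temp[index].Name);
            temp.RemoveAt(index);
            if (max <= 5) break;
        }
        return result;
    }
}
```

On the small input `SFX_AMB_FIRE`, `SFX_AMB_FIRE_JET`, `SFX_AMB_WATER`, `SFX_AMB_WATER_BEACH`, tracing by hand gives AMB a count of 11 and FIRE and WATER 4 each, so `Build` returns AMB, then FIRE (the earlier of the tied entries), and stops. The port has only been traced on small inputs like this one, so compare it with the VB version on the full sample before relying on it.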