Compare two string values, one of them being a tesseract output, the other a .txt file

I have a program that uses tesseract to analyze an image captured from the computer screen. I also have a text file that contains "F1 car Bahrain".

try
{
    var path = @"C:\source\repos\TEst1\packages\Tesseract.4.1.1";
    string LOG_PATH = @"C:\Desktop\start.txt"; // verbatim string, otherwise \D and \s are invalid escape sequences

    var sourceFilePath = @"C:\source\repos\Scrren\Scrren\bin\Debug\TestImage.png";
    using (var engine = new TesseractEngine(path, "eng"))
    {
        using (var img = Pix.LoadFromFile(sourceFilePath))
        {
            using (var page = engine.Process(img))
            {
                var results = page.GetText();

                string WordsFrom = File.ReadAllText(LOG_PATH);
                string WordsFromList = WordsFrom.ToLower();

                string ScreenResult = results.ToLower().ToString();
                string Match = ScreenResult;    

                bool C = Match.Contains(WordsFromList);
                if (C)
                {
                    Console.WriteLine("Match");
                }
                else
                {
                    Console.WriteLine("No Match");
                }    
            }
        }
    }
}
catch (Exception e)
{
    Thread.Sleep(1500);
}

This code gives me the following output:

"1 day ago cce sc ume f1 bahrain grand prix ~ start time, how ake nos video nea cea a] 8 reasons 2021 will go down in f1"

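Obviously tesseract isn't perfect, so some of it is gibberish, but the words f1 and bahrain are both there, so I don't understand why bool C does not become true. I'm completely stumped and would really appreciate any help.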

Printing the string "WordsFromList" to the console shows that it also correctly contains f1 and bahrain.

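Contains looks for the whole search string as one contiguous substring, so "f1 car bahrain" is never found in the OCR text even though two of its words are. Test the words individually instead. See the comments in the code below: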

using System;
using System.Linq; // needed for the All() and Any() extension methods

string searchFor = "F1 car Bahrain";
string searchIn = "1 day ago cce sc ume f1 bahrain grand prix ~ start time, how ake nos video nea cea a] 8 reasons 2021 will go down in f1";

// Returns false because there is no exact match for string "F1 car Bahrain" in the searchIn string.
Console.WriteLine($"Does {searchIn} contain {searchFor} => {searchIn.Contains(searchFor)}");

var words = searchFor.Split(' '); // Result is a string[] with 3 words ("F1", "car", "Bahrain").

// Note: the Contains(string, StringComparison) overload needs .NET Core 2.1+ / .NET 5+.
// On .NET Framework, use searchIn.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) >= 0 instead.

// Returns false because 'car' is not in the input string. The 'All()' extension method only returns true if all words are matched.
Console.WriteLine($"Does {searchIn} contain {searchFor} => {words.All(word => searchIn.Contains(word, StringComparison.InvariantCultureIgnoreCase))}");
// Returns true because 'F1' and 'bahrain' are found in the input string. The 'Any()' extension method returns true if any word matches.
Console.WriteLine($"Does {searchIn} contain {searchFor} => {words.Any(word => searchIn.Contains(word, StringComparison.InvariantCultureIgnoreCase))}");

Console.ReadKey();
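
If the search text comes from File.ReadAllText, as in the question, it may also carry a trailing line break, and the question's packages folder suggests a .NET Framework project where Contains has no StringComparison overload. The following is a minimal sketch of the same word-by-word idea under those assumptions. The class OcrMatcher and its method names are hypothetical and not part of Tesseract or of the code above.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class OcrMatcher
{
    // Split on spaces, tabs and line breaks so a trailing newline
    // from the .txt file does not end up glued to the last word.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public static string[] SplitWords(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Case-insensitive substring test that also compiles on .NET Framework.
    public static bool ContainsIgnoreCase(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // True only if every expected word occurs somewhere in the OCR text.
    public static bool ContainsAllWords(string ocrText, string expected)
    {
        return SplitWords(expected).All(w => ContainsIgnoreCase(ocrText, w));
    }

    // True if at least one expected word occurs in the OCR text.
    public static bool ContainsAnyWord(string ocrText, string expected)
    {
        return SplitWords(expected).Any(w => ContainsIgnoreCase(ocrText, w));
    }
}
```

In the question's code this would replace the single Match.Contains(WordsFromList) call, for example OcrMatcher.ContainsAnyWord(results, WordsFrom). Whether "any word" or "all words" is the right rule depends on how much OCR noise you are willing to tolerate. Note that these are substring tests, so a short word such as "car" would also match inside a longer word such as "card".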