Read a text file and search for a string in a memory-efficient way (and abort when found)

Question

I am searching for a string in a text file (this includes XML files). This is what I thought of first:

using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        if (s.Contains("mySpecialString"))
            return true;
    }
}

return false;

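I want to read line by line to minimize the amount of RAM used. As soon as the string is found, the operation should abort. I am not processing the file as XML because it would have to be parsed, which would also consume more memory than necessary.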

Another simple implementation would be:

bool found = File.ReadAllText(path).Contains("mySpecialString");

But that reads the whole file into memory, which is not what I want. On the other hand, it might be faster.

A third option is this:

foreach (string line in File.ReadLines(path))
{
    if (line.Contains("mySpecialString"))
    {
        return true;
    }
}
return false;

Which of these (or another approach of yours?) is the most memory-efficient?

Answer 1

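You can run a query over File.ReadLines, so that only as many lines are read as are needed to satisfy the query. The Any() method stops as soon as it encounters a line that contains your string.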

return File.ReadLines(fileName).Any(line => line.Contains("mySpecialString"));
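
A slightly fuller sketch of the same idea, in case case-insensitive matching is also wanted. The class name LineSearch, the method name ContainsLine and the comparison parameter are hypothetical names introduced for this illustration and are not part of the answer above. IndexOf with a StringComparison is used here on the assumption that the code may need to run on .NET Framework as well as newer runtimes.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineSearch
{
    // Hypothetical helper: returns true as soon as one line contains the term.
    public static bool ContainsLine(
        string fileName,
        string term,
        StringComparison comparison = StringComparison.Ordinal)
    {
        // File.ReadLines is lazy and Any() stops enumerating at the first match,
        // so lines after the match are never read from disk.
        return File.ReadLines(fileName)
                   .Any(line => line.IndexOf(term, comparison) >= 0);
    }
}
```

Passing StringComparison.OrdinalIgnoreCase as the third argument would ignore case. At most one line at a time is held in memory, so the memory cost is still bounded by the longest line in the file.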

Answer 2

I think both of your solutions behave the same way. See MSDN: https://msdn.microsoft.com/en-us/library/dd383503%28v=vs.110%29.aspx

It says: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned"

The same article also suggests combining ReadLines with LINQ to Objects.
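
To make the quoted difference concrete, here is a rough sketch that puts the two calls side by side. SearchEager and SearchLazy are names made up for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadComparison
{
    // ReadAllLines returns a string[]: every line of the file is already
    // in memory before the loop looks at the first one.
    public static bool SearchEager(string path, string term)
    {
        string[] allLines = File.ReadAllLines(path);
        foreach (string line in allLines)
        {
            if (line.Contains(term))
                return true;
        }
        return false;
    }

    // ReadLines returns an IEnumerable<string>: each line is read only when the
    // loop asks for it, so returning early leaves the rest of the file unread.
    public static bool SearchLazy(string path, string term)
    {
        IEnumerable<string> lines = File.ReadLines(path);
        foreach (string line in lines)
        {
            if (line.Contains(term))
                return true;
        }
        return false;
    }
}
```

Both methods return the same result. The difference is only in how much of the file has been loaded by the time a match is found.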

Answer 3

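I also prefer the accepted answer. Maybe I am micro-optimizing here, but you asked for a memory-efficient way. Also keep in mind that the text you are searching for could itself contain line-break characters such as '\r', '\n' or "\r\n", and that a large file could in theory consist of a single line, which would cancel out the benefit of reading line by line.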

So you could use this method:

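public static bool FileContainsString(string path, string str, bool caseSensitive = true)
{
    if (String.IsNullOrEmpty(str))
        return false;

    // Normalize the pattern once, using the same per-character conversion as for the file.
    char[] pattern = str.ToCharArray();
    if (!caseSensitive)
        for (int i = 0; i < pattern.Length; i++)
            pattern[i] = Char.ToUpperInvariant(pattern[i]);

    // Knuth-Morris-Pratt failure table: fail[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of it. A forward-only stream cannot
    // step back after a partial match, so the table tells us where to resume instead.
    int[] fail = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            k++;
        fail[i] = k;
    }

    using (var reader = new StreamReader(path))
    {
        int matched = 0; // number of pattern characters matched so far
        int next;
        while ((next = reader.Read()) != -1) // Read() returns -1 at end of stream
        {
            char fileChar = caseSensitive ? (char)next : Char.ToUpperInvariant((char)next);
            while (matched > 0 && fileChar != pattern[matched])
                matched = fail[matched - 1]; // fall back to the longest prefix that still fits
            if (fileChar == pattern[matched])
                matched++;
            if (matched == pattern.Length)
                return true;
        }
    }
    return false;
}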

bool containsString = FileContainsString(path, "mySpecialString", false); // ignore case if desired

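Note that this may be the most efficient approach, and it is hidden inside a method, so the calling code stays just as readable. It has one drawback, though: a culture-sensitive comparison is not feasible, because it looks at single characters rather than substrings.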

So you have to keep in mind some edge cases where you could run into problems, such as the famous Turkish i example or surrogate pairs.
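
A small sketch of those two edge cases, for illustration only. The class name CasingEdgeCases is made up here, and the Turkish result assumes the runtime has culture data for tr-TR available (it may differ in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

public static class CasingEdgeCases
{
    public static void Main()
    {
        // Invariant casing maps 'i' to the plain Latin 'I'.
        char invariantUpper = Char.ToUpperInvariant('i');
        Console.WriteLine((int)invariantUpper);

        // Under Turkish casing rules 'i' is expected to map to a dotted capital I
        // (U+0130) instead, so a per-character invariant comparison can disagree
        // with what a Turkish user would consider a case-insensitive match.
        var turkish = new CultureInfo("tr-TR");
        char turkishUpper = Char.ToUpper('i', turkish);
        Console.WriteLine((int)turkishUpper);

        // A character outside the Basic Multilingual Plane is stored as two
        // UTF-16 code units (a surrogate pair), so reading and upper-casing one
        // char at a time handles each half separately.
        string emoji = "\U0001F600";
        Console.WriteLine(emoji.Length);                    // 2
        Console.WriteLine(Char.IsSurrogatePair(emoji, 0));  // True
    }
}
```

For an exact, case-sensitive search neither issue matters, since the code units are compared as they are. They only come into play once case or culture rules are involved.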

