Word counter returning incorrect number of words

Question

I've been trying to create a program that reads text from a file and stores it in a string. I pass the string to a function that counts the words in it.

However, it is only accurate if the user leaves a space at the end of each line and never creates a blank line.... Not a great word counter.

Blank lines cause the word count to be incremented incorrectly.

I'm not sure whether my main problem is using a boolean to do this, or the way I check for spaces and '\n' characters.

bool countingLetters = false;
int wordCount = 0;
for (int i = 0; i < text.length(); i++)
{
    if (text[i] == ' ' && countingLetters == true)
    {
        countingLetters = false;
        wordCount++;
    }
    if (text[i] != ' ' && countingLetters == false)
    {
        countingLetters = true;
    }
    if (text[i] == '\n' && countingLetters == true)
    {
        countingLetters = false;
        wordCount++;
    }
}
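A minimal sketch for reproducing the problem, assuming the loop is wrapped in a function (the name `questionCount` is hypothetical; only the index type was changed to `std::size_t`, the loop logic is unchanged). Tracing it shows two separate faults. First, a word at the very end of the string is never counted, because only a following ' ' or '\n' increments the counter. Second, on a '\n' reached while not inside a word (for example the second '\n' of a blank line), the second `if` sets `countingLetters` to true because '\n' is not ' ', and the third `if` then immediately counts it as a word in the same iteration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper around the question's loop so it can be called and tested.
int questionCount(const std::string& text)
{
    bool countingLetters = false;
    int wordCount = 0;
    for (std::size_t i = 0; i < text.length(); i++)
    {
        // A space ends a word.
        if (text[i] == ' ' && countingLetters == true)
        {
            countingLetters = false;
            wordCount++;
        }
        // Bug: '\n' also counts as "not a space", so a '\n' reached outside
        // a word sets countingLetters here...
        if (text[i] != ' ' && countingLetters == false)
        {
            countingLetters = true;
        }
        // ...and this branch then counts it as a word right away.
        if (text[i] == '\n' && countingLetters == true)
        {
            countingLetters = false;
            wordCount++;
        }
    }
    // Bug: a word at the end of the string (no trailing ' ' or '\n') is never counted.
    return wordCount;
}
```

With this wrapper, "hello world" yields 1 and "hello\n\nworld " yields 3, where 2 is expected in both cases.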

Answer 1

Another approach is to count the beginnings of "words".

Assume the beginning of a word is a letter that follows a non-letter. This can be adjusted if needed.

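#include <cctype>  // isalpha

int wordCount = 0;
int prior = '\n';  // some non-letter
for (std::size_t i = 0; i < text.length(); i++) {
  unsigned char c = text[i];  // isalpha() requires a value representable as unsigned char
  if (isalpha(c) && !isalpha(prior)) {
    wordCount++;  // a letter right after a non-letter starts a new word
  }
  prior = c;
}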
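As a self-contained sketch of the same "count word starts" idea, here is the loop wrapped in a function (the name `countWordStarts` is hypothetical). Because only letter-after-non-letter transitions are counted, blank lines and leading or trailing whitespace cannot inflate the count. Note that under this definition an apostrophe splits a word, which is the kind of case the "adjust as needed" remark refers to.

```cpp
#include <cctype>
#include <string>

// Counts positions where a letter follows a non-letter (hypothetical helper name).
int countWordStarts(const std::string& text)
{
    int wordCount = 0;
    unsigned char prior = '\n';  // start with a non-letter so a leading letter counts
    for (unsigned char c : text) {  // unsigned char keeps std::isalpha well-defined
        if (std::isalpha(c) && !std::isalpha(prior)) {
            ++wordCount;
        }
        prior = c;
    }
    return wordCount;
}
```

For example, "hello\n\nworld" gives 2, while "don't" gives 2 because the "t" after the apostrophe is treated as a new word.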

Answer 2

Your code is basically a state machine. To complete your solution, you just need to account for the end of the string.

Add this to the end of your code:

if (countingLetters) { // a word at the end of the string, with no space character after it
   wordCount++;
}

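Alternatively, if you can be sure the string is null-terminated like a C-style string (std::string is: since C++11, text[text.length()] is guaranteed to be '\0'), you can let the index run one past the last character and handle '\0' the same way as ' ' and '\n'.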
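A minimal sketch of that alternative, assuming only ' ' and '\n' separate words as in the question (the function name `countWordsWithTerminator` is hypothetical). It relies on the C++11 guarantee that reading `text[text.size()]` on a std::string yields '\0', so the last word is closed by the terminator instead of by a separate check after the loop.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: treats the guaranteed '\0' at text[text.size()] as a final separator.
int countWordsWithTerminator(const std::string& text)
{
    bool countingLetters = false;
    int wordCount = 0;
    // Note the <=: the index deliberately reaches text.size().
    for (std::size_t i = 0; i <= text.size(); ++i) {
        char c = text[i];
        bool separator = (c == ' ' || c == '\n' || c == '\0');
        if (separator && countingLetters) {          // a separator ends a word
            countingLetters = false;
            ++wordCount;
        } else if (!separator && !countingLetters) { // first character of a word
            countingLetters = true;
        }
    }
    return wordCount;
}
```

Since a separator outside a word falls through both branches, consecutive newlines (blank lines) are simply skipped.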

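To improve your code, use isspace (it covers more whitespace characters, including '\t'). It is also better to use an else if chain. And comparing with == true is not good practice; just use the boolean itself as the condition.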
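A sketch applying those three suggestions to the question's state machine (the function name `countWordsIsspace` is hypothetical). `std::isspace` treats ' ', '\t', '\n', '\r', '\v' and '\f' as separators in the default "C" locale, so mixed tabs and Windows line endings are handled too.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: the question's state machine using isspace and else if.
int countWordsIsspace(const std::string& text)
{
    bool countingLetters = false;
    int wordCount = 0;
    for (unsigned char c : text) {  // unsigned char keeps std::isspace well-defined
        if (std::isspace(c) && countingLetters) {          // whitespace ends a word
            countingLetters = false;
            ++wordCount;
        } else if (!std::isspace(c) && !countingLetters) { // first character of a word
            countingLetters = true;
        } // otherwise: still inside a word, or still between words
    }
    if (countingLetters) {  // the last word had no trailing whitespace
        ++wordCount;
    }
    return wordCount;
}
```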

Or, isalpha(c) may be closer to what you need:

#include <cctype>  // isalpha

bool countingLetters = false;
int wordCount = 0;
for (unsigned char c : text) {  // unsigned char keeps isalpha() well-defined
    if (!isalpha(c) && countingLetters) { // this also works for newline
        countingLetters = false;
        ++wordCount;
    } else if (isalpha(c) && !countingLetters) {
        countingLetters = true;
    } // otherwise just skip
}
if (countingLetters) { // a word at the end of the string, with no space character after it
   ++wordCount;
}

Also, inserting an extra character into the string just to make such a simple task work is not acceptable. For example, text might be const.

Answer 3

C++ also offers some much higher-level ways to do this.

One is to loop over a string stream, which splits the text on whitespace:

#include <sstream>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::size_t count = 0;
  std::istringstream ss( s );
  std::string t;
  while (ss >> t) count += 1;
  return count;
}

Another is to use stream iterators with an algorithm:

#include <iterator>
#include <sstream>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::istringstream ss( s );
  return std::distance( 
    std::istream_iterator <std::string> ( ss ), 
    std::istream_iterator <std::string> ()
  );
}
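Since the question reads its text from a file, it may be worth noting that the iterator version works on any std::istream, so a std::ifstream could be passed directly without first loading the file into a string. A sketch under that assumption (the function name `count_words_in_stream` and the file name are hypothetical):

```cpp
#include <cstddef>
#include <fstream>   // std::ifstream, for the file usage shown below
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, for the in-memory usage shown below
#include <string>

// Counts whitespace-separated tokens read from any input stream.
std::size_t count_words_in_stream(std::istream& in)
{
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()));
}

// Usage ("input.txt" is a placeholder file name):
//   std::ifstream file("input.txt");
//   std::size_t n = count_words_in_stream(file);
//
//   std::istringstream ss("hello\n\nworld\t");
//   std::size_t m = count_words_in_stream(ss);  // 2
```

Because operator>> skips all leading whitespace, blank lines never produce empty tokens.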

And another is to use a regular expression:

#include <iterator>
#include <regex>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::regex re( R"(\w+)" );  // raw string literal, so the regex engine sees \w (plain "\w+" is an invalid escape)
  return std::distance(
    std::sregex_iterator( s.begin(), s.end(), re ),
    std::sregex_iterator()
  );
}

I'm sure there are more, but these three are the ones that come to mind.
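These approaches do not all agree on what a word is: the stream versions split only on whitespace, while \w+ (ECMAScript syntax, the std::regex default) matches runs of letters, digits and underscores, so punctuation inside a token splits it. A sketch contrasting the two definitions (both function names are hypothetical):

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Whitespace-delimited tokens, as in the stringstream versions above.
std::size_t count_ws_tokens(const std::string& s)
{
    std::istringstream ss(s);
    std::string t;
    std::size_t count = 0;
    while (ss >> t) ++count;
    return count;
}

// Runs of [A-Za-z0-9_], as in the regex version above.
std::size_t count_regex_words(const std::string& s)
{
    std::regex re(R"(\w+)");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(s.begin(), s.end(), re),
        std::sregex_iterator()));
}
```

For "don't stop", the first counts 2 tokens and the second counts 3 ("don", "t", "stop"); which one is right depends on how the program should define a word.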

Tags: c++, arrays, string, char