Counting occurrences of words within an inputted string in C

Question

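I'm currently struggling to count the occurrences of words within an inputted string. I believe it is just my logic that is off, but I've been at it for a while and have hit a wall.

The problems I have yet to solve are:

With longer inputs, the end of the string is sometimes cut off.
Incrementing the counter each time a word is repeated.

I know some things in the code may not be done in the most ideal way, but I'm still new to C, so any pointers are really helpful.

To sum up, I'm looking for advice that helps solve the problems listed above.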

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>

#define MAX_WORDS 1000

int main(void) {
    int i,j,isUnique,uniqueLen;
    char word[MAX_WORDS];
    char words[200][30];
    char uniqueWords[200][30];
    int count[200];
    char *p = strtok(word, " ");
    int index=0;

    //read input until EOF is reached
    scanf("%[^EOF]", word);

    //initialize count array
    for (i = 0; i < 200; i++) {
        count[i] = 0;
    }
    //convert lower case letters to upper
    for (i = 0; word[i] != '\0'; i++) {
        if (word[i] >= 'a' && word[i] <= 'z') {
            word[i] = word[i] - 32;
        }
    }

    //Split the word string into tokens and save each token into the array words
    p = strtok(word, " ,.;!\n");
    while (p != NULL)
    {
        strcpy(words[index], p);
        p = strtok(NULL, " ,.;!\n");
        index++;
    }

    /* 
    Check each string in the array words for occurrences within the uniqueWords array. If it is unique then 
    copy the string from words into the uniqueWords array. Otherwise the counter for the repeated word is incremented.
    */ 
    uniqueLen = 0;
    for (i = 0; i < index; i++) {
        isUnique = 1;
        for (j = 0; j < index; j++) {
            if (strcmp(uniqueWords[j],words[i])==0) {   
                isUnique = 0;
                break;
            }
            else {
            }
        }
        if (isUnique) {
            strcpy(uniqueWords[uniqueLen], words[i]);
            count[uniqueLen] += 1;
            uniqueLen++;
        }
        else {
        } 
    }

    for (i = 0; i < uniqueLen; i++) {
        printf("%s => %i\n", uniqueWords[i],count[i]);
    }
}

Answer 1

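I don't know whether you are bound by particular requirements, but for all its limitations in terms of standard library functions, C does have one that can make your job easier: strstr, which finds the next occurrence of a substring. For example: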


#include <stdio.h>
#include <string.h>

int main() {

  const char str[] = "stringstringdstringdstringadasstringipoistring";
  const char* substr = "string";
  const char* orig = str;
  const char* temp = substr;


  int length = 0;
  while(*temp++){length++;} // length of substr

  int count = 0;
  char *ret = strstr(orig, substr);

  while (ret != NULL){
    count++; 
    // search for the next occurrence, starting just past the current match
    ret = strstr(ret + length, substr);   
  }
  printf("%d", count);
}

The output should be 6.

As for scanf("%999[^\n]", word);, it reads all characters until it finds a \n or reaches the width limit. I agree that fgets(word, sizeof word, stdin); is better.

Answer 2

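This is the code I ended up using. It turned out the problem was mainly in how I used the scanf function. Putting it in a while loop makes it much easier to clean up each word as it is read.

Thanks for the help, everyone :)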

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>



int main(void) {

    // Create all variables 
    int i, len, isUnique, index;
    char word[200];
    char uniqueWords[200][30];
    int count[200];

    // Initialize the count array
    for (i = 0; i < 200; i++) {
        count[i] = 0;
    }

    // Set the value for index to 0
    index = 0;

    // Read all words inputted until the EOF marker is reached
    while (scanf("%199s", word) == 1) {

        /* 
        For each word being read if the characters within it are lowercase 
        then each are then incremented into being uppercase values.
        */
        for (i = 0; word[i] != '\0'; i++) {
            if (word[i] >= 'a' && word[i] <= 'z') {
                word[i] = word[i] - 32;
            }
        }
        /* 
        We use len to find the length of the word being read. This is then used
        to access the final character of the word and remove it if it is not an
        alphabetic character.
        */
        len = strlen(word);
        if (ispunct(word[len - 1]))
            word[len - 1] = '\0';

        /*
        The next part removes the non alphabetic characters from within the words.
        This happens by incrementing through each character of the word and by 
        using the isalpha and removing the characters if they are not alphabetic
        characters.
        */
        size_t pos = 0;
        for (char *p = word; *p; ++p)
            if (isalpha(*p))
                word[pos++] = *p;
        word[pos] = '\0';

        /* 
        We set the isUnique value to 1 as upon comparing the arrays later we 
        change this value to 0 to show the word is not unique.
        */
        isUnique = 1;
        /* 
        For each word through the string we use a for loop when the counter i 
        is below the index and while the isUnique value is 1.
        */
        for (i = 0; i < index && isUnique; i++)
        {
            /* 
            Using the strcmp function we are able to check if the word in 
            question is in the uniqueWords array. If it is found we then 
            change the isUnique value to 0 to show that the value is not
            unique and prevent the loop happening again.
            */
            if (strcmp(uniqueWords[i], word) == 0)
                isUnique = 0;
        }

        /* If word is unique then add it to the uniqueWords list
        and increment index. Otherwise increment occurrence 
        count of current word.
        */
        if (isUnique)
        {   
            strcpy(uniqueWords[index], word);
            count[index]++;
            index++;
        }
        else
        {
            count[i - 1]++;
        }
    }
    /*
    For each item in the uniqueWords list we iterate through the words
    and print them out in the correct format with the word and the following count of them.
    */
    for (i = 0; i < index; i++)
    {
        printf("%s => %d\n", uniqueWords[i], count[i]);
    }
}
