Why am I getting extra index values when multi-mapping words to lines from a text file in C++?
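I'm working on a multimap program that reads a text file, removes punctuation, and then builds an index of the lines each word appears on. The code compiles and runs, but I'm getting output I don't want. I'm pretty sure the problem is in how I handle punctuation. Every time a word is followed by a period, it gets counted twice, even though I'm excluding punctuation. It then prints the last word several times, claiming it appears on lines that don't exist in the file. Any help would be greatly appreciated!
Input file: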
dogs run fast.
dogs bark loud.
cats sleep hard.
cats are not dogs.
Thank you.
#
C++ code:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <map>
using namespace std;
int main(){
    ifstream input;
    input.open("NewFile.txt");
    if ( !input )
    {
        cout << "Error opening file." << endl;
        return 0;
    }
    multimap< string, int, less<string> > words;
    int line;    // current line number
    string word; // current word
    // Read the file line by line, incrementing line each time
    for (line = 1; input; line++)
    {
        char buf[ 255 ];           // buffer for one line of text
        input.getline( buf, 128 ); // reads at most 127 chars into buf, then appends '\0'
        // Discard all punctuation characters, leaving only words
        for ( char *p = buf; *p != '\0'; p++ )
        {
            if ( ispunct( *p ) )
                *p = ' ';
        }
        istringstream i( buf );
        while ( i )
        {
            i >> word;
            if ( word != "" )
            {
                words.insert( pair<const string,int>( word, line ) );
            }
        }
    }
    input.close();
    // Output results
    multimap< string, int, less<string> >::iterator it1;
    multimap< string, int, less<string> >::iterator it2;
    for ( it1 = words.begin(); it1 != words.end(); )
    {
        it2 = words.upper_bound( (*it1).first );
        cout << (*it1).first << " : ";
        for ( ; it1 != it2; it1++ )
        {
            cout << (*it1).second << " ";
        }
        cout << endl;
    }
    return 0;
}
Output:
Thank : 5
are : 4
bark : 2
cats : 3 4
dogs : 1 2 4 4
fast : 1 1
hard : 3 3
loud : 2 2
not : 4
run : 1
sleep : 3
you : 5 5 6 7
Desired output:
Thank : 5
are : 4
bark : 2
cats : 3 4
dogs : 1 2 4
fast : 1
hard : 3
loud : 2
not : 4
run : 1
sleep : 3
you : 5
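Thanks in advance for your help!
You aren't removing the punctuation; you're replacing it with spaces. After the last real word on a line, while ( i ) is still true, so i >> word runs one more time, finds nothing left to read, and fails. A failed extraction leaves word holding its previous value, and since word != "" is still true, that previous word is inserted again. The same leftover value is why you is reported on lines 6 and 7: those lines produce no words, so word still contains you. You should check whether extracting a word succeeded, like this: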
i >> word;
if (!i.fail()) {
words.insert(pair<const string, int>(word, line));
}
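To see the duplication in isolation, here is a small sketch that runs the question's inner loop and a checked version on a single line whose punctuation has already been replaced by a space. The function names naiveWords and checkedWords are hypothetical, chosen only for this illustration; writing the extraction as the loop condition, while (in >> word), is equivalent to testing fail() right after each extraction.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: mimics the question's loop for one cleaned line
// (punctuation already replaced by spaces) and returns what it would insert.
std::vector<std::string> naiveWords(const std::string& cleanedLine) {
    std::istringstream in(cleanedLine);
    std::vector<std::string> out;
    std::string word;
    while (in) {            // still true right after the last real word
        in >> word;         // the extra extraction fails and leaves word unchanged
        if (word != "") {
            out.push_back(word);  // so the last word is pushed a second time
        }
    }
    return out;
}

// Hypothetical helper: same input, but a word is used only if it was read.
std::vector<std::string> checkedWords(const std::string& cleanedLine) {
    std::istringstream in(cleanedLine);
    std::vector<std::string> out;
    std::string word;
    while (in >> word) {    // loop ends as soon as an extraction fails
        out.push_back(word);
    }
    return out;
}
```

naiveWords("dogs run fast ") yields dogs, run, fast, fast, matching the doubled fast : 1 1 in the question, while checkedWords returns each word once. Because word is local to each call here, this sketch does not reproduce the carry-over of you onto lines 6 and 7.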
Since you're using C++, it's easier to avoid raw pointers and rely on the standard library instead. I would rewrite part of your code like this:
// Also requires #include <algorithm>, <cctype> and <iterator>
// Read the file line by line; getline in the condition stops at end of file
std::string buf;
for (line = 1; std::getline(input, buf); line++)
{
    istringstream i( buf );
    while ( i )
    {
        i >> word;
        if (!i.fail()) {
            // Copy the word without its punctuation characters.
            // The lambda replaces std::ptr_fun, which was removed in C++17.
            std::string cleanWord;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(cleanWord),
                                [](unsigned char c) { return std::ispunct(c) != 0; });
            if (!cleanWord.empty()) {
                words.insert(pair<const string, int>(cleanWord, line));
            }
        }
    }
}
input.close();
// Output results
multimap< string, int, less<string> >::iterator it1;
multimap< string, int, less<string> >::iterator it2;
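For reference, here is the same approach assembled into two self-contained functions so it can be tried on a string instead of NewFile.txt. The names buildIndex and formatIndex are hypothetical; the lambda stands in for std::ptr_fun, and equal_range is used in place of the upper_bound call from the question. This is a sketch of one way to structure it, not the only correct version.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: builds a word -> line-number index from any input
// stream (an ifstream or an istringstream). Lines are numbered from 1.
std::multimap<std::string, int> buildIndex(std::istream& input) {
    std::multimap<std::string, int> words;
    std::string buf;
    // getline in the condition stops the loop when a read fails,
    // so no phantom lines are counted after the end of the file.
    for (int line = 1; std::getline(input, buf); ++line) {
        std::istringstream in(buf);
        std::string word;
        // The extraction is the loop condition, so a failed read is never used.
        while (in >> word) {
            std::string cleanWord;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(cleanWord),
                                [](unsigned char c) { return std::ispunct(c) != 0; });
            if (!cleanWord.empty()) {  // skip tokens that were all punctuation
                words.emplace(cleanWord, line);
            }
        }
    }
    return words;
}

// Hypothetical helper: formats the index like the question's output loop,
// one "word : n n " line per distinct word.
std::string formatIndex(const std::multimap<std::string, int>& words) {
    std::ostringstream out;
    for (auto it = words.begin(); it != words.end(); ) {
        auto range = words.equal_range(it->first);
        out << it->first << " : ";
        for (it = range.first; it != range.second; ++it) {
            out << it->second << " ";
        }
        out << "\n";
    }
    return out.str();
}
```

Called as formatIndex(buildIndex(input)) on the question's opened ifstream, this should print the desired output, including the trailing space after each line number that the original printing loop emits.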