C++ searching a line from a file for certain words and then inserting a word after those words
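I'm new to C++ and I've been racking my brain over this for quite a while. Basically, I need to read from a file, find every instance of an article ("a", "A", "an", "aN", "An", "AN", "the", "The", "tHe", "thE", "THe", "tHE", "ThE", "THE") and then insert an adjective after that article. The capitalization of the adjective has to be dictated by the word that follows the article. For example, if I find "a SHARK", I need to turn it into "a HAPPY SHARK." Can anyone tell me the best way to do this? I've scrapped a lot of ideas so far; this is what I have right now, although I don't think I can do it this way: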
#include <iostream>
#include <string>
#include <cctype>
#include <fstream>
#include <sstream>
using namespace std;

void
usage(char *progname, string msg){
    cerr << "Error: " << msg << endl;
    cerr << "Usage is: " << progname << " [filename]" << endl;
    cerr << " specifying filename reads from that file; no filename reads standard input" << endl;
}

int main(int argc, char *argv[])
{
    string adj;
    string file;
    string line;
    // every capitalization of "a", "an" and "the"
    string articles[14] = {"a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE"};
    ifstream rfile;
    cin >> adj;   // the adjective to insert
    cin >> file;  // the file to read
    rfile.open(file.c_str());
    if(rfile.fail()){
        cerr << "Error while attempting to open the file." << endl;
        return 1; // non-zero exit status signals failure
    }
    // test getline itself: checking good() before reading processes
    // one extra, empty line at end of file
    while(getline(rfile, line)){
        istringstream iss(line);
        string word;
        while(iss >> word){
            // valid indices are 0..13; "i <= 14" would read past the end of the array
            for(int i = 0; i < 14; i++){
                if(word == articles[i]){
                    cout << word + " " << endl;
                }
            }
        }
    }
}
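So far, so good, but processing the file line by line can get you into trouble when an article is the last word on a line (for example "the" at the end of one line and "shark" at the start of the next).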
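Ignoring that problem for now: after matching an article, you first need to read the next word, since its capitalization decides the adjective's. Then you need to build a new copy of the adjective with the right capitalization: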
string adj_buf; // a std::string copy resizes itself, no manual allocation needed
while (iss >> word) {
    for (int i = 0; i < 14; i++) {
        if (word == articles[i]) {
            cout << word + " ";
            if (!(iss >> word)) // no more words on this line (the wrinkle discussed below)
                break;
            adj_buf = adj;
            // copy the case of word onto the adjective, character by character
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));
            cout << adj_buf + " " + word + " ";
            break;
        }
    }
}
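Here is the case-copying loop above pulled out into a standalone function so it can be tried on its own; the name `copy_case_per_char` is hypothetical. It copies the case position by position, so characters of the adjective beyond the end of the word keep whatever case they already had:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Copy the case of `word` onto `adj`, one position at a time: adj[j] becomes
// upper case when word[j] is upper case and lower case otherwise. Characters
// of adj past the end of word are left unchanged.
std::string copy_case_per_char(const std::string& adj, const std::string& word) {
    std::string out = adj;
    for (std::size_t j = 0; j < word.size() && j < adj.size(); ++j) {
        unsigned char w = static_cast<unsigned char>(word[j]);
        unsigned char a = static_cast<unsigned char>(adj[j]);
        out[j] = static_cast<char>(std::isupper(w) ? std::toupper(a) : std::tolower(a));
    }
    return out;
}
```

With "happy" this gives "HAPPY" for "SHARK" and "Happy" for "Shark", but "HAppy" for "OX", which may or may not be what you want for short all-caps words.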
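Back to the wrinkle we ignored. You probably don't want to go line by line and then token by token, because handling this special case would make your control flow ugly. Instead, you probably want to go token by token in a single loop.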
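So you'll want to write a helper function or class that works on the file and hands you the next token. (There may already be such a class in the STL; I'm not sure.) Anyway, with your I/O it might look like this: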
struct FileTokenizer
{
    FileTokenizer(const string& fileName) : rfile(fileName) {}
    bool getNextToken(string &token)
    {
        // when the current line is used up, refill iss with the next line
        while (!(iss >> token))
        {
            string line;
            if (!getline(rfile, line))
                return false;  // end of file (or read error)
            iss.clear();       // reset the eof/fail flags left by the previous line
            iss.str(line);     // replace the buffer with the new line
        }
        return true;
    }
private:
    ifstream rfile;
    istringstream iss;
};
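On whether the standard library already has something like this: `std::istream_iterator<std::string>` reads whitespace-separated tokens, and because `operator>>` treats a newline like any other whitespace, it crosses line boundaries without any `getline` refill. A minimal sketch (the `read_tokens` helper is hypothetical; the string overload only exists to make it easy to try on in-memory text):

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Collect every whitespace-separated token from the stream. operator>> (used
// by istream_iterator) skips '\n' like any other whitespace, so the last word
// of one line is followed directly by the first word of the next line.
std::vector<std::string> read_tokens(std::istream& in) {
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Convenience overload for experimenting with text held in a string.
std::vector<std::string> read_tokens(const std::string& text) {
    std::istringstream in(text);
    return read_tokens(in);
}
```

In the same spirit, `while (rfile >> word)` can stand in for `tokenizer.getNextToken(word)` in the loop below; what you give up is knowing where the original lines ended.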
Then your main loop would look like this:
FileTokenizer tokenizer(file);
string word;
string adj_buf;
while (tokenizer.getNextToken(word))
{
    for (int i = 0; i < 14; i++) {
        if (word == articles[i]) {
            cout << word + " ";
            if (!tokenizer.getNextToken(word)) // the article was the last word in the file
                break;
            adj_buf = adj;
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));
            cout << adj_buf + " " + word + " ";
            break;
        }
    }
}
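Note that this loop only prints the articles and the words right after them; you probably also want to output the rest of the input.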
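A minimal sketch of what that could look like: one token-by-token pass that echoes every word and inserts the adjective after each article, including an article that ends a line. It assumes the per-character case rule used above; `is_article`, `copy_case_per_char` and `insert_adjective` are hypothetical names, and because tokens are re-joined with single spaces, the original line breaks are not preserved:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// True if word is "a", "an" or "the" in any capitalization.
bool is_article(const std::string& word) {
    std::string up;
    for (unsigned char c : word)
        up += static_cast<char>(std::toupper(c));
    return up == "A" || up == "AN" || up == "THE";
}

// Per-character case copy, as in the loops above.
std::string copy_case_per_char(const std::string& adj, const std::string& word) {
    std::string out = adj;
    for (std::size_t j = 0; j < word.size() && j < adj.size(); ++j) {
        unsigned char a = static_cast<unsigned char>(adj[j]);
        out[j] = static_cast<char>(std::isupper(static_cast<unsigned char>(word[j]))
                                       ? std::toupper(a) : std::tolower(a));
    }
    return out;
}

// Echo every token of `in`, inserting adj (re-cased) before the word that
// follows each article. Tokens are joined with single spaces.
std::string insert_adjective(std::istream& in, const std::string& adj) {
    std::string out, word;
    bool after_article = false;
    while (in >> word) {                  // operator>> crosses line boundaries
        if (!out.empty()) out += ' ';
        if (after_article) out += copy_case_per_char(adj, word) + ' ';
        out += word;
        after_article = is_article(word);
    }
    return out;
}

// Convenience overload for text held in a string.
std::string insert_adjective(const std::string& text, const std::string& adj) {
    std::istringstream in(text);
    return insert_adjective(in, adj);
}
```

Keeping a flag instead of reading a second token inside the loop means there is no special case when the article is the very last word: the flag is simply never consumed.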
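First, I suggest you use 3 helper functions to convert the case of a string. They come in handy if you work with text a lot. Here they are based on <algorithm>, but many other approaches are possible: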
#include <algorithm>
#include <cctype>
#include <string>
using namespace std;

// the lambdas take unsigned char: passing a negative char to toupper/tolower is undefined behaviour
string strtoupper(const string& s) { // return the uppercase of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(),
                   [](unsigned char c) { return static_cast<char>(toupper(c)); });
    return str;
}
string strtolower(const string& s) { // return the lowercase of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(),
                   [](unsigned char c) { return static_cast<char>(tolower(c)); });
    return str;
}
string strcapitalize(const string& s) { // return the capitalisation (1 upper, rest lower) of the string
    string str = strtolower(s);
    if (!str.empty())
        str[0] = static_cast<char>(toupper(static_cast<unsigned char>(str[0])));
    return str;
}
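Then a utility function that clones a word's capitalisation: depending on the reference word, it returns the adjective in lowercase, in uppercase, or capitalised (1 upper + rest lower). It is robust enough to handle empty words and words that do not start with a letter: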
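string clone_capitalisation(const string& a, const string& w) {
    if (w.empty() || !isalpha(static_cast<unsigned char>(w[0]))) // empty or not a letter
        return a;                                                 // => use adj as it is
    if (islower(static_cast<unsigned char>(w[0])))                // lowercase
        return strtolower(a);
    // a single capital letter or at least two capitals => uppercase, otherwise capitalise
    return w.size() == 1 || isupper(static_cast<unsigned char>(w[1])) ? strtoupper(a) : strcapitalize(a);
}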
None of these functions modify the original string!
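To check the rules quickly, here is a compact, self-contained restatement that folds the three helpers into one function; the name `clone_case` is hypothetical and it is meant to behave like `clone_capitalisation`. Unlike the per-character copy in the other answer, it looks only at the first two letters of the word, so "sHARK" yields "happy" here rather than "hAPPY":

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return adj lowercased, uppercased or capitalised to match the word w.
std::string clone_case(std::string adj, const std::string& w) {
    auto lower = [](unsigned char c) { return static_cast<char>(std::tolower(c)); };
    auto upper = [](unsigned char c) { return static_cast<char>(std::toupper(c)); };
    if (w.empty() || !std::isalpha(static_cast<unsigned char>(w[0])))
        return adj;                                       // not a word: keep adj as is
    std::transform(adj.begin(), adj.end(), adj.begin(), lower);
    if (std::islower(static_cast<unsigned char>(w[0])))
        return adj;                                       // "shark" -> "happy"
    if (w.size() == 1 || std::isupper(static_cast<unsigned char>(w[1])))
        std::transform(adj.begin(), adj.end(), adj.begin(), upper); // "SHARK" -> "HAPPY"
    else if (!adj.empty())
        adj[0] = upper(adj[0]);                           // "Shark" -> "Happy"
    return adj;
}
```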
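Now for main(): I don't like typing in every possible capitalisation of the articles by hand, so I only list the uppercase forms and uppercase each word before looking it up. I also don't like comparing every word against each possible article in turn: with a longer list of articles that linear scan would scale poorly, so I prefer a <set>, whose lookup is logarithmic: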
...
set<string> articles { "A", "AN", "THE" }; // shorter, isn't it? (needs #include <set>)
...
while (getline(rfile, line)) {
    istringstream iss(line);
    string word;
    while (iss >> word) {
        cout << word << " ";                                     // output the word in any case
        if (articles.find(strtoupper(word)) != articles.end()) { // article found?
            if (iss >> word)                                     // then read the next word
                cout << clone_capitalisation(adj, word) << " " << word << " ";
            // else: the article was the last word on the line; it has already been
            // printed, and word still holds it, so printing word here would duplicate it
        }
    }
    cout << endl;
}
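This loop keeps the line breaks, but an article that ends a line still gets no adjective, because the following word is only looked for on the same line. A minimal sketch that keeps the per-line output while carrying a "previous word was an article" flag from one line to the next; `process_lines` and `clone_case` are hypothetical names, `clone_case` restates the `clone_capitalisation` rule so the block stands alone, and the words on each line are re-joined with single spaces:

```cpp
#include <cctype>
#include <istream>
#include <set>
#include <sstream>
#include <string>

static std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}
static std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}
// Same rule as clone_capitalisation: lowercase, UPPERCASE or Capitalised after w.
static std::string clone_case(const std::string& adj, const std::string& w) {
    if (w.empty() || !std::isalpha(static_cast<unsigned char>(w[0]))) return adj;
    if (std::islower(static_cast<unsigned char>(w[0]))) return lower(adj);
    if (w.size() == 1 || std::isupper(static_cast<unsigned char>(w[1]))) return upper(adj);
    std::string cap = lower(adj);
    if (!cap.empty()) cap[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(cap[0])));
    return cap;
}

// Process `in` line by line, keep the line breaks, and insert adj after every
// article, even when the article ends one line and its noun starts the next.
std::string process_lines(std::istream& in, const std::string& adj) {
    static const std::set<std::string> articles{"A", "AN", "THE"};
    std::string out, line, word;
    bool pending = false; // was the previous word (possibly on an earlier line) an article?
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        bool first = true;
        while (iss >> word) {
            if (!first) out += ' ';
            first = false;
            if (pending) out += clone_case(adj, word) + ' ';
            out += word;
            pending = articles.count(upper(word)) > 0;
        }
        out += '\n';
    }
    return out;
}

// Convenience overload for text held in a string.
std::string process_lines(const std::string& text, const std::string& adj) {
    std::istringstream in(text);
    return process_lines(in, adj);
}
```

For "I saw a SHARK and the\nfish\n" with "happy" this keeps both lines and produces "happy fish" on the second one. Every word is checked against the set, including the one right after an article, so input such as "a the shark" gets the adjective twice.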