How to concatenate the whole text from a file into a string, avoiding empty lines between strings
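How do I get the whole text from the document into a single string? I'm trying to split the text on periods: string[] words = s.Split('.');
I want to get this text from a text document. But if my text document contains line breaks between strings, for example: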
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
The result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
But the correct output I expect is:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So first I need to process the text file contents to get the whole text as a single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way as with a list's contents, e.g.: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text from the text document into a string.
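Since File.ReadAllLines returns a string[], the list-style String.Join from the question can in fact be applied to a file. A minimal sketch, assuming blank or whitespace-only lines should be dropped and each remaining line trimmed; the names FileText, JoinLines and ReadAsOneString are hypothetical:

```csharp
using System.IO;
using System.Linq;

public static class FileText
{
    // Drops blank/whitespace-only lines, trims the rest, and joins them with one space,
    // so the last word of one line does not run into the first word of the next.
    public static string JoinLines(string[] lines) =>
        string.Join(" ", lines.Where(l => !string.IsNullOrWhiteSpace(l)).Select(l => l.Trim()));

    // File.ReadAllLines returns string[], so the same Join works on a file's contents.
    public static string ReadAsOneString(string path) => JoinLines(File.ReadAllLines(path));
}
```

FileText.ReadAsOneString(path) should then give a single-line string like the one shown above, ready for Split('.').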
Have you tried replacing double newlines before splitting on periods?
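using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static string[] GetSentences(string filePath) {
    if (!File.Exists(filePath))
        throw new FileNotFoundException($"Could not find file {filePath}!");
    // Skip empty/whitespace-only lines and join the rest with a space,
    // so the last word of one line doesn't run into the first word of the next
    var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
    // Split at each period that is followed by whitespace
    var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
    return sentences;
}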
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
This throws an exception if the file can't be found. I recommend wrapping the method call in a try/catch.
var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
This creates a single string, skipping any empty or whitespace-only lines.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
This creates an array of strings, splitting at each period and the whitespace that follows it.
For example:
the string "I came. I saw. I conquered" becomes
- I came
- I saw
- I conquered
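A small check of why the period has to be escaped in the pattern: unescaped, . matches any character, so the split also fires at every letter followed by a space. The class name SplitDemo is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // "\." is a literal period; "[\s]{1,}?" then matches the whitespace after it
    // (the lazy quantifier settles on a single whitespace character).
    public static string[] Escaped(string s) => Regex.Split(s, @"\.[\s]{1,}?");

    // Unescaped, "." matches any character, so "I " also counts as a separator.
    public static string[] Unescaped(string s) => Regex.Split(s, @".[\s]{1,}?");
}
```

On "I came. I saw. I conquered", Escaped returns I came, I saw, I conquered, while Unescaped returns six pieces, with empty strings where "I " was consumed as a separator.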
Update:
Here's the method as a one-liner, if that's your style:
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;
I think this is what you want:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
//replace Environment.NewLine with any new line character your file uses; use a space so words on adjacent lines stay separated
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
//modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
//collapse the double spaces left behind by removed characters such as " - "
var withoutTwoSpaces = withoutUglyCharacters.Replace("  ", " ");
var result = withoutTwoSpaces.Split('.').Select(i => i.Trim()).Where(i => i != "").ToList();
So first you read all the text from the file, then remove all unwanted characters, then split on . and return the non-empty items.
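The same steps can be wrapped as a function over a string already in memory, which also makes them easy to test. A sketch, assuming the same set of unwanted characters; it collapses every whitespace run with \s+ instead of replacing one fixed newline string, so it copes with both "\r\n" and "\n" files. The name SentenceCleaner.Clean is hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceCleaner
{
    public static string[] Clean(string text)
    {
        // Drop the unwanted punctuation first...
        var withoutUglyCharacters = Regex.Replace(text, "[“’”,;-]", "");
        // ...then collapse every whitespace run (spaces, "\r\n", "\n") to one space,
        // which also fixes the double spaces left where " - " was removed.
        var singleSpaced = Regex.Replace(withoutUglyCharacters, @"\s+", " ");
        return singleSpaced.Split('.')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

Run on the sample text from the question, this yields the five expected sentences.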
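I suggest you iterate over all the characters and check whether each one is in the range 'a' <= char <= 'z' or is a space (char == ' '). If it is, add it to a newly created string; otherwise check whether it is the '.' character, and if so, end your line and start another: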
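using System;
using System.Collections.Generic;

// str holds the whole file text, e.g. string str = File.ReadAllText(filePath);
List<string> lines = new List<string>();
string line = string.Empty;
foreach (char c in str)
{
    if ((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == ' ')
        line += c;
    else if (c == '\n')
        line += ' '; // a line break separates words just like a space
    else if (c == '.')
    {
        // Trim and collapse the double spaces left where characters such as '-' were dropped
        lines.Add(string.Join(" ", line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)));
        line = string.Empty;
    }
}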
Or, if you like one-liners:
IEnumerable<string> lines = new string(str.Select(c => c == '\n' ? ' ' : c).Where(c => (char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == ' ' || c == '.').ToArray()).Split('.').Select(s => string.Join(" ", s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))).Where(s => s != "");
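The character-scanning idea above can also be written with a StringBuilder, which avoids creating a new string on every += in the loop. A sketch under the same rules (keep ASCII letters, treat any whitespace as a single space, end a sentence at '.', drop everything else); the names CharScanner, Sentences and Flush are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharScanner
{
    public static List<string> Sentences(string str)
    {
        var lines = new List<string>();
        var line = new StringBuilder();
        foreach (char c in str)
        {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
                line.Append(c);
            else if (char.IsWhiteSpace(c))
            {
                // Spaces and line breaks alike become one space; skip leading/doubled spaces
                if (line.Length > 0 && line[line.Length - 1] != ' ')
                    line.Append(' ');
            }
            else if (c == '.')
                Flush(line, lines);
            // any other character (quotes, commas, dashes...) is dropped
        }
        Flush(line, lines); // keep trailing text that has no closing period
        return lines;
    }

    // Adds the current sentence if it is non-empty, then starts a new one
    private static void Flush(StringBuilder line, List<string> lines)
    {
        var s = line.ToString().Trim();
        if (s.Length > 0) lines.Add(s);
        line.Clear();
    }
}
```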
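I may be wrong, but I think that if you're splitting the string, you may not want to alter it. For example, parts of the string contain double/single quotes (“ ” and ’), and you may not need to remove them. That can raise a problem: reading a text file that contains single/double quotes (as in your sample text) like this: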
var stringFromFile = File.ReadAllText(fileLocation);
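won't display those characters correctly in a text box or the console if the file was saved in the ANSI code page (for example Windows-1252) rather than UTF-8, because ReadAllText decodes as UTF-8 by default. The single/double quotes then show up as diamonds (the replacement character) in a form's text box, and as question marks (?) in the console. To keep the quotes and have them display correctly, you can pass the OS's current ANSI encoding to ReadAllText as an extra argument, like this (note that on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so this only helps on .NET Framework):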
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
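A sketch that names the code page explicitly instead of relying on Encoding.Default, assuming the file really is Windows-1252 (if it was saved as UTF-8, the plain ReadAllText call already handles the quotes). On .NET Core / .NET 5+, code page 1252 is only available after registering CodePagesEncodingProvider (on .NET Framework that type comes from the System.Text.Encoding.CodePages package). The names AnsiDemo and ReadWindows1252 are hypothetical:

```csharp
using System.IO;
using System.Text;

public static class AnsiDemo
{
    // Reads a file saved in the Windows-1252 ("ANSI") code page, where the curly
    // quotes are single bytes (0x93, 0x94, 0x92) that are not valid UTF-8.
    public static string ReadWindows1252(string path)
    {
        // Makes legacy code pages such as 1252 available to Encoding.GetEncoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllText(path, Encoding.GetEncoding(1252));
    }
}
```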
Below is code that splits the string on periods (.) using the simple Split method. I hope it helps.
private void button1_Click(object sender, EventArgs e) {
    string fileLocation = @"C:\YourPath\YourFile.txt";
    // Encoding.Default is the ANSI code page on .NET Framework (UTF-8 on .NET Core / .NET 5+)
    string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
    // Replace line breaks with a space so words on adjacent lines stay separated
    string bigString = stringFromFile.Replace(Environment.NewLine, " ");
    string[] result = bigString.Split('.');
    int count = 1;
    foreach (string s in result) {
        if (!string.IsNullOrWhiteSpace(s)) {
            textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
            Console.WriteLine(count + ". " + s.Trim());
            count++;
        }
        else {
            // empty piece, e.g. after the period at the end of the string
        }
    }
}
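The numbering in the loop above can also be done with LINQ's index-aware Select, which keeps the formatting separate from the UI code. A sketch; the names Numbering and Number are hypothetical:

```csharp
using System.Linq;

public static class Numbering
{
    // Skips empty/whitespace pieces (such as the one after the final period) and
    // prefixes the rest with a running number, like the count loop above.
    public static string[] Number(string[] pieces) =>
        pieces.Where(p => !string.IsNullOrWhiteSpace(p))
              .Select((p, i) => $"{i + 1}. {p.Trim()}")
              .ToArray();
}
```

The result can then be assigned in one go, e.g. textBox1.Text = string.Join(Environment.NewLine, Numbering.Number(bigString.Split('.')));, instead of appending to the text box inside the loop.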