Reading each non-English character from a file
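Suppose a file contains non-English text. We can read its contents with the FileIO.ReadLinesAsync method, so each line now holds a run of characters. How can I extract each letter (a non-English letter) from that string? I have expressed my question in the C# code below.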
List<string> finalAlphabets = new List<string>();
IList<string> alphabetLines = await FileIO.ReadLinesAsync(_languageFile, UnicodeEncoding.Utf8);
if (alphabetLines.Count != 0)
{
    foreach (string alphabetLine in alphabetLines)
    {
        // Say alphabetLine holds "కాకికు". I want to extract each letter from it and add it to finalAlphabets.
        // How do I extract this letter from alphabetLine? alphabetLine.Length is 6, but in Telugu this is a 3-letter word.
        finalAlphabets.Add("కా");
    }
}
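There is a set of text-information classes, TextInfo and StringInfo, and in particular you are probably looking for TextElementEnumerator, which lets you find "text element" boundaries.
A simplified sample from the MSDN article: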
var myTEE = System.Globalization.StringInfo.GetTextElementEnumerator("కాకికు");
while (myTEE.MoveNext())
{
    Console.WriteLine("[{0}]:\t{1}\t{2}",
        myTEE.ElementIndex, myTEE.Current, myTEE.GetTextElement());
}
It produces the following output:
[0]: కా కా
[2]: కి కి
[4]: కు కు
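To connect this back to the loop in the question, here is a minimal sketch of a helper that collects the text elements of one line into a list. The names TextElementSplitter and SplitIntoTextElements are hypothetical, introduced only for illustration; the only framework call used is StringInfo.GetTextElementEnumerator, as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original code or of .NET itself.
public static class TextElementSplitter
{
    // Splits a line into its text elements (a base character plus any
    // combining marks attached to it), preserving their order.
    public static List<string> SplitIntoTextElements(string line)
    {
        var elements = new List<string>();
        if (string.IsNullOrEmpty(line))
        {
            return elements; // nothing to split
        }

        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(line);
        while (enumerator.MoveNext())
        {
            // GetTextElement returns the current element as a string.
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }
}
```

Inside the foreach loop from the question, the placeholder call could then become finalAlphabets.AddRange(TextElementSplitter.SplitIntoTextElements(alphabetLine)); for "కాకికు" this adds the three entries shown in the output above. A text element is a Unicode notion and may not always coincide with what a reader of the script considers one letter, so it is worth checking the result against real data from the file.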