Read Unicode string from text file in UWP app
In a Windows 10 app, I am trying to read a string from a .txt file and set it as the text of a RichEditBox:
Code variant 1:
var read = await FileIO.ReadTextAsync(file, Windows.Storage.Streams.UnicodeEncoding.Utf8);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, read);
Code variant 2:
var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.ReadWrite);
ulong size = stream.Size;
using (var inputStream = stream.GetInputStreamAt(0))
{
    using (var dataReader = new Windows.Storage.Streams.DataReader(inputStream))
    {
        dataReader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;
        uint numBytesLoaded = await dataReader.LoadAsync((uint)size);
        string text = dataReader.ReadString(numBytesLoaded);
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, text);
    }
}
On some files I get this error: "No mapping for the Unicode character exists in the target multi-byte code page"
I found a solution:
IBuffer buffer = await FileIO.ReadBufferAsync(file);
DataReader reader = DataReader.FromBuffer(buffer);
byte[] fileContent = new byte[reader.UnconsumedBufferLength];
reader.ReadBytes(fileContent);
string text = Encoding.UTF8.GetString(fileContent, 0, fileContent.Length);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);
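But with this code, the text is displayed as question marks inside diamonds.
How can I read and display such text files with the correct encoding?
The challenge here is the encoding, and the right approach depends on how much accuracy your application needs.
If you need something quick and simple, you can adapt this answer: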
public static Encoding GetEncoding(byte[] bom)
{
    // Analyze the BOM (expects the first 4 bytes of the file)
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    // 00 00 FE FF is the big-endian UTF-32 BOM; Encoding.UTF32 is little-endian
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true);
    return Encoding.ASCII; // no BOM found
}
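As a variation on the BOM check above, here is a minimal sketch that also reports how long the BOM is, so the decoded string does not start with a stray U+FEFF character (Encoding.GetString does not strip the BOM by itself). The names BomSniffer, Detect and Decode are hypothetical and not part of any library, and the caller-supplied fallback encoding is an assumption about what BOM-less files contain. It checks the 4-byte UTF-32 marks before the 2-byte UTF-16 marks because FF FE 00 00 also begins with FF FE.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM and the BOM length in bytes.
    // When no BOM is present, returns the supplied fallback and a length of 0.
    public static Encoding Detect(byte[] data, Encoding fallback, out int bomLength)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // 4-byte marks first: FF FE 00 00 would otherwise match UTF-16LE.
        if (data.Length >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0 && data[3] == 0)
        { bomLength = 4; return new UTF32Encoding(false, true); } // UTF-32LE
        if (data.Length >= 4 && data[0] == 0 && data[1] == 0 && data[2] == 0xFE && data[3] == 0xFF)
        { bomLength = 4; return new UTF32Encoding(true, true); }  // UTF-32BE
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        { bomLength = 3; return Encoding.UTF8; }
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        { bomLength = 2; return Encoding.Unicode; }               // UTF-16LE
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        { bomLength = 2; return Encoding.BigEndianUnicode; }      // UTF-16BE

        bomLength = 0;
        return fallback;
    }

    // Decodes the bytes that follow the BOM, if there is one.
    public static string Decode(byte[] data, Encoding fallback)
    {
        int bomLength;
        Encoding encoding = Detect(data, fallback, out bomLength);
        return encoding.GetString(data, bomLength, data.Length - bomLength);
    }
}
```

With the fileContent array from the question, the call would look like string text = BomSniffer.Decode(fileContent, Encoding.UTF8);. A file without a BOM that is not actually in the fallback encoding will still decode incorrectly, which is the case a charset detector is meant to handle.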
async System.Threading.Tasks.Task MyMethod()
{
    FileOpenPicker openPicker = new FileOpenPicker();
    openPicker.FileTypeFilter.Add(".txt"); // the picker requires at least one file type filter
    StorageFile file = await openPicker.PickSingleFileAsync();
    if (file == null) return; // the user cancelled the picker
    IBuffer buffer = await FileIO.ReadBufferAsync(file);
    byte[] fileContent;
    using (DataReader reader = DataReader.FromBuffer(buffer))
    {
        fileContent = new byte[reader.UnconsumedBufferLength];
        reader.ReadBytes(fileContent);
    }
    // Copy at most the first 4 bytes so that files shorter than 4 bytes do not throw
    byte[] bom = new byte[4];
    Array.Copy(fileContent, bom, Math.Min(4, fileContent.Length));
    string text = GetEncoding(bom).GetString(fileContent);
    txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);
    //..
}
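If you need something more accurate, you should consider porting to UWP one of the .NET ports of the Mozilla charset detector, as already mentioned in this answer.
Please note that the code above is only a sample: it is missing the using statements for the types that implement IDisposable, and it should also be written in a more consistent way.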
Solution:
1) I ported the Mozilla Universal Charset Detector to UWP (published on NuGet)
ICharsetDetector cdet = new CharsetDetector();
cdet.Feed(fileContent, 0, fileContent.Length);
cdet.DataEnd();
2) The NuGet library Portable.Text.Encoding
string text = null;
if (cdet.Charset != null)
    text = Portable.Text.Encoding.GetEncoding(cdet.Charset).GetString(fileContent, 0, fileContent.Length);
That's it. Now text in Unicode encodings, as well as in cp1251 and cp1252, is read correctly.
StorageFile file = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Assets/FontFiles/" + fileName));
using (var inputStream = await file.OpenReadAsync())
using (var classicStream = inputStream.AsStreamForRead())
using (var streamReader = new StreamReader(classicStream))
{
    string line;
    // Read the file line by line until the end of the stream
    while (streamReader.Peek() >= 0)
    {
        line = streamReader.ReadLine();
    }
}
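For the StreamReader approach above, the encoding handling can be made explicit. The following is a minimal sketch, assuming the whole file fits in memory; the TextLoader class and its ReadAllText method are hypothetical names. It uses the StreamReader(Stream, Encoding, bool) constructor, where the last argument asks the reader to look for a byte order mark and the second one is the encoding used when no BOM is found.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TextLoader
{
    // Reads the whole stream as text. A leading BOM, if present, selects the
    // encoding and is not included in the result; otherwise 'fallback' is used.
    public static string ReadAllText(Stream stream, Encoding fallback)
    {
        using (var reader = new StreamReader(stream, fallback, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the UWP code above, the stream argument would be the classicStream obtained from AsStreamForRead(). As with any BOM-based approach, a file without a BOM is decoded with the fallback encoding, whether or not that is what the file actually uses.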