How can I get TTntRichEdit Unicode content in Delphi 7?
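How can I get/set the RTF content of a TTntRichEdit in Unicode (UTF-8/UTF-16) form?
I use the TRichEdit LoadFromStream/SaveToStream methods with TStringStreams to get and set the RTF content, but they only emit locale-dependent ANSI codes for non-ASCII characters (e.g. \'f5).
This becomes a problem if a user takes his or her project to another computer with a different locale: the national characters will be lost.
With the EM_STREAMIN/EM_STREAMOUT messages, the SF_UNICODE flag can only be combined with SF_TEXT, not with SF_RTF.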
You have no problem. You are using a Unicode-capable component, and you will not suffer data loss. From the Wikipedia article on RTF:
A standard RTF file can consist of only 7-bit ASCII characters, but can encode characters beyond ASCII by escape sequences. The character escapes are of two types: code page escapes and, starting with RTF 1.5, Unicode escapes. In a code page escape, two hexadecimal digits following a backslash and typewriter apostrophe are used for denoting a character taken from a Windows code page. For example, if the code page is set to Windows-1256, the sequence \'c8 will encode the Arabic letter bāʼ (ب).
For a Unicode escape the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, specifying that older programs which do not have Unicode support should render it as a question mark instead.
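What you are observing is a code page escape, and that is fine. That is what \'f5 means. The character is in the document's code page, so a code page escape can be used. If you include characters outside the document's code page, the control will use Unicode escapes.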
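To make the Unicode escape form concrete, here is a minimal sketch in standard C++11 (not Borland-specific, and not how the rich edit control is actually implemented). The helper name rtfEscapeUtf16 is hypothetical. It assumes the input is a sequence of UTF-16 code units, always uses \uN? for anything beyond ASCII, and does not handle line breaks or code page escapes.

```cpp
#include <string>

// Hypothetical helper: escapes UTF-16 code units for use inside RTF text.
// ASCII passes through (with \, { and } backslash-escaped); everything else
// becomes a \uN? Unicode escape, where N is a signed 16-bit decimal value
// and '?' is the fallback for readers without Unicode support.
std::string rtfEscapeUtf16(const std::u16string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned int unit = text[i];
        if (unit == '\\' || unit == '{' || unit == '}') {
            out += '\\';
            out += static_cast<char>(unit);
        } else if (unit < 0x80) {
            out += static_cast<char>(unit);
        } else {
            // RTF expects a signed 16-bit number, so values above 32767
            // wrap around to negative.
            int value = static_cast<int>(unit);
            if (value > 32767) value -= 65536;
            out += "\\u" + std::to_string(value) + "?";
        }
    }
    return out;
}
```

Calling rtfEscapeUtf16(u"\u0628") yields the \u1576? sequence from the Wikipedia quote; each half of a surrogate pair is emitted as its own (negative) \uN? escape.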
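Solved using Borland C++ 6 (out of necessity). The same code pattern applies to Borland Delphi.
(Note: TTntRichEdit only loads UTF-8 text as UTF-8 when it explicitly starts with the BOM header "\xEF\xBB\xBF", i.e. the bytes [0xEF, 0xBB, 0xBF].)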
// This only works with BOM explicit files
// (it will fail on BOM-less UTF-8 files)
TTntRichEdit *myTntRichEdit = ...{some init code}...
myTntRichEdit->Lines->LoadFromFile(UTF8_filename);
So this is my working production code:
(Note: TRESource is declared as TTntRichEdit *TRESource;)
void TFormMyExample::LoadJavascriptFromFile(AnsiString myFile) {
    // Loads a UTF-8 text file (with or without BOM) into TRESource.
    // A plain TRESource->Lines->LoadFromFile(myFile) fails on BOM-less files.
    const char BOM[] = "\xEF\xBB\xBF"; // UTF-8 BOM bytes [0xEF, 0xBB, 0xBF]
    TMemoryStream *JSMemoryStream = new TMemoryStream();
    try {
        JSMemoryStream->LoadFromFile(myFile);
        // Check for the BOM; a file shorter than 3 bytes cannot have one.
        char BOMHeader[3] = {0, 0, 0};
        bool hasBOM = false;
        JSMemoryStream->Seek(0, soFromBeginning);
        if (JSMemoryStream->Size >= 3) {
            JSMemoryStream->ReadBuffer(BOMHeader, 3);
            hasBOM = (memcmp(BOMHeader, BOM, 3) == 0);
        }
        JSMemoryStream->Seek(0, soFromBeginning); // reset
        if (hasBOM) {
            // We have the BOM header, so load the stream as-is.
            TRESource->Lines->LoadFromStream(JSMemoryStream);
        } else {
            // We need the BOM header, so prepend it to a copy.
            TMemoryStream *JSBOM_MemoryStream = new TMemoryStream();
            try {
                JSBOM_MemoryStream->Write(BOM, 3);
                JSBOM_MemoryStream->CopyFrom(JSMemoryStream, 0); // 0 = copy the whole source
                JSBOM_MemoryStream->Seek(0, soFromBeginning);
                TRESource->Lines->LoadFromStream(JSBOM_MemoryStream);
            }
            __finally {
                delete JSBOM_MemoryStream;
            }
        }
    }
    __finally {
        delete JSMemoryStream;
    }
}
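The BOM check and prepend step above does not depend on VCL streams. As a rough sketch in standard C++, assuming the file contents are already in a std::string of raw bytes (the names kUtf8Bom, hasUtf8Bom and ensureUtf8Bom are hypothetical), it could look like this:

```cpp
#include <string>

// The three bytes of the UTF-8 byte order mark.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Returns true if the byte sequence starts with the UTF-8 BOM.
bool hasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 && bytes.compare(0, 3, kUtf8Bom, 3) == 0;
}

// Returns the bytes unchanged if they already start with a BOM,
// otherwise returns a copy with the BOM prepended.
std::string ensureUtf8Bom(const std::string& bytes) {
    return hasUtf8Bom(bytes) ? bytes : std::string(kUtf8Bom, 3) + bytes;
}
```

The size check comes first so that inputs shorter than three bytes are treated as BOM-less instead of being read past their end.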
This is how I write the processed file.
(Note: TREProcessed is declared as TTntRichEdit *TREProcessed; also: AnsiString outputFileName;)
ofstream SaveFile(outputFileName.c_str());
TREProcessed->PlainText = true;
SaveFile << "\xEF\xBB\xBF"; // Add UTF-8 BOM [0xEF, 0xBB, 0xBF]
for (int i = 0, max = TREProcessed->Lines->Count; i < max; i++) {
    // Encode each wide-string line as UTF-8 before writing it.
    SaveFile << UTF8Encode(TREProcessed->Lines->Strings[i]).c_str();
    if (i < max - 1) {
        // '\n' is a single byte in UTF-8, so it needs no encoding.
        SaveFile << "\n";
    }
}
SaveFile.close();