Replace tabs ("\t") in flat file with "Unit Separator" (0x1f) in C#

Question

I haven't been able to find the metacharacter for the 'Unit Separator' so that I can replace the tabs in a flat file with it.

So far I have this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\0x1f")));  //this does not work

I have also tried:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u"))); //also doesn't work

and

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", 0x1f)));  //also doesn't work

How do I properly use a hex value as the argument? Also, what is the metacharacter for the 'Unit Separator'?

Answer 1

The metacharacter for the unit separator is

U+001f

You should be able to use it like this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u001f")));
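
For anyone wondering why the three attempts in the question fail, here is a minimal sketch comparing the ways U+001F can be written in C# source. The class and method names (UnitSeparatorDemo, ReplaceTabs) are made up for illustration and are not part of the question's code.

```csharp
using System;

public static class UnitSeparatorDemo
{
    // Three equivalent ways to spell U+001F in C# source.
    public const char FromUnicodeEscape = '\u001f'; // \u takes exactly four hex digits
    public const char FromHexEscape = '\x001f';     // \x takes one to four hex digits
    public const char FromCast = (char)0x1f;        // an int literal needs an explicit cast

    // Replaces every tab in a single line with the unit separator.
    public static string ReplaceTabs(string line)
    {
        return line.Replace('\t', FromUnicodeEscape);
    }

    public static void Main()
    {
        // All three constants are the same character.
        Console.WriteLine(FromUnicodeEscape == FromHexEscape && FromHexEscape == FromCast); // True

        // Two tabs become two separators, so splitting yields three fields.
        string converted = ReplaceTabs("a\tb\tc");
        Console.WriteLine(converted.Split(FromUnicodeEscape).Length); // 3

        // "\0x1f" compiles, but it is a NUL character followed by the text "x1f".
        Console.WriteLine("\0x1f".Length); // 4
    }
}
```

The other two attempts do not compile at all: "\u" on its own is an incomplete escape, and a bare 0x1f is an int, for which string.Replace has no matching overload.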

Edit: Since a discussion about control characters has started, I'll add this definition for posterity's sake.

A special, non-printing character that begins, modifies, or ends a function, event, operation or control operation. The ASCII character set defines 32 control characters. Originally, these codes were designed to control teletype machines. Now, however, they are often used to control display monitors, printers, and other modern devices.

From here.

Also, here is a description of the unit separator:

The smallest data items to be stored in a database are called units in the ASCII definition. We would call them field now. The unit separator separates these fields in a serial data storage environment. Most current database implementations require that fields of most types have a fixed length. Enough space in the record is allocated to store the largest possible member of each field, even if this is not necessary in most cases. This costs a large amount of space in many situations. The US control code allows all fields to have a variable length. If data storage space is limited—as in the sixties—this is a good way to preserve valuable space. On the other hand is serial storage far less efficient than the table driven RAM and disk implementations of modern times. I can't imagine a situation where modern SQL databases are run with the data stored on paper tape or magnetic reels...

From here.
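
To make that description a bit more concrete, here is a small sketch of using U+001F to delimit variable-length fields within one record. The FieldCodec class and the sample field values are invented for illustration, and the sketch assumes no field ever contains U+001F itself.

```csharp
using System;

public static class FieldCodec
{
    private const char UnitSeparator = '\u001f';

    // Joins variable-length fields into a single record string.
    public static string Encode(params string[] fields)
    {
        return string.Join(UnitSeparator.ToString(), fields);
    }

    // Splits a record string back into its fields.
    public static string[] Decode(string record)
    {
        return record.Split(UnitSeparator);
    }

    public static void Main()
    {
        // Hypothetical sample data; each field takes only the space it needs.
        string record = Encode("42", "Ada Lovelace", "London");
        foreach (string field in Decode(record))
        {
            Console.WriteLine(field);
        }
    }
}
```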

Answer 2

I believe the correct way to encode a Unicode character in C# is the \unnnn format. You could try replacing the tab with the string \u001f, like this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u001f")));

Does that work?

Answer 3

This should get you where you need to go:

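        // Parse the hex string into the unit separator character (U+001F).
        char unitSeparatorChar = (char)Convert.ToInt32("0x1f", 16);
        // Read the whole file, swap every tab, and write the result back out.
        string contents = File.ReadAllText(inputFile);
        string convertedContents = contents.Replace('\t', unitSeparatorChar);
        File.WriteAllText(outputFile, convertedContents);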

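I load the file into a string, convert it, and then save it again. You could combine these steps to make the string handling more memory-efficient.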
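
One possible way to avoid holding the whole file in memory is to copy it in fixed-size chunks. This is only a sketch: the TabConverter class, the ConvertTabs method, and the 4096-character buffer size are my own choices, and it assumes the default StreamReader/StreamWriter encoding (UTF-8) suits the file.

```csharp
using System;
using System.IO;

public static class TabConverter
{
    private const char UnitSeparator = '\u001f';

    // Copies text from reader to writer chunk by chunk, swapping tabs as it goes.
    public static void ConvertTabs(TextReader reader, TextWriter writer)
    {
        char[] buffer = new char[4096]; // arbitrary chunk size
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\t')
                {
                    buffer[i] = UnitSeparator;
                }
            }
            writer.Write(buffer, 0, read);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("Usage: TabConverter <inputFile> <outputFile>");
            return;
        }

        using (var reader = new StreamReader(args[0]))
        using (var writer = new StreamWriter(args[1]))
        {
            ConvertTabs(reader, writer);
        }
    }
}
```

Unlike the File.ReadLines / File.WriteAllLines approach, this copies line endings through unchanged, since it never splits the text into lines.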

