Having trouble unpacking COMP-3 in .NET: the COMP-3 values contain letter characters in addition to the sign character

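I'm trying to import a mainframe EDI file into SQL Server using .NET, but I'm having trouble unpacking some COMP-3 fields.

The file comes from one of our clients, and I have the copybook layout for the following fields: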

05  EH-GROSS-INVOICE-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-PCT            PIC S9(03)V9999  COMP-3.

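I'll focus on just these 3 fields, because all the other fields are PIC(X) and are already Unicode values. I loaded everything with the help of Ebcdic2Ascii, a tool created by Max Vagner. I only made a small change to its "Unpack" function, modifying it to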

private string Unpack(byte[] packedBytes, int decimalPlaces, out bool isParsedSuccessfully)
{
    isParsedSuccessfully = true;
    return BitConverter.ToString(packedBytes);
}

so that I get the following sample data:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
00-1A-1A-03-26-0C        00-00-00-00-00-0C        00-00-00-0C
00-0A-1A-1A-00-0C        00-00-1A-1A-2D-0C        00-1A-00-0C
00-09-10-20-00-0C        00-00-10-1A-1A-0C        00-1A-00-0C

Here is the sample code I wrote to unpack these values, based on my understanding of COMP-3 values:

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var result1 = UnpackMod("00-1A-1A-03-26-0C", 4);
            var result2 = UnpackMod("00-00-00-00-00-0C", 4);
            var result3 = UnpackMod("00-00-00-0C", 4);

            Console.WriteLine($"{result1}\n{result2}\n{result3}\n");

            var result4 = UnpackMod("00-0A-1A-1A-00-0C", 4);
            var result5 = UnpackMod("00-00-1A-1A-2D-0C", 4);
            var result6 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result4}\n{result5}\n{result6}\n");

            var result7 = UnpackMod("00-09-10-20-00-0C", 4);
            var result8 = UnpackMod("00-00-10-1A-1A-0C", 4);
            var result9 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result7}\n{result8}\n{result9}");

            Console.ReadLine();
        }

        /// <summary>
        /// Method for unpacking Comp-3 fields.
        /// </summary>
        /// <param name="inputString">Hyphen-separated hex string, e.g. "00-09-10-20-00-0C".</param>
        /// <param name="decimalPlaces">Number of implied decimal places.</param>
        /// <returns>Returns the numeric string if the parse was successful; otherwise returns "NULL".</returns>
        private static string UnpackMod(string inputString, int decimalPlaces)
        {
            var outputString = inputString;

            // Remove "-".
            outputString = outputString.Replace("-", "");

            // Check last character for sign.
            string lastChar = outputString.Substring(outputString.Length - 1, 1);
            bool isNegative = (lastChar == "D" || lastChar == "B");

            // Remove sign character.
            if (lastChar == "C" || lastChar == "A" || lastChar == "E" || lastChar == "F" || lastChar == "D" || lastChar == "B")
            {
                outputString = outputString.Substring(0, outputString.Length - 1);
            }

            // Place decimal point.
            outputString = outputString.Insert(outputString.Length - decimalPlaces, ".");

            // Check if parsed value is numeric. This will also eliminate all leading 0.
            var isParsedSuccessfully = decimal.TryParse(outputString, out decimal decimalValue);

            // If isParsedSuccessfully is true, return the numeric string; otherwise return "NULL".
            string result = "NULL";
            if (isParsedSuccessfully)
            {
                // Convert value to negative.
                if (isNegative)
                {
                    decimalValue = decimalValue * -1;
                }

                result = decimalValue.ToString();
            }

            return result;
        }
    }
}

After running the sample code, I got the following results:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
NULL                     0.0000                   0.0000
NULL                     NULL                     NULL
9102.0000                NULL                     NULL        

As you can see, I was only able to get the following 3 values correctly:

00-09-10-20-00-0C -> 9102.0000
00-00-00-00-00-0C -> 0.0000
00-00-00-0C       -> 0.0000

Quoting from this source: http://www.3480-3590-data-conversion.com/article-packed-fields.html, my understanding of COMP-3 is as follows:

COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD.

The Binary Coded Decimal (BCD) data type is just as its name suggests -- it is a value stored in decimal (base ten) notation, and each digit is binary coded. Since a digit only has ten possible values (0-9).

The low nibble of the least significant byte is used to store the sign for the number. This nibble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
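The layout described in the quote can be sketched in C# as follows. This is a minimal illustration, not the Ebcdic2Ascii implementation: the `PackedDecimal` class and `TryUnpack` method are hypothetical names, sign nibbles A, C, E and F are treated as positive and B and D as negative (matching the checks in the question's code), and the field is assumed to be small enough to fit in a `decimal`.

```csharp
using System;

public static class PackedDecimal
{
    // Tries to decode a COMP-3 (packed decimal) field from its raw bytes.
    // Returns false if any digit nibble is above 9 or the sign nibble is below A.
    public static bool TryUnpack(byte[] packed, int decimalPlaces, out decimal value)
    {
        value = 0m;
        if (packed == null || packed.Length == 0) return false;

        int last = packed.Length - 1;
        decimal digits = 0m;

        for (int i = 0; i <= last; i++)
        {
            int high = packed[i] >> 4;    // upper nibble: always a digit
            int low = packed[i] & 0x0F;   // lower nibble: digit, or sign in the last byte

            if (high > 9) return false;
            digits = digits * 10 + high;

            if (i < last)
            {
                if (low > 9) return false;
                digits = digits * 10 + low;
            }
        }

        int sign = packed[last] & 0x0F;
        if (sign < 0x0A) return false;    // the sign nibble must be A-F

        // Apply the implied decimal point (the V in the PIC clause).
        for (int p = 0; p < decimalPlaces; p++) digits /= 10m;

        // Assumption: B and D are negative, every other sign nibble is positive.
        value = (sign == 0x0B || sign == 0x0D) ? -digits : digits;
        return true;
    }
}
```

Working on the raw bytes avoids the string round trip through `BitConverter.ToString`. Under these assumptions, `00-09-10-20-00-0C` with 4 decimal places decodes to 9102.0000, while `00-1A-00-0C` is rejected because `A` appears in a digit position.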

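As I understand it, BCD should only contain the values 0-9, with a single sign character at the end that can be "C", "D", or "F". I don't know how to unpack the following values: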

00-1A-1A-03-26-0C
00-0A-1A-1A-00-0C        
00-00-1A-1A-2D-0C
00-1A-00-0C
00-00-10-1A-1A-0C
00-1A-00-0C

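These values contain other characters besides the sign character. I have a feeling the data has already been converted, because if it had not been, there should be no readable values in there unless you apply an encoding. I'm still not sure about this and would appreciate any insight. Thanks.

First of all, PIC X is not Unicode in COBOL.

Quoting from here...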

It is common for mainframe data to include both text and binary data in a single record, for example a name, a currency amount, and a quantity:

Hopper Grace ar% .

...which would be...

x'C8969797859940404040C799818385404040404081996C004B'

...in hex. This is code page 37, commonly referred to as EBCDIC.

[...]Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...

x'486F707065722020202047726163652020202020617225002E'

...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte), the currency amount itself has been changed as has the quantity (from decimal 75 to decimal 11,776 due to both code page conversion and mangling of a big endian number as a little endian number).
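The damage described in the quote can be reproduced with a short C# sketch. It assumes .NET Core 3.0 or later (or .NET 5+), where `CodePagesEncodingProvider` has to be registered before code pages 37 and 1250 are available; `CodePageDemo` and `TranslateWholeRecord` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Translates every byte of a code page 37 record to Windows-1250,
    // treating the whole record as text. This is what damages packed
    // and binary fields that share the record with text fields.
    public static byte[] TranslateWholeRecord(byte[] ebcdicRecord)
    {
        // Makes the legacy code pages (including 37 and 1250) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(37);
        Encoding target = Encoding.GetEncoding(1250);
        return Encoding.Convert(source, target, ebcdicRecord);
    }
}
```

Passing in the 25 bytes of the first hex string above should give back the second one: the name fields become readable, but the trailing `81 99 6C 00 4B` turns into `61 72 25 00 2E`, which no longer ends in a valid sign nibble.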

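Your data was most likely code page converted when it was transferred from the mainframe. If you know the original code page and the code page it was converted to, you might be able to unscramble the packed data.

I say might because, if you're lucky, the hex values you have will map one-to-one to the hex values in the original code page. Note that it is common for both EBCDIC x'15' and x'0D' to map to ASCII x'0D'.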
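As a rough sketch of what unscrambling could look like, the C# below maps translated bytes back to their code page 37 values before the field is unpacked. Everything here is an assumption to be checked against the real transfer settings: that the source was code page 37, that the target was ASCII, and that x'1A' (the ASCII SUB character) marks a byte the converter could not map, so the original value is lost. The table lists only bytes that occur in the question's sample data, and `PackedFieldRepair` and `TryRestore` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PackedFieldRepair
{
    // ASSUMPTION: code page 37 -> ASCII translation. Only bytes seen in the
    // sample data are listed; anything else is treated as unrecoverable.
    private static readonly Dictionary<byte, byte> AsciiToCp037 = new Dictionary<byte, byte>
    {
        { 0x00, 0x00 }, // NUL (same in both)
        { 0x03, 0x03 }, // ETX (same in both)
        { 0x0C, 0x0C }, // FF  (same in both)
        { 0x10, 0x10 }, // DLE (same in both)
        { 0x09, 0x05 }, // HT
        { 0x20, 0x40 }, // space
        { 0x26, 0x50 }, // '&'
        { 0x2D, 0x60 }, // '-'
    };

    // Returns false when a byte cannot be mapped back unambiguously,
    // for example 0x1A, which is assumed here to be a substitution marker.
    public static bool TryRestore(byte[] translated, out byte[] original)
    {
        original = Array.Empty<byte>();
        var restored = new byte[translated.Length];

        for (int i = 0; i < translated.Length; i++)
        {
            if (!AsciiToCp037.TryGetValue(translated[i], out byte b)) return false;
            restored[i] = b;
        }

        original = restored;
        return true;
    }
}
```

If these assumptions hold, `00-09-10-20-00-0C` would map back to `00-05-10-40-00-0C`, which means even a value that parses cleanly may not be the amount that was sent. Any field containing `1A` cannot be restored this way, so asking for the file to be transferred again in binary mode is likely the more reliable fix.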