COBOL program for file encoding conversion
I need to convert a text file from UTF-8 to CP1251, and I cannot use any third-party software. Is there a routine written in COBOL for this? I am using Micro Focus COBOL on Windows.
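Answer: There are plenty of COBOL routines written for this...
I'm not aware of any free implementation (free = open source and actually free to use), but you can easily write one yourself.
Just walk through the source and move each character to the target; if a character is not available in CP1251, use a "?" instead.
The only real work is that you need a lookup for the 128 characters from x'80' upward...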
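If you would rather generate that lookup than type it in by hand, the table can be dumped on any development machine that happens to have Python. This is only a sketch for building and checking the table, not something to run in the target environment. The function name `cp1251_upper_half` is made up for illustration, and the sketch relies on Python's built-in `cp1251` codec:

```python
def cp1251_upper_half():
    """Return {cp1251_byte: (code_point, utf8_bytes)} for bytes 0x80..0xFF."""
    table = {}
    for b in range(0x80, 0x100):
        try:
            ch = bytes([b]).decode("cp1251")
        except UnicodeDecodeError:
            # Unassigned position in CP1251 (0x98): no entry for it.
            continue
        table[b] = (ord(ch), ch.encode("utf-8"))
    return table


if __name__ == "__main__":
    # One line per character: CP1251 byte, code point, UTF-8 bytes.
    for b, (cp, u8) in cp1251_upper_half().items():
        print(f"x'{b:02X}'  U+{cp:04X}  x'{u8.hex().upper()}'")
```

Each printed line gives the CP1251 byte, the Unicode code point and the UTF-8 byte sequence, which is the information a COBOL lookup keyed on the UTF-8 bytes needs.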
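Alternatively, check whether MF has a specific extension for this before you write your own.
SO is not a "please code this for me" service, so you should show what you have already tried.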
To give you an idea, a conversion along the lines of this JavaScript sample could look like the following (untested code):
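       77  utf-8-field   PIC X(5000).
       77  new-char      PIC X.
       77  lead-byte     PIC X.
       77  cp1251-field  PIC X(5000).
       77  utf-8-pos     PIC 9(04) COMP-5.
       77  cp1251-pos    PIC 9(04) COMP-5.
       77  utf-8-end     PIC 9(04) COMP-5.
      *> low byte of the decoded code point U+04xx
       77  low-byte      PIC 9(04) COMP-5.

           MOVE FUNCTION LENGTH (FUNCTION TRIM (utf-8-field TRAILING))
             TO utf-8-end
           MOVE SPACES TO cp1251-field
           MOVE 1 TO cp1251-pos
           PERFORM VARYING utf-8-pos FROM 1 BY 1
                   UNTIL   utf-8-pos > utf-8-end
              MOVE utf-8-field (utf-8-pos:1) TO lead-byte
              EVALUATE TRUE
                 *> normal ASCII character
                 WHEN lead-byte < x'80'
                    MOVE lead-byte TO new-char
                 *> UTF-8 two-byte sequence for U+0400..U+047F
                 WHEN lead-byte = x'D0' OR lead-byte = x'D1'
                    *> skip the first byte
                    ADD 1 TO utf-8-pos
                    EVALUATE TRUE
                       *> input ends in the middle of a sequence
                       WHEN utf-8-pos > utf-8-end
                          MOVE '?' TO new-char
                       *> not a continuation byte (x'80'..x'BF')
                       WHEN utf-8-field (utf-8-pos:1) < x'80'
                         OR utf-8-field (utf-8-pos:1) > x'BF'
                          MOVE '?' TO new-char
                          *> look at this byte again in the next pass
                          SUBTRACT 1 FROM utf-8-pos
                       WHEN OTHER
                          *> ORD is 1-based, so byte - x'80' = ORD - 129
                          COMPUTE low-byte = FUNCTION ORD
                             (utf-8-field (utf-8-pos:1)) - 129
                          IF lead-byte = x'D1'
                             ADD 64 TO low-byte
                          END-IF
                          EVALUATE TRUE
                             *> U+0401
                             WHEN low-byte = 1
                                MOVE x'A8' TO new-char
                             *> U+0451
                             WHEN low-byte = 81
                                MOVE x'B8' TO new-char
                             *> U+0410..U+044F -> x'C0'..x'FF'
                             *> alternative: use alphabet conversion
                             WHEN low-byte >= 16 AND low-byte <= 79
                                MOVE FUNCTION CHAR (low-byte + 177)
                                  TO new-char
                             *> other letters (for example U+0406,
                             *> CP1251 x'B2') need their own WHEN
                             WHEN OTHER
                                MOVE '?' TO new-char
                          END-EVALUATE
                    END-EVALUATE
                 *> UTF-8 with no CP1251 char
                 *> Todo: check for other multibyte headers and add
                 *>       the correct number of characters to utf-8-pos
                 *> WHEN ...
                 WHEN OTHER
                    MOVE '?' TO new-char
              END-EVALUATE
              STRING new-char
                     DELIMITED BY SIZE
                INTO cp1251-field
                WITH POINTER cp1251-pos
              END-STRING
           END-PERFORM

Note that the field is addressed with reference modification, (utf-8-pos:1), because it is not a table, and that Cyrillic text in UTF-8 starts with the lead byte x'D0' or x'D1'; the value x'04' is only the high byte of the decoded code point.

Instead of the hard-coded WHEN branches in the innermost EVALUATE, you may want to define a pair of ALPHABETs for the lookup:

       SPECIAL-NAMES.
           ALPHABET UTF8-PART-2 IS x'01', x'10' THRU x'4F', x'51'
           ALPHABET CP1251      IS x'A8', x'C0' THRU x'FF', x'B8'.

and then, provided your compiler accepts alphabet-names in INSPECT ... CONVERTING, use this in the inner EVALUATE:

           MOVE FUNCTION CHAR (low-byte + 1) TO new-char
           INSPECT new-char CONVERTING UTF8-PART-2 TO CP1251

INSPECT leaves any value outside these ranges unchanged, so the check for unmapped characters is still needed.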
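To check the output of a COBOL routine like this, it can help to have a small reference implementation to compare against. The sketch below is in Python purely for cross-checking on a development machine, and the function name `utf8_to_cp1251` is hypothetical. It assumes that only U+0401, U+0451 and U+0410..U+044F are mapped, and it also does what the Todo in the routine asks for: the continuation bytes of any other multibyte sequence are skipped, so each unmapped character yields a single "?".

```python
def utf8_to_cp1251(data: bytes) -> bytes:
    """Convert UTF-8 bytes to CP1251, writing '?' for unmapped characters."""
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        i += 1
        if lead < 0x80:
            # Plain ASCII is identical in both encodings.
            out.append(lead)
            continue
        if lead in (0xD0, 0xD1) and i < len(data) and 0x80 <= data[i] <= 0xBF:
            # Two-byte sequence: low byte of the code point U+04xx.
            low = (lead - 0xD0) * 0x40 + (data[i] - 0x80)
            i += 1
            if low == 0x01:            # U+0401
                out.append(0xA8)
            elif low == 0x51:          # U+0451
                out.append(0xB8)
            elif 0x10 <= low <= 0x4F:  # U+0410..U+044F -> 0xC0..0xFF
                out.append(low + 0xB0)
            else:
                out.append(ord("?"))
            continue
        # Any other sequence: one '?', then skip its continuation bytes.
        out.append(ord("?"))
        while i < len(data) and 0x80 <= data[i] <= 0xBF:
            i += 1
    return bytes(out)
```

Feeding the same input to both programs and comparing the results byte for byte is a quick way to catch off-by-one mistakes in the ORD/CHAR arithmetic.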
Have you looked at CBL_STRING_CONVERT?