Detecting encoding before opening a file
I have a file whose character encoding is unknown. Running file -bi test.trace
returns text/plain; charset=us-ascii
but using
(with-open-file (stream "/home/*/test.trace" :external-format :us-ascii)
(code-to-work-with-file))
gives me an exception:
:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /home/*/test.trace" {10208D2723}>:
the octet sequence #(194) cannot be decoded. [Condition of type SB-INT:STREAM-DECODING-ERROR]
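How can I detect the encoding of a file before opening it?
I can open the file with emacs, less, and nano, so this seems to be either a misdetection of the encoding or a difference between what file and sbcl think the encoding should be.
I currently avoid the problem by forcing every file into utf8 encoding with vim +set nobomb | set fenc=utf8| x file-path. But even after that, file still reports us-ascii. Besides, this is not a valid permanent solution, just a dirty hack to make it work.
As pointed out here on Programmers Stack Exchange: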
Files generally indicate their encoding with a file header. There are
many examples here. However, even reading the header you can never be
sure what encoding a file is really using.
I looked for trace files on my system and found this one, though it contains nothing interesting:
2016-06-22 13:10:07☆|ruby-2.2.3@laguna| Antonios-MacBook-Pro in ~/learn/lisp/Whosebug/scripts
○ → file -I resources/hello.trace
resources/hello.trace: text/plain; charset=us-ascii
2016-06-22 13:11:50☆|ruby-2.2.3@laguna| Antonios-MacBook-Pro in ~/learn/lisp/Whosebug/scripts
○ → cat resources/hello.trace
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
So with this code I can read it:
CL-USER> (with-open-file (in "/Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" :external-format :us-ascii)
(when in
(loop for line = (read-line in nil)
while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
NIL
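What about Chinese or other non-ASCII characters?
We can look up the character for a given code like this (194 is outside the 7-bit ASCII range):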
CL-USER> (format nil "~{~C~}" (mapcar #'code-char '(194)))
"Â"
or any other odd character, so the offending octet may well belong to an accented character. I added one to the file:
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
Â
patatopita
I get the same error:
:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" {1003994043}>:
the octet sequence #(195) cannot be decoded. [Condition of type SB-INT:STREAM-DECODING-ERROR]
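The octets 194 and 195 in these errors are above 127, so the file is not pure 7-bit ASCII even though file reported us-ascii. One possible reason for the mismatch, as far as I know, is that file inspects only part of the input, so a stray high octet further in can go unnoticed. Below is a minimal sketch in portable Common Lisp that reads the file as raw octets and reports the first one above 127. The function name first-non-ascii-octet is made up for this illustration and is not part of SBCL.

```lisp
;; Hypothetical helper, not from the original post.
;; Returns two values, the position and the value of the first octet
;; above 127, or NIL if every octet in the file is 7-bit ASCII.
(defun first-non-ascii-octet (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop for pos from 0
          for octet = (read-byte in nil nil)   ; NIL at end of file
          while octet
          when (> octet 127)
            return (values pos octet))))
```

Calling (first-non-ascii-octet "test.trace") on the file from the question should point at the octet that the :us-ascii stream refuses to decode.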
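So at this point you can use conditions and restarts to change the character. I am no expert in this kind of code, but these restarts are offered: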
Restarts:
0: [ATTEMPT-RESYNC] Attempt to resync the stream at a character boundary and continue.
1: [FORCE-END-OF-FILE] Force an end of file.
2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at a character boundary and continue.
3: [*ABORT] Return to SLIME's top level.
4: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {10050E0003}>)
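The restarts listed above can also be invoked from code instead of from the debugger. The following is only a sketch under assumptions: SB-INT:STREAM-DECODING-ERROR is the condition type shown in the error message, the restart is looked up by its printed name so that nothing is assumed about which package it lives in, and the helper names find-restart-by-name and read-lines-resyncing are invented for this example.

```lisp
;; Hypothetical helper: find an active restart by the name of its symbol.
(defun find-restart-by-name (name condition)
  (find name (compute-restarts condition)
        :key (lambda (r) (symbol-name (restart-name r)))
        :test #'string=))

;; Sketch (SBCL only): read PATH as ASCII and, on every decoding error,
;; invoke the ATTEMPT-RESYNC restart so reading continues past the bad octet.
(defun read-lines-resyncing (path)
  (handler-bind
      ((sb-int:stream-decoding-error
         (lambda (c)
           (let ((restart (find-restart-by-name "ATTEMPT-RESYNC" c)))
             ;; If the restart is missing, decline and let the error propagate.
             (when restart
               (invoke-restart restart))))))
    (with-open-file (in path :external-format :us-ascii)
      (loop for line = (read-line in nil)
            while line
            collect line))))
```

This silently drops whatever could not be decoded, so it suits inspection better than faithful processing of the data.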
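The INPUT-REPLACEMENT restart substitutes a string of your choice for the undecodable input. If that does not help, try a European encoding such as latin-1 or ISO....: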
CL-USER> (with-open-file (in "/Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" :external-format :latin-1)
(when in
(loop for line = (read-line in nil)
while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
¬
patatopita
NIL
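Reading as latin-1 never signals a decoding error, because every octet maps to some character, but UTF-8 input then shows up as mojibake. The octets 194 and 195 (#xC2 and #xC3) from the errors are the UTF-8 lead bytes for U+0080 through U+00FF, which suggests the file may really be UTF-8. A heuristic sketch for picking an external format before opening the file follows. All three function names are hypothetical, the UTF-8 check is structural only (it does not reject every overlong or surrogate sequence), and like any such guess it can be wrong.

```lisp
;; Hypothetical helpers, not from the original post.
(defun read-octets (path)
  "Return the whole content of PATH as a vector of octets."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      octets)))

(defun valid-utf-8-p (octets)
  "True if OCTETS consists of well-formed lead and continuation bytes."
  (let ((i 0)
        (n (length octets)))
    (loop while (< i n)
          do (let* ((b (aref octets i))
                    ;; Number of continuation bytes implied by the lead byte.
                    (extra (cond ((< b #x80) 0)
                                 ((<= #xC2 b #xDF) 1)
                                 ((<= #xE0 b #xEF) 2)
                                 ((<= #xF0 b #xF4) 3)
                                 (t (return-from valid-utf-8-p nil)))))
               ;; A sequence cut off by the end of the data is invalid.
               (when (>= (+ i extra) n)
                 (return-from valid-utf-8-p nil))
               (loop for k from 1 to extra
                     unless (<= #x80 (aref octets (+ i k)) #xBF)
                       do (return-from valid-utf-8-p nil))
               (incf i (1+ extra))))
    t))

(defun guess-external-format (path)
  "Guess :US-ASCII, :UTF-8 or :LATIN-1 for PATH. A heuristic only."
  (let ((octets (read-octets path)))
    (cond ((every (lambda (b) (< b #x80)) octets) :us-ascii)
          ((valid-utf-8-p octets) :utf-8)
          (t :latin-1))))
```

The guess can then be passed on, as in (with-open-file (in path :external-format (guess-external-format path)) ...). Falling back to latin-1 last is a design choice: it always decodes, so it serves as the catch-all rather than as evidence of the real encoding.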
So read the file with a European character set such as latin-1. That should work, good luck.