Python: Unexpected result when byte 0xe0 is read

Question

import msvcrt
while True:
    try:
        a=msvcrt.getch()
        a=a.decode('utf-8')
        print(a)
    except:
        print(a)

When I press an arrow key, Page Up, Page Down, Delete, etc., the code above produces an unexpected result.

The output is as follows:
[I/P=a]
a #expected result
[I/P=UP ARROW]
b'\xe0'
H  #unexpected result

I can understand why b'\xe0' is printed, but why is H printed? H is not printed when I do this:

import msvcrt
a=msvcrt.getch()
print(a)#b'\xe0'
a=a.decode('utf-8')
print(a)
When I press the Up arrow here, it raises a UnicodeDecodeError.

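I have looked at another question that explains how msvcrt.getch() works, but it still doesn't explain why I get two characters in the first piece of code and only one in the second. Why doesn't a wait for the next character to be typed, instead of taking the value b'H'?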

Answer 1

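Arrow keys (and function keys and others) need two separate calls to msvcrt.getch. When you press ↑, the first call returns b'\xe0' and the second returns b'\x48'. Neither of these is meant to be UTF-8 or even ASCII text. The first is not a valid UTF-8 sequence, which is why your decode('utf-8') call throws an exception. The second is a byte representing keycode 72, which coincidentally happens to be the same byte value that represents the letter 'H' in UTF-8 or ASCII. In your loop, the next iteration's getch call returns that second byte immediately, without waiting for another keypress, and it decodes to 'H'. Your second snippet calls getch only once, so the keycode byte is never read.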
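To make the sequence of events concrete, here is a minimal sketch that replays the loop from the question against a simulated key stream, so it runs without Windows or a console. The helper names make_fake_getch and run_question_loop are made up for this illustration, and the fake getch only stands in for msvcrt.getch by handing back one byte per call.

```python
from collections import deque


def make_fake_getch(data):
    """Return a stand-in for msvcrt.getch that yields one byte per call."""
    queue = deque(bytes([b]) for b in data)
    return queue.popleft


def run_question_loop(getch, iterations):
    """Replay the loop from the question and collect what it would print."""
    printed = []
    for _ in range(iterations):
        a = getch()
        try:
            printed.append(a.decode('utf-8'))
        except UnicodeDecodeError:
            # Same effect as the bare except in the question: print the raw bytes
            printed.append(repr(a))
    return printed


# 'a', followed by the two bytes that a single Up-arrow press produces
output = run_question_loop(make_fake_getch(b'a\xe0H'), 3)
for line in output:
    print(line)
```

The three iterations consume three bytes, but only two keys were pressed: the second and third iterations both belong to the single Up-arrow press.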

From the msvcrt documentation (emphasis mine):

msvcrt.getch()

Read a keypress and return the resulting character as a byte string. Nothing is echoed to the console. This call will block if a keypress is not already available, but will not wait for Enter to be pressed. If the pressed key was a special function key, this will return '\000' or '\xe0'; the next call will return the keycode. The Control-C keypress cannot be read with this function.

You can see the bytes with a program like this:

import msvcrt

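# Prefix bytes that mean "the next getch() call returns a keycode"
NEXT_CHARACTER_IS_KEYCODE = [b'\xe0', b'\x00']

while True:
    ch1 = msvcrt.getch()
    print("Main getch(): {}".format(ch1))
    if ch1 in NEXT_CHARACTER_IS_KEYCODE:
        # Special key: read the second byte, which is the keycode
        ch2 = msvcrt.getch()
        print("  keycode getch(): {}".format(ch2))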

Note that there is no .decode('utf-8') there, because getch does not return UTF-8 bytes.
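If you want to act on the keys rather than just print the bytes, one possible approach is to wrap the two-call protocol in a helper that returns either an ordinary byte or the name of a special key. This is only a sketch: read_key, PREFIXES and SPECIAL_KEYS are names invented here, the table covers just a handful of keycodes, and the getch function is passed in as a parameter so the example can be exercised with a simulated stream instead of a real console.

```python
# Prefix bytes that announce a special key; the next call returns its keycode
PREFIXES = (b'\x00', b'\xe0')

# Keycodes (second byte) for a few special keys; deliberately not a complete table
SPECIAL_KEYS = {
    b'H': 'UP', b'P': 'DOWN', b'K': 'LEFT', b'M': 'RIGHT',
    b'I': 'PAGE_UP', b'Q': 'PAGE_DOWN', b'S': 'DELETE',
}


def read_key(getch):
    """Read one logical key using the given getch-like callable.

    Returns ('char', raw_byte) for ordinary keys and
    ('special', name) for keys that arrive as prefix + keycode.
    """
    first = getch()
    if first in PREFIXES:
        code = getch()  # second call: the keycode, not text
        name = SPECIAL_KEYS.get(code, 'UNKNOWN({})'.format(code[0]))
        return ('special', name)
    return ('char', first)


# Simulated input: 'a', Up arrow (e0 48), then F1 (00 3b), which is not in the table
fake_getch = iter([b'a', b'\xe0', b'H', b'\x00', b';']).__next__
keys = [read_key(fake_getch) for _ in range(3)]
print(keys)
```

On Windows you would call read_key(msvcrt.getch) instead of using the simulated stream. The ordinary-key branch returns the raw byte on purpose, since the right way to turn it into text depends on the console encoding.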

(Caution: make sure you really want to use msvcrt.getch, because it is a very unusual choice, especially in 2019.)
