How do I correctly parse a byte stream in R?
I'm accessing an API that returns a long stream of raw bytes.
My question doesn't lend itself to a simple reprex of the API itself, but here's my best shot:
raw_bytes <-
as.raw(c("0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x3f","0x80","0x00","0x00","0x00","0x00","0x01","0x5e","0xa9","0x3e","0x83","0x80"))
> str(raw_bytes)
raw [1:28] 43 b7 01 48 ...
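Now, from the API documentation, I know this 28-byte chunk is to be parsed as follows, with big-endian byte order:
Bytes  Type
4      float
4      float
4      float
4      float
4      float
8      long integer (this is a date object, defined as the number of milliseconds since January 1, 1970)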
writeBin(raw_bytes, "myfile.txt")
con <- file("myfile.txt", "rb") # create connection object; specify raw binary
> readBin(con, "double", size = 4, n = 5, endian = "big") # get those first 5 objects from the chunk
[1] 366.00 366.00 365.75 366.00 10.70
So far, so good; these match what I expected.
> readBin(con, "integer", size = 8, n = 1, endian = "big") # get the last 8 byte chunk
[1] -1453180896
Hmm, that doesn't look right. An online 8-byte hex converter gives 1506080340000 as the correct decimal value, which matches the date I expect (September 22, 2017).
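The wrong value is not random. Here is a small arithmetic check, assuming readBin() kept only the low 32 bits of the 8-byte value (base R has no native 64-bit integer type): doing that truncation by hand on the expected number reproduces the output above, and the expected number does decode to the expected date. The variable names are just for illustration.

```r
# Value reported by the online converter for the last 8 bytes (from the post).
expected_ms <- 1506080340000

# Keep only the low 32 bits, then reinterpret them as a signed 32-bit integer.
low32 <- expected_ms %% 2^32
signed_low32 <- if (low32 >= 2^31) low32 - 2^32 else low32
signed_low32
# -1453180896, the same number readBin() returned

# Interpret the expected value as milliseconds since the Unix epoch (UTC).
expected_date <- as.POSIXct(expected_ms / 1000, origin = "1970-01-01", tz = "UTC")
format(expected_date, "%Y-%m-%d")
# "2017-09-22"
```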
Taking a closer look at the last 8 bytes:
> (con2 <- tail(raw_bytes, 8))
[1] 00 00 01 5e a9 62 38 20
And trying a few different variations of readBin():
> readBin(con2, "double", size = 8, n = 1, endian = "big")
[1] 7.441026e-312
> readBin(con2, "numeric", size = 8, n = 1, endian = "little")
[1] 1.818746e-153
> readBin(con2, "integer", size = 8, n = 1, endian = "little")
[1] 1577123840
Nope.
I can produce the expected decimal number from these bytes using an external library:
str <- paste(con2, collapse = "")
> bit64::as.integer64(as.numeric(paste0("0x",str)))
integer64
[1] 1506080340000
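Anyway, here's my question: is there a way to parse my byte stream correctly using base R, and readBin() in particular?
And, more generally, is there an opinionated way to parse byte streams within an R session?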
You can use the approach from an answer to a similar question, which also tries to read a date.
A hackier answer would be this:
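library(bit64)
con <- file("myfile.txt", "rb")
readBin(con, "double", size = 4, n = 5, endian = "big")       # skip past the five floats
a <- readBin(con, "double", size = 8, n = 1, endian = "big")  # read the 8 bytes as a double's bit pattern
class(a) <- "integer64"                                        # reinterpret the same bits as a 64-bit integer
close(con)
a
# 1506078000000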
Yuck!
Or:
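library(bit64)
con <- file("myfile.txt", "rb")
readBin(con, "double", size = 4, n = 5, endian = "big")  # skip past the five floats
# read the 8 bytes as four unsigned 16-bit words and weight them by powers of 2^16
words <- readBin(con, "integer", size = 2, n = 4, endian = "big", signed = FALSE)
close(con)
sum(as.integer64(words) * as.integer64(65536)^(3:0))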
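The same 16-bit-word idea can also be sketched in base R alone, since readBin() accepts a raw vector in place of a connection. This is only a sketch under a few assumptions: parse_chunk() is a hypothetical helper name, the value is assumed to be non-negative and below 2^53 (so a double holds it exactly, which is true for millisecond timestamps), and the bytes are the 28 sample bytes from the question.

```r
# The 28 sample bytes from the question, written as numeric hex literals.
raw_bytes <- as.raw(c(
  0x43, 0xb7, 0x01, 0x48,  0x43, 0xb7, 0x01, 0x48,
  0x43, 0xb7, 0x01, 0x48,  0x43, 0xb7, 0x01, 0x48,
  0x3f, 0x80, 0x00, 0x00,
  0x00, 0x00, 0x01, 0x5e,  0xa9, 0x3e, 0x83, 0x80
))

# parse_chunk() is a hypothetical helper, not part of any package.
parse_chunk <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 28L)
  # Bytes 1-20: five 4-byte big-endian floats.
  floats <- readBin(bytes[1:20], "numeric", size = 4, n = 5, endian = "big")
  # Bytes 21-28: four unsigned 16-bit words, most significant first.
  words <- readBin(bytes[21:28], "integer", size = 2, n = 4,
                   endian = "big", signed = FALSE)
  # Combine in double arithmetic; exact only while the result stays below 2^53.
  millis <- sum(words * 65536^(3:0))
  list(
    floats = floats,
    millis = millis,
    time   = as.POSIXct(millis / 1000, origin = "1970-01-01", tz = "UTC")
  )
}

parsed <- parse_chunk(raw_bytes)
parsed$floats
format(parsed$millis, scientific = FALSE)
format(parsed$time, "%Y-%m-%d %H:%M:%S")
```

For these sample bytes the timestamp comes out as 1506078000000 (2017-09-22 11:00:00 UTC), the same value as the bit64 versions above. It differs from the 1506080340000 quoted in the question because the sample bytes end in a9 3e 83 80, whereas the tail printed in the question ends in a9 62 38 20.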