Why does an R dataset take more memory than the same dataset written from R as a Stata file and read in Stata?

Consider the following R dataset.

object.size(mtcars)
6736 bytes

#writing this object as rds (base R uses saveRDS)

saveRDS(mtcars,"mt.rds")

#properties of the file shows it as 1.218 KB
#reading back rds file

dataRDS<-readRDS("mt.rds")
object.size(dataRDS)
6736 bytes  #this is the same as original mtcars (not surprising)

#writing this object as Stata data (write.dta and read.dta come from the foreign package)

library(foreign)
write.dta(mtcars,"mt.dta")
#clicking the properties of file shows the size as 4.5 KB 
#reading back Stata data in R

dataDTA<-read.dta("mt.dta")
object.size(dataDTA)
8656 bytes 

# this is larger than the original object (6736 bytes)

#reading Stata data from Stata gives the size as 2.82 KB


 obs:            32                          Written by R.              
 vars:            11                          
 size:         2,816 

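Why does the dataset take more memory as a native R object in R than the same dataset takes in Stata, after it was written from R as a Stata file?

Most of it seems to be a difference in the size of the attributes; the two objects evidently store them differently. Comparing sizes,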

> object.size(attributes(dataDTA)) - object.size(attributes(dataRDS))
1696 bytes

> object.size(dataDTA) - object.size(dataRDS)
1920 bytes

The remaining 224 bytes (1920 - 1696) are probably because object.size only gives an estimate of the true size.
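
To see where the bytes go, a rough sketch along the following lines may help. It assumes the foreign package is installed, writes to a temporary file instead of "mt.dta", and uses illustrative variable names (dta_path, extra_attrs, attr_sizes, raw_bytes) that are not part of the original question. The last step is only an observation: 32 observations times 11 variables times 8 bytes per double equals the 2,816 bytes that Stata reports, which suggests Stata is counting the raw data only, while object.size also counts R's per-vector overhead and the attributes.

```r
library(foreign)  # provides write.dta() and read.dta()

# Round-trip mtcars through a Stata file in a temporary location.
dta_path <- tempfile(fileext = ".dta")
write.dta(mtcars, dta_path)
dataDTA <- read.dta(dta_path)

# Attributes present after the round trip but absent from the original object.
extra_attrs <- setdiff(names(attributes(dataDTA)), names(attributes(mtcars)))
print(extra_attrs)

# Estimated size of each attribute on the re-read data frame, in bytes.
attr_sizes <- sapply(attributes(dataDTA), object.size)
print(attr_sizes)

# Raw payload: observations x variables x 8 bytes per double.
raw_bytes <- nrow(mtcars) * ncol(mtcars) * 8
print(raw_bytes)
```

Running the same sapply() call on attributes(mtcars) gives the matching breakdown for the original object, so the two can be compared attribute by attribute.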