Why does an R dataset take more memory than the same dataset written from R as Stata data and read in Stata?
Consider the following R dataset.
library(foreign)  # provides write.dta() and read.dta()
object.size(mtcars)
6736 bytes
# write this object as an RDS file
saveRDS(mtcars, "mt.rds")
# the file properties show it as 1.218 KB
# read the RDS file back
dataRDS <- readRDS("mt.rds")
object.size(dataRDS)
6736 bytes # same as the original mtcars (not surprising)
# write this object as Stata data
write.dta(mtcars, "mt.dta")
# the file properties show the size as 4.5 KB
# read the Stata data back into R
dataDTA <- read.dta("mt.dta")
object.size(dataDTA)
8656 bytes
# this is larger than the original
# reading the Stata data from Stata gives the size as 2.82 KB
obs: 32 Written by R.
vars: 11
size: 2,816
Why does the native R object, read in R, take more memory than the same dataset converted from R to Stata data and read in Stata?
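Most of it seems to come from the attributes, which differ in size; this suggests the two objects store them differently. Comparing the sizes: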
> object.size(attributes(dataDTA)) - object.size(attributes(dataRDS))
1696 bytes
> object.size(dataDTA) - object.size(dataRDS)
1920 bytes
The remaining discrepancy may be because object.size is only an estimate of the true size.
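To make the comparison more concrete, here is a minimal sketch using base R only. It assumes Stata stores all 11 columns as 8-byte doubles, which is consistent with the reported size of 2,816 bytes (32 x 11 x 8). The attributes var.labels and formats added below are hypothetical stand-ins for the kind of metadata an importer may attach; they are not a reproduction of what read.dta actually returns.

```r
# Raw payload as Stata counts it: observations x variables x 8 bytes per double.
# Assumption: every column is stored as a double.
raw_bytes <- nrow(mtcars) * ncol(mtcars) * 8
print(raw_bytes)

# What R reports for the same data, including per-vector and per-object overhead.
total_bytes <- as.numeric(object.size(mtcars))
print(total_bytes)

# Size of each attribute stored on the data frame (names, row.names, class).
attr_bytes <- sapply(attributes(mtcars), function(a) as.numeric(object.size(a)))
print(attr_bytes)

# Hypothetical metadata attributes, added only to show that extra
# attributes increase the size reported by object.size().
labelled <- mtcars
attr(labelled, "var.labels") <- rep("", ncol(labelled))
attr(labelled, "formats") <- rep("%9.0g", ncol(labelled))
extra_bytes <- as.numeric(object.size(labelled)) - total_bytes
print(extra_bytes)
```

Running sapply over attributes(dataDTA) and attributes(dataRDS) in the same way should show which attributes account for the difference reported above.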