Error in write_dta: A provided string value was longer than the available storage size of the specified column

Question

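I am trying to export my data table from RStudio to dta format. I am using the write_dta function from the haven library in R and get the following error:

A provided string value was longer than the available storage size of the specified column

I am new to both R and Stata and don't understand what this means or what I should do about it.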

Answer 1

It sounds like your data.frame contains a very long string. write_dta has a known issue with long strings (https://github.com/tidyverse/haven/issues/437). You can truncate the strings in your data.frame like this:

# lapply() keeps each column's type; apply() would coerce every column to character
df <- YOUR_DATA
df[] <- lapply(df, function(x) if (is.character(x)) substr(x, 1, 128) else x)

Then try write_dta(df). A maximum length of 128 characters should be safe, though newer versions of Stata can handle longer strings.
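
Before truncating anything, it can help to see which columns actually hold long strings. The following is a minimal base R sketch, not part of the original answer; `example_df` and its columns are made-up placeholders for your own data.

```r
# Hypothetical example data; replace with your own data frame.
example_df <- data.frame(
  id = 1:3,
  comment = c("short", strrep("a", 300), NA),
  stringsAsFactors = FALSE
)

# Identify the character columns.
is_chr <- vapply(example_df, is.character, logical(1))

# Longest string (in characters) per character column; NA values are ignored.
max_len <- vapply(
  example_df[is_chr],
  function(x) max(c(0L, nchar(x)), na.rm = TRUE),
  integer(1)
)

print(max_len)
```

Any column whose value in `max_len` exceeds the limit you plan to use is a candidate for truncation.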

Answer 2

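I noticed that the data.frame solution can lose labels. A tibble preserves them (for example, labels in a *.sav file imported from a survey collection platform).

Here is a tidyverse solution that uses haven to read and write and keeps the labels. Keep in mind that your initial df also needs to be a tibble.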

library(tidyverse)

df <- haven::read_sav("YOUR FILE.sav")   # could also be some other file format that you start with as a tibble

df <- df %>%
  mutate(across(where(is.character), ~ substr(., 1, 2045)))

haven::write_dta(df, "NAME OF NEW FILE.dta")

For me, the maximum string length that worked with write_dta(df) was 2045.
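
Since truncation silently discards text, it may be worth counting how many values will be cut and confirming that the label is still there afterwards. This is a rough base R sketch under assumptions: `survey`, `answer`, and `max_chars` are hypothetical names, and the variable label is assumed to live in the "label" attribute, which is where haven stores it.

```r
# Hypothetical labelled column; haven keeps variable labels in the "label" attribute.
survey <- data.frame(answer = c(strrep("x", 3000), "ok"), stringsAsFactors = FALSE)
attr(survey$answer, "label") <- "Free-text answer"

max_chars <- 2045  # limit reported in this answer; adjust for your setup

# Count the values that will be cut, then truncate.
n_cut <- sum(nchar(survey$answer) > max_chars, na.rm = TRUE)
survey$answer <- substr(survey$answer, 1, max_chars)

# Check the result: longest remaining string and the surviving label.
longest <- max(nchar(survey$answer), na.rm = TRUE)
label_after <- attr(survey$answer, "label")

print(c(cut = n_cut, longest = longest))
print(label_after)
```

substr() returns a vector with the same attributes as its input, which is why the label is still present after truncation in this sketch.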

datatable

r

stata

dta