Using tbl_summary to create summary statistics with labels
I have read a Stata (.dta) file into R. A snippet of the data looks like this:
short
# A tibble: 200 x 5
q4_1 q4_2 q4_3 q4_4 treatment_cur
<dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <chr>
1 NA(z) NA(z) NA(z) NA(z) Control
2 NA(z) NA(z) NA(z) NA(z) Control
3 1 [1.Yes] 0 [0.No] 0 [0.No] 1 [1.Yes] Treatment
4 0 [0.No] 0 [0.No] 1 [1.Yes] 0 [0.No] Control
5 0 [0.No] 0 [0.No] 0 [0.No] 1 [1.Yes] Control
6 NA(z) NA(z) NA(z) NA(z) Control
7 1 [1.Yes] 1 [1.Yes] 1 [1.Yes] 1 [1.Yes] Control
8 NA(z) NA(z) NA(z) NA(z) Treatment
9 NA(z) NA(z) NA(z) NA(z) Control
10 0 [0.No] 0 [0.No] 1 [1.Yes] 0 [0.No] Control
The variables are structured like this:
str(short)
tibble [200 x 5] (S3: tbl_df/tbl/data.frame)
$ q4_1 : dbl+lbl [1:200] NA(z), NA(z), 1, 0, 0, NA(z), 1, NA(z), NA(z), 0, NA(z), 1, NA(z), 1, NA(z), 1, ...
..@ label : chr "q4_1r.Do you have any of ...assignments? Bilingual/ELL"
..@ format.stata: chr "%15.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_2 : dbl+lbl [1:200] NA(z), NA(z), 0, 0, 0, NA(z), 1, NA(z), NA(z), 0, NA(z), 0, NA(z), 0, NA(z), 0, ...
..@ label : chr "q4_2r.Do you have any of ...assignments? Sp Ed (self-c)"
..@ format.stata: chr "%34.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_3 : dbl+lbl [1:200] NA(z), NA(z), 0, 1, 0, NA(z), 1, NA(z), NA(z), 1, NA(z), 1, NA(z), 1, NA(z), 0, ...
..@ label : chr "q4_3r.Do you have any of ...assignments? Sp Ed (incl.)"
..@ format.stata: chr "%72.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_4 : dbl+lbl [1:200] NA(z), NA(z), 1, 0, 1, NA(z), 1, NA(z), NA(z), 0, NA(z), 1, NA(z), 0, NA(z), 0, ...
..@ label : chr "q4_4r.Do you have any of ...assignments? Gifted/Talented"
..@ format.stata: chr "%17.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ treatment_cur: chr [1:200] "Control" "Control" "Treatment" "Control" ...
..- attr(*, "label")= chr "treatment_cur.treatment_cur"
..- attr(*, "format.stata")= chr "%9s"
This is the class of each labelled variable:
> class(short$q4_1)
[1] "haven_labelled" "vctrs_vctr" "double"
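I need to create a descriptive table of the data using tbl_summary from the gtsummary package, a really cool package for creating quick, customizable summary statistics.
A nice feature of my data is that every value has a label associated with it. For example, in q4_2, 0 means "No" and 1 means "Yes". So when I pass the data to tbl_summary, instead of the frequency counts showing: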
q4_1 n
1 7
0 8
they could show this, which is what I want:
"q4_1r.Do you have any of ...assignments? Bilingual/ELL"
n
No 7
Yes 8
This code does not work, because tbl_summary only accepts certain classes.
tbl_summary(short)
Column(s) ‘q4_1’, ‘q4_2’, ‘q4_3’, and ‘q4_4’ omitted from output.
Accepted classes are ‘character’, ‘factor’, ‘numeric’, ‘logical’, ‘integer’, or ‘difftime’.
If I convert these variables to character, they lose their value labels, so all I see is the following:
q4_1 n
1 7
0 8
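Is there any way around this? I couldn't find a built-in R dataset with this kind of variable format to make the question easier to reproduce.
The haven_labelled class was never meant to be used for analysis or data exploration. Rather, it was created as an in-between for importing data from other languages whose data types do not have a one-to-one relationship with R's. This is from the tidyverse post about haven_labelled variables (https://haven.tidyverse.org/articles/semantics.html):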
The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame.
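To use tbl_summary(), first apply the as_factor() function to the imported data frame, e.g. haven::as_factor(short). This converts your data frame to base R types and preserves the Stata value labels as factor levels.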
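Since the question has no reproducible data, here is a minimal sketch of that workflow on a small made-up stand-in for `short`. The five rows and the `short_fct` name are assumptions for illustration only, and the last step (stripping the "0."/"1." prefix from the levels) is an optional extra for getting plain No/Yes rows, not something as_factor() does for you.

```r
library(haven)
library(gtsummary)

# Hypothetical stand-in for the imported Stata data (values invented for the example)
short <- tibble::tibble(
  q4_1 = labelled(
    c(NA, 1, 0, 0, 1),
    labels = c("0.No" = 0, "1.Yes" = 1),
    label  = "q4_1r.Do you have any of ...assignments? Bilingual/ELL"
  ),
  treatment_cur = c("Control", "Treatment", "Control", "Control", "Control")
)

# Convert every haven_labelled column to a factor; value labels become the levels
short_fct <- as_factor(short)
class(short_fct$q4_1)

# Optional: drop the numeric prefix so the levels read "No" / "Yes"
levels(short_fct$q4_1) <- sub("^[0-9]+\\.", "", levels(short_fct$q4_1))
levels(short_fct$q4_1)

# The factor columns are now accepted by tbl_summary()
tbl_summary(short_fct)
```

If you only want the labelled columns summarised by arm, tbl_summary(short_fct, by = treatment_cur) should work on the same converted data frame.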
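FYI, we are working on making tbl_summary() compatible with all types, and the as_factor() step will no longer be needed in the next release of the package. You can follow the implementation progress here: https://github.com/ddsjoberg/gtsummary/pull/603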