Sorting tabyl output with Ns and proportions
My first SO question!
I'm trying to order the results of a call to tabyl from the janitor package. I can't figure out how to sort the numbers that adorn_ns() attaches.
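Using tabyl, I managed to create a table containing frequencies, proportions and totals with the code below. What I want is to sort the table in descending order of the "Total" column. In the end, I want to pass the table to knitr's kable() for reporting.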
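After I call arrange on the table, adorn_ns() pastes the Ns into the wrong "original" positions rather than the sorted positions. This has been noted on GitHub and (as far as I understand) happens because the 'core' is not changed when the tabyl is sorted.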
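See: https://github.com/sfirke/janitor/issues/352
A comment on GitHub states:
"This is not a critical issue, you can supply custom Ns to the adorn_ns() call, and you can do the sorting there as well."
Unfortunately, I can't figure out how to supply these custom Ns.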
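Alternatively, I considered changing the order with factors, but I would prefer a more robust solution. This variable has many categories in my real data, and I'd like to be able to apply this (or another tabling method) to different variables in future renderings without having to laboriously type in the levels by frequency.
So any help with custom Ns, alternative sorting methods, or (if it proves necessary) alternative tabling methods is much appreciated.
Here is some toy data and where I got stuck.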
library(dplyr)
library(janitor)
# some toy data
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1,sex)
# First a tabyl with proportions, Ns and totals
tabyl(df, var1, sex) %>%
  adorn_totals(where = c("col", "row")) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")
# Results in (as expected)
|var1 |f |m |Total |
|:-----|:--------|:--------|:---------|
|aaa |1 (12%) |1 (17%) |2 (14%) |
|bbb |2 (25%) |1 (17%) |3 (21%) |
|ccc |1 (12%) |1 (17%) |2 (14%) |
|ddd |4 (50%) |3 (50%) |7 (50%) |
|Total |8 (100%) |6 (100%) |14 (100%) |
What I want to achieve:
# descending order of frequency
|var1 |f |m |Total |
|:-----|:--------|:--------|:---------|
|ddd |4 (50%) |3 (50%) |7 (50%) |
|bbb |2 (25%) |1 (17%) |3 (21%) |
|aaa |1 (12%) |1 (17%) |2 (14%) |
|ccc |1 (12%) |1 (17%) |2 (14%) |
|Total |8 (100%) |6 (100%) |14 (100%) |
What I tried:
# Order by the Total column in descending frequency
df %>% tabyl(var1, sex) %>%
  adorn_totals(where = "col") %>%   # split col and row totals
  arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%   # prevents the total row appearing at the top
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")
# Results in (not what I expected)
|var1 |f |m |Total |
|:-----|:--------|:--------|:---------|
|ddd |1 (50%) |1 (50%) |2 (50%) |
|bbb |2 (25%) |1 (17%) |3 (21%) |
|aaa |1 (12%) |1 (17%) |2 (14%) |
|ccc |4 (12%) |3 (17%) |7 (14%) |
|Total |8 (100%) |6 (100%) |14 (100%) |
# The categories have changed order, the N's have not (are in original position in table),
# and the % have been recalculated...
Update at the OP's request (see comments):
This is not that elegant, but it gets you to your desired output:
df1 <- df %>% tabyl(var1, sex) %>%
  adorn_totals(where = "col") %>%   # split col and row totals
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front") %>%
  arrange(desc(Total))
# After sorting, the Total row ends up first: move it back to the bottom
df2 <- df1[1, ]
df3 <- df1[-1, ]
bind_rows(df3, df2)
Output:
var1 f m Total
ddd 4 (50%) 3 (50%) 7 (50%)
bbb 2 (25%) 1 (17%) 3 (21%)
aaa 1 (12%) 1 (17%) 2 (14%)
ccc 1 (12%) 1 (17%) 2 (14%)
Total 8 (100%) 6 (100%) 14 (100%)
First answer:
Use sort = TRUE
df %>% tabyl(var1, sex, sort = TRUE) %>%
  adorn_totals(where = "col") %>%   # split col and row totals
  # arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%   # prevents the total row appearing at the top
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")
Output:
var1 f m Total
aaa 1 (12%) 1 (17%) 2 (14%)
bbb 2 (25%) 1 (17%) 3 (21%)
ccc 1 (12%) 1 (17%) 2 (14%)
ddd 4 (50%) 3 (50%) 7 (50%)
Total 8 (100%) 6 (100%) 14 (100%)
Here is what it looks like to supply custom Ns that are sorted to match the tabyl's sort order. I save the sorted tabyl as an object to avoid repeating code.
main <- tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total)) %>%
  adorn_totals(where = "row")
main %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front", ns = main)
var1 f m Total
ddd 4 (50%) 3 (50%) 7 (50%)
bbb 2 (25%) 1 (17%) 3 (21%)
aaa 1 (12%) 1 (17%) 2 (14%)
ccc 1 (12%) 1 (17%) 2 (14%)
Total 8 (100%) 6 (100%) 14 (100%)
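To see why passing the sorted counts explicitly helps, it can be useful to compare the row order of the visible table with the row order of the counts stored on the tabyl. The sketch below reuses the toy data from the question and assumes, as the question describes, that the stored counts live in an attribute named "core"; the object name sorted is only for illustration.

```r
library(dplyr)
library(janitor)

# Toy data from the question
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd",
          "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1, sex)

# Sort the counts by the Total column before computing any percentages
sorted <- tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total))

# Row order of the visible table (sorted by Total)
visible_order <- as.character(sorted$var1)
print(visible_order)

# Row order of the stored counts
# (assumption: they are kept in the "core" attribute, as the question describes)
core_order <- as.character(attr(sorted, "core")$var1)
print(core_order)
```

If the two orders differ, the default Ns are pasted row by row in the stored order, which would explain the mismatch shown in the question.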
I added a link on that GitHub issue pointing here, so there is an example.
If you prefer a longer code block that doesn't save any objects, here is a different way to do the same thing:
tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front",
           ns = tabyl(df, var1, sex) %>%
             adorn_totals(where = "col") %>%
             arrange(desc(Total)) %>%
             adorn_totals(where = "row"))