Sorting tabyl output with Ns and proportions

My first SO question!

I'm trying to sort the output of a call to tabyl() from the janitor package. I can't figure out how to sort the numbers appended by adorn_ns().

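Using tabyl, I managed to create a table with frequencies, proportions and totals with the code below. What I want is to sort the table by the "Total" column in descending order. Finally, I want to pass the table to knitr's kable() for reporting.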

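After I call arrange() on the table, adorn_ns() pastes the Ns in the wrong, "original" positions instead of the sorted ones. This has been pointed out on GitHub and (as I understand it) happens because the 'core' is not changed when the tabyl is sorted. See: https://github.com/sfirke/janitor/issues/352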

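A comment on GitHub says: "This isn't a critical issue: you can supply custom Ns to the adorn_ns() call, and you can do the sorting there too." Unfortunately, I don't know how to supply those custom Ns.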

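Alternatively, I considered changing the order with factors, but I'd prefer a more robust solution: this variable has many categories in my real data, and I'd like to be able to apply this table-rendering approach (or another one) to different variables in the future without laboriously typing in the levels by frequency.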
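The factor route does not actually require typing the levels by hand: they can be computed from the data. A minimal sketch, assuming the toy data defined further down (freq_levels is just an illustrative name). Since tabyl() lays out its rows, and its stored counts, in factor-level order, no arrange() is needed afterwards, so adorn_ns() stays aligned. forcats::fct_infreq() is another way to get the same frequency-ordered levels.

```r
library(dplyr)
library(janitor)

# toy data from the question
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1, sex)

# Derive the levels from the observed frequencies: most frequent category first
freq_levels <- names(sort(table(df$var1), decreasing = TRUE))
df$var1 <- factor(df$var1, levels = freq_levels)

# Rows come out in factor-level order, so no arrange() step is needed
# and adorn_ns() pairs every row with its own counts
res <- tabyl(df, var1, sex) %>%
  adorn_totals(where = c("col", "row")) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")
res
```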

So any help with custom Ns, an alternative sorting method or (if it proves necessary) an alternative table approach is much appreciated.

Here is some toy data and where I got stuck.

library(dplyr)
library(janitor)

# some toy data
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1,sex)


# First a tabyl with proportions, Ns and totals
tabyl(df, var1, sex) %>%
  adorn_totals(where = c("col", "row")) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")

# Results in (as expected)

|var1  |f        |m        |Total     |
|:-----|:--------|:--------|:---------|
|aaa   |1  (12%) |1  (17%) |2  (14%)  |
|bbb   |2  (25%) |1  (17%) |3  (21%)  |
|ccc   |1  (12%) |1  (17%) |2  (14%)  |
|ddd   |4  (50%) |3  (50%) |7  (50%)  |
|Total |8 (100%) |6 (100%) |14 (100%) |

What I want to achieve:

# descending order of frequency
|var1  |f        |m        |Total     |
|:-----|:--------|:--------|:---------|
|ddd   |4  (50%) |3  (50%) |7  (50%)  |
|bbb   |2  (25%) |1  (17%) |3  (21%)  |
|aaa   |1  (12%) |1  (17%) |2  (14%)  |
|ccc   |1  (12%) |1  (17%) |2  (14%)  |
|Total |8 (100%) |6 (100%) |14 (100%) |

What I tried:

# Order by the Total column in descending frequency

df %>% tabyl(var1,sex) %>%
  adorn_totals(where = "col") %>%   # split col and row totals 
  arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%   # keeps the Total row from appearing at the top
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front") 

# Results in (not what I expected)
|var1  |f        |m        |Total     |
|:-----|:--------|:--------|:---------|
|ddd   |1  (50%) |1  (50%) |2  (50%)  |
|bbb   |2  (25%) |1  (17%) |3  (21%)  |
|aaa   |1  (12%) |1  (17%) |2  (14%)  |
|ccc   |4  (12%) |3  (17%) |7  (14%)  |
|Total |8 (100%) |6 (100%) |14 (100%) |

# The categories have changed order, the N's have not (are in original position in table),
# and the % have been recalculated...
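The misplaced Ns can be seen directly. By default adorn_ns() takes its counts from the tabyl's "core" attribute (its default is ns = attr(dat, "core")), and arrange() reorders the visible rows while leaving that attribute untouched. A small check on the toy data:

```r
library(dplyr)
library(janitor)

# toy data from the question
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1, sex)

sorted <- tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total))

# The visible rows follow the new order...
sorted$var1
# ...but the "core" counts that adorn_ns() uses by default keep the original order
attr(sorted, "core")$var1
```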

Update requested by the OP (see comments):

It's not that elegant, but it gets you to the output you want:

df1 <- df %>% tabyl(var1,sex) %>%
    adorn_totals(where = "col") %>%   # split col and row totals 
    adorn_totals(where = "row") %>%   # keeps the Total row from appearing at the top
    adorn_percentages("col") %>%
    adorn_pct_formatting(digits = 0) %>%
    adorn_ns(position = "front") %>% 
    arrange(desc(Total))

df2 <- df1[1,]
df3 <- df1[-1,]   

bind_rows(df3, df2)
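Sorting after adorn_ns() works because each finished "N (pct%)" string moves together with its row. Note, though, that at that point the Total column holds text, so arrange(desc(Total)) compares strings character by character rather than numerically, and the position of the Total row depends on how the strings happen to be padded. A slightly more defensive sketch of the same idea, assuming every formatted cell starts with its count and has no thousands separator (leading_n is an illustrative name), sorts on the numeric count and places the Total row last by its label:

```r
library(dplyr)
library(janitor)

# toy data from the question
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1, sex)

# Build and format the whole table first, in the original row order
df1 <- df %>%
  tabyl(var1, sex) %>%
  adorn_totals(where = c("col", "row")) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")

# Cells are now text such as "7  (50%)"; keep only the leading count, as a number
leading_n <- as.numeric(sub("[[:space:]]*\\(.*$", "", trimws(df1$Total)))

# Order by: Total row last (FALSE sorts before TRUE), then by count, largest first
res <- df1[order(df1$var1 == "Total", -leading_n), ]
rownames(res) <- NULL
res
```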

Output:

  var1        f        m     Total
   ddd 4  (50%) 3  (50%)  7  (50%)
   bbb 2  (25%) 1  (17%)  3  (21%)
   aaa 1  (12%) 1  (17%)  2  (14%)
   ccc 1  (12%) 1  (17%)  2  (14%)
 Total 8 (100%) 6 (100%) 14 (100%)

First answer: use sort = TRUE

df %>% tabyl(var1,sex, sort = TRUE) %>%
  adorn_totals(where = "col") %>%   # split col and row totals 
  #arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%   # keeps the Total row from appearing at the top
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front") 

Output (the rows stay in alphabetical order):

  var1        f        m     Total
   aaa 1  (12%) 1  (17%)  2  (14%)
   bbb 2  (25%) 1  (17%)  3  (21%)
   ccc 1  (12%) 1  (17%)  2  (14%)
   ddd 4  (50%) 3  (50%)  7  (50%)
 Total 8 (100%) 6 (100%) 14 (100%)

Here is what supplying custom Ns, sorted to match the tabyl's sort order, looks like. I saved the sorted tabyl as an object to avoid repeating code.

main <- tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total)) %>%
  adorn_totals(where = "row")

main %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front", ns = main)

  var1        f        m     Total
   ddd 4  (50%) 3  (50%)  7  (50%)
   bbb 2  (25%) 1  (17%)  3  (21%)
   aaa 1  (12%) 1  (17%)  2  (14%)
   ccc 1  (12%) 1  (17%)  2  (14%)
 Total 8 (100%) 6 (100%) 14 (100%)

I added a link to this page on that GitHub issue, so there is an example there.

If you prefer one longer block of code that doesn't save any objects, here is the same approach written another way:

tabyl(df, var1, sex) %>%
  adorn_totals(where = "col") %>%
  arrange(desc(Total)) %>%
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front",
           ns = tabyl(df, var1, sex) %>%
             adorn_totals(where = "col") %>%
             arrange(desc(Total)) %>%
             adorn_totals(where = "row"))
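To reuse this for other variables, as the question asks, the same steps can be wrapped in a function. This is only a sketch: sorted_tabyl() is a hypothetical helper, not part of janitor, and it assumes tabyl() accepts {{ }}-embraced column arguments, which should be the case in janitor versions that capture their variables with tidy evaluation.

```r
library(dplyr)
library(janitor)

# toy data from the question
var1 <- c("aaa", "bbb", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "aaa", "ddd", "ddd", "bbb", "bbb", "ddd")
sex <- c("f", "f", "m", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f", "f")
df <- data.frame(var1, sex)

# Hypothetical helper (not part of janitor): sorts rows by their totals and
# passes the equally sorted counts to adorn_ns(), so Ns and % stay aligned
sorted_tabyl <- function(dat, row_var, col_var, digits = 0) {
  counts <- tabyl(dat, {{ row_var }}, {{ col_var }}) %>%
    adorn_totals(where = "col") %>%
    arrange(desc(Total)) %>%
    adorn_totals(where = "row")   # added after sorting so it stays at the bottom

  counts %>%
    adorn_percentages("col") %>%
    adorn_pct_formatting(digits = digits) %>%
    adorn_ns(position = "front", ns = counts)
}

res <- sorted_tabyl(df, var1, sex)
res
```

The result is an ordinary data frame, so it can be handed straight to knitr, e.g. knitr::kable(sorted_tabyl(df, var1, sex)).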