Visualizing temporary results in R using table and summary with the names of the variables

Question

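I am analyzing several variables that I created from a large database. They are mostly dummy or categorical variables, they usually come in pairs, and they are part of a larger data frame.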

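For each pair of variables, I want to print a clean summary:

two tables, one per variable, each with the frequency of every value (including NA, even when its count is 0);
a summary of the means of both variables.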

Like this:

Var01:
    0     1  <NA> 
50395 40292     0 

Var02:
    0     1  <NA> 
13757 76930     0 

Means:
  Var01  Var02
1 68.39% 96.39%

I only need to see these results once; I do not need to save them.

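The real variable names are more complicated (for example, dm_idade_0a17_pre), and I do not want to keep copying and pasting them as many times as I have been doing.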

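I tried creating temporary variables and using the functions table() and summary(). I used a custom function, percent(), to display the mean as a percentage. The only problem is that table() does not show me the name of the variable.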

So, my code looks like this:

###########

# CUSTOM FUNCTION

percent <- function(x, digits = 3, format = "f", ...) {
  paste0(formatC(x * 100, format = format, digits = digits, ...), "%")
}

# ORIGINAL DATA FRAME

df <- data.frame(
  ch_name = letters[1:5],
  ch_key = c(1:5))

# 1st new variable = 
df$ab_cd <- sample(0:1,5,replace = TRUE)

# 2nd new variable = 
df$ab_cd_e <- sample(0:1,5,replace = TRUE)


# CREATING TEMPORARY VARIABLES

{
  x1 <- df$ab_cd
  x2 <- df$ab_cd_e
  
  y1 <- table(x1, useNA = 'always')
  y2 <- table(x2, useNA = 'always')
  
  z1 <- data.frame(
    "ab_cd" = percent(mean(x1)),
    "ab_cd_e" = percent(mean(x2)))

#  PRINTING THEM
  
  cat("\n")
  print(y1)
  print(y2)
  z1
}
###########

The result I get is this:

x1
   0    1 <NA> 
   2    3    0 
x2
   0    1 <NA> 
   3    2    0 
    ab_cd ab_cd_e
1 60.000% 40.000%

If the labels x1 and x2 were replaced by the original names of the columns I used, my problem would be solved (ugly, but better than nothing).

Thank you all for your attention!

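(Please: this may look lazy, but bear in mind that I still need to do this more than 80 times. The variable names are never clean enough: they are so similar that using CTRL+F or double-clicking is too slow. I hope you understand!)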

Answer 1

You can do something like this:

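# Relies on `df` and percent() from the question existing in the calling environment.
f <- function(s1, s2) {
  # deparse.level = 0 leaves the table's header line empty, so the name
  # written by cat() ends up on that line
  cat(s1)
  print(table(df[[s1]], useNA = 'always', deparse.level = 0))
  cat(s2)
  print(table(df[[s2]], useNA = 'always', deparse.level = 0))
  setNames(
    data.frame(percent(mean(df[[s1]], na.rm = TRUE)), percent(mean(df[[s2]], na.rm = TRUE))),
    c(s1, s2)
  )
}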

Usage:

f("ab_cd", "ab_cd_e")

Output:

ab_cd
   0    1 <NA> 
   1    4    0 
ab_cd_e
   0    1 <NA> 
   3    2    0 
    ab_cd ab_cd_e
1 80.000% 40.000%
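
Since this has to be repeated for more than 80 pairs, a variant that takes the data frame as an argument and labels each table through the dnn argument of table() may be more convenient. The following is only a sketch under some assumptions: show_vars is a hypothetical name that does not appear in the question or the answer, the example data frame uses fixed values instead of sample(), and the columns are assumed to be numeric 0/1 so that mean() is meaningful.

```r
# percent() as defined in the question
percent <- function(x, digits = 3, format = "f", ...) {
  paste0(formatC(x * 100, format = format, digits = digits, ...), "%")
}

# Hypothetical helper (not part of the original answer): prints one frequency
# table per named column, then the means formatted as percentages.
show_vars <- function(data, ...) {
  vars <- c(...)
  for (v in vars) {
    # dnn sets the header of the table to the original column name
    print(table(data[[v]], useNA = "always", dnn = v))
    cat("\n")
  }
  means <- vapply(
    vars,
    function(v) percent(mean(data[[v]], na.rm = TRUE)),
    character(1)
  )
  out <- data.frame(as.list(means), check.names = FALSE, stringsAsFactors = FALSE)
  cat("Means:\n")
  print(out)
  invisible(out)  # returned invisibly, so nothing has to be saved
}

# Example data with fixed values (an assumption, to keep the run reproducible)
df <- data.frame(
  ch_name = letters[1:5],
  ch_key = 1:5,
  ab_cd = c(1, 0, 1, 1, 1),
  ab_cd_e = c(0, 1, 0, NA, 1)
)

res <- show_vars(df, "ab_cd", "ab_cd_e")
```

Each column name is typed only once per call, and the same function accepts one, two, or more names. For many pairs, the calls could be driven from a list of name pairs with a for loop or lapply().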
