Calculate percentages in skimr::skim_with
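I am trying to add the percentages of factor levels to the skimr::skim output. I tried using the table function, but it does not work as expected. How can I get the percentages of the different species in a proper format, similar to top_counts?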
library(skimr)
skim(iris)
Name | iris |
Number of rows | 150 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Data summary
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
my_skim <- skim_with(factor=sfl(pct = ~prop.table(table(.))))
my_skim(iris)
Name | iris |
Number of rows | 150 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Data summary
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts | pct |
---|---|---|---|---|---|---|
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 | 0.3333333 |
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 | 0.3333333 |
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 | 0.3333333 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
Created on 2022-02-27 by the reprex package (v2.0.1)
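The formula in the question returns one value per factor level, which is why Species appears three times in the output. We can instead paste (or str_c) the values together to create a single string: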
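library(skimr)
my_skim <- skim_with(factor = sfl(pct = ~ {
  prt <- prop.table(table(.))               # proportion of each factor level
  val <- sprintf("%.2f", prt)               # format to two decimal places
  nm1 <- tolower(substr(names(prt), 1, 3))  # three-letter lowercase labels, as in top_counts
  # collapse into one string so that skim() keeps a single row per variable
  stringr::str_c(nm1, val, sep = ": ", collapse = ", ")
}))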
-testing
> my_skim(iris)
── Data Summary ────────────────────────
Values
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None
── Variable type: factor ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate ordered n_unique top_counts pct
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50 set: 0.33, ver: 0.33, vir: 0.33
── Variable type: numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
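The question asks for percentages, while the output above shows proportions. As a minimal sketch of the same idea, the formatting step can be pulled out into a base R helper that multiplies by 100 and appends a percent sign. The name pct_levels is hypothetical and not part of skimr; the one-decimal format is an arbitrary choice.

```r
# Hypothetical helper (not part of skimr): collapse the percentage of each
# factor level into one string, using the same abbreviated labels as top_counts.
pct_levels <- function(x) {
  # table() drops NA by default, so percentages are of the non-missing values
  prt <- prop.table(table(x))
  val <- sprintf("%.1f%%", 100 * prt)        # e.g. "33.3%"
  nm1 <- tolower(substr(names(prt), 1, 3))   # three-letter lowercase labels
  paste(nm1, val, sep = ": ", collapse = ", ")
}

# The raw proportion table has one entry per level, which is what made
# the first attempt print three rows for Species.
length(prop.table(table(iris$Species)))
#> [1] 3

# The helper returns a single string instead.
pct_levels(iris$Species)
#> [1] "set: 33.3%, ver: 33.3%, vir: 33.3%"
```

Because sfl() accepts plain functions as well as formulas, a helper like this could presumably be plugged in as skim_with(factor = sfl(pct = pct_levels)).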