How to create two variables and filter with map_dbl
I have the following data:
Year <- c("2021","2021","2021","2021","2021","2021")
Month <- c("8","8","8","8","8","8")
Day <- c("10","15","18","20","22","25")
Hour <- c("171110","171138","174247","183542","190156","190236")
Id_Type <- c("2","2","1","","1","")
Code_Intersecction <- c("340","","","210","750","980")
Data = data.frame(Year,Month,Day,Hour,Id_Type,Code_Intersecction)
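I need to count the empty strings ("") in each column of the dataset: if they make up 5% or more of the rows, the variable should take the value 1, otherwise 0.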
Data_Null = as.data.frame(purrr::map_dbl(Data, .f = function(x){ifelse(round(sum(x == '')/nrow(Data)*100L,3) >= 5, 1, 0)}))
colnames(Data_Null) = "Null"
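The problem is that when I look at the resulting data frame, it has only one column instead of two (the variable name and the 0/1 value).
How can I make it show both columns?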
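A quick check shows where the variable names end up. This is a sketch on the question's data (flags is just an illustrative name): map_dbl() returns a named numeric vector, and as.data.frame() stores those names as row names rather than as a separate column, which is why only one column is visible.

```r
library(purrr)

# Same sample data as in the question
Data <- data.frame(
  Year  = rep("2021", 6),
  Month = rep("8", 6),
  Day   = c("10", "15", "18", "20", "22", "25"),
  Hour  = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980")
)

# map_dbl() gives one value per column, named after the column
flags <- map_dbl(Data, function(x) {
  ifelse(round(sum(x == "") / nrow(Data) * 100, 3) >= 5, 1, 0)
})
names(flags)

# as.data.frame() moves those names into the row names, not into a column
Data_Null <- as.data.frame(flags)
colnames(Data_Null) <- "Null"
ncol(Data_Null)       # a single column
rownames(Data_Null)   # the variable names are stored here
```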
Use tibble::rownames_to_column:
tibble::rownames_to_column(Data_Null, var ="Variables")
# A tibble: 6 x 2
Variables Null
<chr> <dbl>
1 Year 0
2 Month 0
3 Day 0
4 Hour 0
5 Id_Type 1
6 Code_Intersecction 1
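If the intermediate one-column data frame is not needed, tibble::enframe() can turn the named vector from map_dbl() directly into two columns. This is a sketch, not part of the answer above; it compares the raw proportion with 0.05 and drops the question's round() step, on the assumption that the rounding is not essential.

```r
library(purrr)
library(tibble)

# Same sample data as in the question
Data <- data.frame(
  Year  = rep("2021", 6),
  Month = rep("8", 6),
  Day   = c("10", "15", "18", "20", "22", "25"),
  Hour  = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980")
)

# enframe() puts the vector names in one column and the values in another
Data_Null <- enframe(
  map_dbl(Data, ~ as.numeric(mean(.x == "") >= 0.05)),
  name = "Variables", value = "Null"
)
Data_Null
```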
In base R, we can use colMeans on the logical matrix and then stack to convert the resulting named vector into a two-column data.frame:
stack(+(colMeans(Data == "") > 0.05))[2:1]
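Explanation: Data == "" returns a logical matrix, and colMeans takes the mean of the logical vector in each column, which is the proportion of TRUE values (multiply by 100 to get a percentage). Comparing that with 0.05 (5%) turns it into a logical vector, which can be converted to binary 0/1 with unary + or with as.integer. The output of colMeans is a named vector, and the names are kept through these steps. stack converts the named vector into a two-column data.frame, and the indexing [2:1] reorders the columns so that the second column comes first, followed by the first.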
Output:
ind values
1 Year 0
2 Month 0
3 Day 0
4 Hour 0
5 Id_Type 1
6 Code_Intersecction 1
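The one-liner above can be unpacked step by step to inspect the intermediate values. This is a sketch on the question's data; is_empty, share, flag and res are illustrative names only.

```r
# Same sample data as in the question
Data <- data.frame(
  Year  = rep("2021", 6),
  Month = rep("8", 6),
  Day   = c("10", "15", "18", "20", "22", "25"),
  Hour  = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980")
)

# Step 1: logical matrix, TRUE where a cell is an empty string
is_empty <- Data == ""

# Step 2: proportion of empty strings per column (named numeric vector)
share <- colMeans(is_empty)
round(share * 100, 1)  # Id_Type and Code_Intersecction: 33.3; all others: 0

# Step 3: compare with 5% and turn TRUE/FALSE into 1/0 with unary +
flag <- +(share > 0.05)

# Step 4: stack() gives columns (values, ind); [2:1] puts ind first
res <- stack(flag)[2:1]
res
```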
And with the tidyverse, the equivalent uses enframe (from tibble):
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)  # enframe() comes from tibble
map(Data, ~ +(round(mean(.x == ""), 3) * 100 >= 5)) %>%
enframe(name = 'Variables') %>%
unnest(value)
# A tibble: 6 × 2
Variables value
<chr> <int>
1 Year 0
2 Month 0
3 Day 0
4 Hour 0
5 Id_Type 1
6 Code_Intersecction 1
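Another tidyverse route, sketched here as an alternative rather than taken from the answer above, computes all flags in a single summarise() row and then reshapes it with pivot_longer(). It assumes dplyr 1.0.0 or later (for across()) and tidyr 1.0.0 or later (for pivot_longer()); out is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
Data <- data.frame(
  Year  = rep("2021", 6),
  Month = rep("8", 6),
  Day   = c("10", "15", "18", "20", "22", "25"),
  Hour  = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980")
)

out <- Data %>%
  # one row holding the 0/1 flag for every column
  summarise(across(everything(), ~ as.integer(mean(.x == "") >= 0.05))) %>%
  # reshape that single row into Variables / Null columns
  pivot_longer(everything(), names_to = "Variables", values_to = "Null")
out
```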
Base R:
Data_Null$Variables <- rownames(Data_Null)
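A fuller base-R sketch, assuming the goal is to move the row names of the question's Data_Null into a Variables column and put it first. To stay within base R it swaps map_dbl() for vapply() and drops the round() step; flags is an illustrative name.

```r
# Same sample data as in the question
Data <- data.frame(
  Year  = rep("2021", 6),
  Month = rep("8", 6),
  Day   = c("10", "15", "18", "20", "22", "25"),
  Hour  = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980")
)

# Rebuild the question's one-column Data_Null (names stored as row names)
flags <- vapply(Data, function(x) as.numeric(mean(x == "") >= 0.05), numeric(1))
Data_Null <- as.data.frame(flags)
colnames(Data_Null) <- "Null"

# Copy the row names into a column, drop them, and put Variables first
Data_Null$Variables <- rownames(Data_Null)
rownames(Data_Null) <- NULL
Data_Null <- Data_Null[c("Variables", "Null")]
Data_Null
```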