Extract density estimates for different groups
I have a data frame (df) that looks like this:
> summary(df)
   Occurence        Group
 Min.   :0.001   Length:7990
 1st Qu.:0.028   Class :character
 Median :0.160   Mode  :character
 Mean   :0.195
 3rd Qu.:0.307
 Max.   :0.600
 NA's   :5473
> unique(df$Group)
[1] "fa20,0" "sa20,0" "fa05,0" "sa10,0" "flatsa,0" "flatfa,0" "fa10,0" "sa05,0" "flatsa,1" "fa10,1" "fa05,1" "sa20,1" "flatfa,1" "fa20,1" "sa10,1" "sa05,1"
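I am trying to use the density() function to compute a kernel density estimate of Occurence for each unique group. I can do it one group at a time: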
> flatsa <- density(c(as.numeric(ag04_pattern_long$Occurence[ag04_pattern_long$Group == "flatsa,0"])), na.rm=T)
> flatsa_df2 <- enframe(flatsa$x, value = "X") %>%
+ add_column(Y=flatsa$y) %>%
+ add_column(Group = "flatsa,0") %>%
+ select(-name)
This produces the following output for flatsa_df2:
# A tibble: 512 x 3
X Y Group
<dbl> <dbl> <chr>
1 -0.168 0.00317 flatsa,0
2 -0.166 0.00351 flatsa,0
3 -0.164 0.00387 flatsa,0
4 -0.162 0.00427 flatsa,0
5 -0.161 0.00471 flatsa,0
6 -0.159 0.00519 flatsa,0
7 -0.157 0.00570 flatsa,0
8 -0.155 0.00628 flatsa,0
9 -0.153 0.00689 flatsa,0
10 -0.151 0.00755 flatsa,0
# ... with 502 more rows
How can I do this for all 16 unique elements of df$Group at once? Ideally, they would all end up in a single data frame. I tried:
dens_table <- setDT(ag04_pattern_long)[, .(dens=density(ag04_pattern_long$Occurence, na.rm=T)), by = Group]
for(i in length(unique(ag04_pattern_long$Group))){
dens_table <- density(c(as.numeric(ag04_pattern_long$Occurence[i], na.rm=T)))
}
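But neither of these produced the correct output. The loop gave me an error saying it "need at least 2 points to select a bandwidth automatically". I think this means it is not considering all the df$Occurence values for each unique(df$Group).
Please help!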
Here is a base R approach:
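# one numeric vector of Occurence values per group
occur_list = split(df$Occurence, df$Group)
est_list = lapply(occur_list, function(x) {
  # keep only the evaluation grid (x) and the density values (y)
  data.frame(density(x, na.rm = TRUE)[c("x", "y")])
})
results = do.call(rbind, est_list)
# label each block of rows with its group name
results$Group = rep(names(occur_list), times = sapply(est_list, nrow))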
We can also use a for loop, adapting your attempt:
results = list()
# iterate over the group names themselves, not over their count
for(i in unique(ag04_pattern_long$Group)){
  results[[i]] <- data.frame(density(ag04_pattern_long$Occurence[ag04_pattern_long$Group == i], na.rm = TRUE)[c("x", "y")])
  results[[i]]$Group = i
}
results = do.call(rbind, results)
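As for why the original loop failed, my reading is that `for(i in length(...))` loops over a single number (the count of groups) rather than over the groups, so `Occurence[i]` picks out one value and density() cannot choose a bandwidth from it. A minimal sketch on a made-up `groups` vector (hypothetical data, not the asker's) shows the difference:

```r
# Toy stand-in for the Group column (hypothetical values)
groups <- c("a", "b", "a", "c", "b")

# length() returns a single number, so this loop body runs once, with i = 3
seen_wrong <- c()
for (i in length(unique(groups))) seen_wrong <- c(seen_wrong, i)

# Looping over the unique values themselves visits every group
seen_right <- c()
for (g in unique(groups)) seen_right <- c(seen_right, g)

# A single value is not enough for density() to select a bandwidth,
# so this call raises an error, which we capture as a message string
err <- tryCatch(density(0.5), error = function(e) conditionMessage(e))
```

If an index-based loop is preferred, `seq_along(unique(groups))` would give one index per group, but the index would then have to be used to look up the group name, not to subset Occurence directly.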
Or using dplyr (with tidyr for unnest()):
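library(dplyr)
library(tidyr)  # provides unnest()
df %>%
  nest_by(Group) %>%
  # na.rm = TRUE is needed because Occurence contains NAs
  mutate(dens = list(data.frame(density(data$Occurence, na.rm = TRUE)[c("x", "y")]))) %>%
  select(-data) %>%
  unnest(cols = dens)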
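In all cases, I removed c(as.numeric()) from inside the loop. Make sure the whole Occurence column is numeric before the loop; this is better than converting each subset inside the loop.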
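To illustrate the convert-once advice, here is a small sketch on invented data. The `toy` data frame, its values, and the `n = 64` grid size are assumptions made for the example; the split/lapply steps mirror the base R approach above.

```r
# Hypothetical toy data: Occurence stored as character, with one missing value
toy <- data.frame(
  Occurence = c("0.10", "0.20", "0.35", NA, "0.05", "0.15", "0.40", "0.25"),
  Group = rep(c("g1", "g2"), each = 4),
  stringsAsFactors = FALSE
)

# Convert the whole column once, before any grouping or looping
toy$Occurence <- as.numeric(toy$Occurence)

# Estimate per group; n = 64 only keeps the toy output small (default is 512)
dens_list <- lapply(split(toy$Occurence, toy$Group), function(x) {
  data.frame(density(x, na.rm = TRUE, n = 64)[c("x", "y")])
})

# Stack the per-group results and label each block of rows
toy_results <- do.call(rbind, dens_list)
toy_results$Group <- rep(names(dens_list), times = sapply(dens_list, nrow))
```

If the column were a factor instead of character, `as.numeric(as.character(...))` would be the safer conversion, since `as.numeric()` on a factor returns the level codes.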