Complete missing values in a column, or how to create values within a given range from a single variable
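I have a list of species distributed along an elevation gradient, but only their upper and lower limits. I want to fill in the missing information between the two extremes so that I can plot the species' distributions.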
species | range | elevation |
---|---|---|
spp1 | upper | 1100 |
spp1 | lower | 100 |
spp2 | upper | 20 |
spp2 | lower | 1200 |
spp3 | upper | NA |
spp3 | lower | 900 |
spp4 | upper | 500 |
spp4 | lower | 0 |
spp5 | upper | NA |
spp5 | lower | 900 |
*An elevation of 0 means sea level; NA is used when only one value is available for the species.
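I tried using pivot_wider and then going back to the long format, but the best I have managed so far uses complete():

    df %>%
      complete(spp, elevation = seq(0, 3500, 100), fill = list(Value = NA))

For each species I assumed an elevation range from 0 to 3500 m a.s.l. and filled in the species' presence at every 100 m of elevation. This works, but I lose several species. What is going wrong?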
You could pivot to wide format so that each species is on its own row, then group by species, and then use a multi-row summarise:
    library(tidyr)
    library(dplyr)

    df <- read.csv("listSpp.csv")

    df %>%
      # keep the species in their original order
      mutate(spp = factor(spp, unique(df$spp))) %>%
      # one column per limit; values_fn = list tolerates duplicated entries
      pivot_wider(names_from = limits, values_from = elevation, values_fn = list) %>%
      unnest(cols = c(lower, upper)) %>%
      # if only one limit is known, use it for both
      mutate(lower = ifelse(is.na(lower), upper, lower),
             upper = ifelse(is.na(upper), lower, upper)) %>%
      group_by(spp) %>%
      # one row per 100 m step; note that since dplyr 1.1.0, reframe() is the
      # recommended verb when a summary returns several rows per group
      summarise(Elevation = seq(0, 3500, 100),
                present = lower <= seq(0, 3500, 100) &
                          upper >= seq(0, 3500, 100),
                .groups = "drop")
    #> # A tibble: 147,096 x 3
    #>    spp                                 Elevation present
    #>    <fct>                                   <dbl> <lgl>
    #>  1 Isoetes araucaniana Macluf & Hickey         0 FALSE
    #>  2 Isoetes araucaniana Macluf & Hickey       100 FALSE
    #>  3 Isoetes araucaniana Macluf & Hickey       200 FALSE
    #>  4 Isoetes araucaniana Macluf & Hickey       300 FALSE
    #>  5 Isoetes araucaniana Macluf & Hickey       400 FALSE
    #>  6 Isoetes araucaniana Macluf & Hickey       500 FALSE
    #>  7 Isoetes araucaniana Macluf & Hickey       600 FALSE
    #>  8 Isoetes araucaniana Macluf & Hickey       700 FALSE
    #>  9 Isoetes araucaniana Macluf & Hickey       800 TRUE
    #> 10 Isoetes araucaniana Macluf & Hickey       900 TRUE
    #> # ... with 147,086 more rows
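For anyone who wants to try the idea without the CSV, here is a minimal sketch on the five-species toy table from the question. It is a variation under stated assumptions, not the code above: the names `toy`, `grid`, `lo`, `hi` and `presence` are made up for illustration, the column names follow the question's table rather than the CSV, and the two limits are treated as unordered (via `pmin()`/`pmax()`), which is my assumption about rows such as spp2 where "upper" is smaller than "lower". With a strict `lower <= x & upper >= x` test, such a row would be `FALSE` at every elevation.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the table in the question
toy <- tibble(
  species   = rep(paste0("spp", 1:5), each = 2),
  range     = rep(c("upper", "lower"), times = 5),
  elevation = c(1100, 100, 20, 1200, NA, 900, 500, 0, NA, 900)
)

# Elevation grid assumed in the question: 0 to 3500 m in 100 m steps
grid <- seq(0, 3500, by = 100)

presence <- toy %>%
  # one row per species, with columns `upper` and `lower`
  pivot_wider(names_from = range, values_from = elevation) %>%
  mutate(
    # ASSUMPTION: limits are unordered, and a single known limit
    # stands in for the missing one (na.rm = TRUE ignores the NA)
    lo = pmin(lower, upper, na.rm = TRUE),
    hi = pmax(lower, upper, na.rm = TRUE)
  ) %>%
  # attach the full grid to every species, then expand to long format
  mutate(Elevation = list(grid)) %>%
  unnest(Elevation) %>%
  mutate(present = Elevation >= lo & Elevation <= hi) %>%
  select(species, Elevation, present)

# number of 100 m steps at which each species is marked present
presence %>%
  group_by(species) %>%
  summarise(n_present = sum(present))
```

Every species keeps all 36 grid rows here, so none can drop out of the result; spp3 and spp5, which have a single known limit, are marked present only at 900 m.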