Complete missing values in a column or how to create values between a given range from a single variable

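I have a list of species distributed along an elevation gradient, but I only have their upper and lower limits. I want to fill in the missing information between those extremes so that I can plot the species' distributions.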

Here are the data:

species range elevation
spp1 upper 1100
spp1 lower 100
spp2 upper 20
spp2 lower 1200
spp3 upper NA
spp3 lower 900
spp4 upper 500
spp4 lower 0
spp5 upper NA
spp5 lower 900
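To make the example reproducible, the sample above can be typed in directly with tibble::tribble(). This is a sketch: the column names spp and limits are an assumption taken from the answer's code below, since the table header uses species and range.

```r
library(tibble)

# Sample data from the question. The column names spp and limits are
# assumed (they match the answer's code, not the table header).
df <- tribble(
  ~spp,   ~limits, ~elevation,
  "spp1", "upper", 1100,
  "spp1", "lower",  100,
  "spp2", "upper",   20,
  "spp2", "lower", 1200,
  "spp3", "upper",   NA,
  "spp3", "lower",  900,
  "spp4", "upper",  500,
  "spp4", "lower",    0,
  "spp5", "upper",   NA,
  "spp5", "lower",  900
)
df
```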

*An elevation of 0 means sea level, and NA appears when only one value is available for that species.
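The NA rule above can be handled by copying the known limit into the missing one, so that the species is treated as occurring at that single elevation; the answer's ifelse() calls do the same thing. A minimal sketch of that step with dplyr::coalesce(), on a hypothetical wide table (wide, upper, lower are illustrative names) with one row per species:

```r
library(dplyr)
library(tibble)

# Hypothetical wide table: one row per species, one column per limit.
wide <- tibble(
  spp   = c("spp3", "spp4"),
  upper = c(NA, 500),
  lower = c(900, 0)
)

# If one limit is NA, reuse the other one, so the species is treated as
# occurring only at that single known elevation.
filled <- wide %>%
  mutate(upper = coalesce(upper, lower),
         lower = coalesce(lower, upper))
filled
```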

I tried pivot_wider() and then converting back to long format, but the best I managed was with the function complete():

library(tidyr)

df %>%
  complete(spp, elevation = seq(0, 3500, 100), fill = list(Value = NA))

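For each species, I assume that elevation ranges from 0 to 3500 m a.s.l. and fill in the species' presence at every 100 m of elevation. This works, but I lose several species. What is wrong?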
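Note that complete() only adds the missing combinations of the columns it is given; it has no notion of whether an elevation lies between a species' lower and upper limits. A small sketch on two species from the sample illustrates this (column names spp, limits and elevation are assumed, as in the complete() call above):

```r
library(tidyr)
library(tibble)

# Two species from the sample (column names assumed).
df <- tribble(
  ~spp,   ~limits, ~elevation,
  "spp1", "upper", 1100,
  "spp1", "lower",  100,
  "spp4", "upper",  500,
  "spp4", "lower",    0
)

# complete() adds every missing spp x elevation combination; the new rows
# get NA in `limits`, so nothing marks whether the species occurs there.
out <- complete(df, spp, elevation = seq(0, 3500, 100))
out
```

Here every elevation in the data is already on the 100 m grid, so the result has 2 x 36 rows, and only the four original rows carry a non-NA limits value.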

You can pivot wider so that each species is on its own row, then group by species and use a summarise() that returns several rows per group:

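library(tidyr)
library(dplyr)

# listSpp.csv is expected to have the columns spp, limits ("upper"/"lower")
# and elevation, the names this code refers to
df <- read.csv("listSpp.csv")

df %>%
  # keep species in their original order rather than alphabetical order
  mutate(spp = factor(spp, unique(df$spp))) %>%
  # one row per species, with list-columns `lower` and `upper`
  pivot_wider(names_from = limits, values_from = elevation, values_fn = list) %>%
  unnest(cols = c(lower, upper)) %>%
  # if one limit is NA, use the other one (species known from a single elevation)
  mutate(lower = ifelse(is.na(lower), upper, lower),
         upper = ifelse(is.na(upper), lower, upper)) %>%
  group_by(spp) %>%
  # one row per 100 m step from 0 to 3500 m; present is TRUE within [lower, upper].
  # In dplyr >= 1.1.0, returning several rows from summarise() is deprecated
  # (it warns); reframe() is the replacement there.
  summarise(Elevation = seq(0, 3500, 100),
            present = lower <= seq(0, 3500, 100) &
                      upper >= seq(0, 3500, 100),
            .groups = "drop")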
#> # A tibble: 147,096 x 3
#>    spp                                 Elevation present
#>    <fct>                                   <dbl> <lgl>  
#>  1 Isoetes araucaniana Macluf & Hickey         0 FALSE  
#>  2 Isoetes araucaniana Macluf & Hickey       100 FALSE  
#>  3 Isoetes araucaniana Macluf & Hickey       200 FALSE  
#>  4 Isoetes araucaniana Macluf & Hickey       300 FALSE  
#>  5 Isoetes araucaniana Macluf & Hickey       400 FALSE  
#>  6 Isoetes araucaniana Macluf & Hickey       500 FALSE  
#>  7 Isoetes araucaniana Macluf & Hickey       600 FALSE  
#>  8 Isoetes araucaniana Macluf & Hickey       700 FALSE  
#>  9 Isoetes araucaniana Macluf & Hickey       800 TRUE   
#> 10 Isoetes araucaniana Macluf & Hickey       900 TRUE   
#> # ... with 147,086 more rows
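A self-contained sketch of the same approach on the sample data, assuming dplyr >= 1.1.0 so that reframe() and its .by argument are available. It swaps the answer's ifelse() calls for coalesce() and drops values_fn = list, which mainly matters when a species has duplicate entries for a limit; the column names spp and limits are assumed to match the answer's code, and elev_grid and presence are illustrative names.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question (column names assumed).
df <- tribble(
  ~spp,   ~limits, ~elevation,
  "spp1", "upper", 1100,
  "spp1", "lower",  100,
  "spp2", "upper",   20,
  "spp2", "lower", 1200,
  "spp3", "upper",   NA,
  "spp3", "lower",  900,
  "spp4", "upper",  500,
  "spp4", "lower",    0,
  "spp5", "upper",   NA,
  "spp5", "lower",  900
)

elev_grid <- seq(0, 3500, 100)  # 0 to 3500 m a.s.l. in 100 m steps

presence <- df %>%
  # one row per species with columns upper and lower
  pivot_wider(names_from = limits, values_from = elevation) %>%
  # a missing limit takes the value of the known one
  mutate(upper = coalesce(upper, lower),
         lower = coalesce(lower, upper)) %>%
  # reframe() may return several rows per group: one per grid elevation
  reframe(elevation = elev_grid,
          present = lower <= elev_grid & upper >= elev_grid,
          .by = spp)
presence
```

In the sample, spp2 has an upper limit (20) below its lower limit (1200), so this comparison marks it absent at every elevation. If that row is a data-entry slip, wrapping the limits in pmin()/pmax() before comparing would be one way to guard against it.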