Separating list-col values into singular values relative to a condition

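Simplified explanation

Convert from long to wide while filling missing values with 17 for 2019 and 16 for 2010. Where the landcover value in 2010 matches the one in 2019, subtract their pland values (i.e. 2019 - 2010). If there is no value for 2019 and it is filled with 17, give that pland value a negative sign. Conversely, if 16 fills a missing value in 2010, keep the original pland value, positive.

This should look like Table 2.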

Table 1: Example of the long-format data frame

# A tibble: 10 x 4
   year  locality_id landcover  pland
   <chr> <chr>           <int>  <dbl>
 1 2010  L452817             8 0.0968
 2 2010  L452817             9 0.0323
 3 2010  L452817            12 0.613 
 4 2010  L452817            13 0.194 
 5 2010  L452817            14 0.0645
 6 2019  L452817             8 0.0645
 7 2019  L452817             9 0.0645
 8 2019  L452817            12 0.516 
 9 2019  L452817            13 0.194 
10 2019  L452817            14 0.161 

Table 2: Expected format

   locality_id X2010 X2019       pland
1      L452817     8     8 -0.03225806
2      L452817     9     9  0.03225807
3      L452817    12    12 -0.09677420
4      L452817    13    13  0.00000000
5      L452817    14    14  0.09677419
6      L910180     0    17 -0.43750000
7      L910180     8    17 -0.34375000
8      L910180     9    17 -0.03125000
9      L910180    10    17 -0.03125000
10     L910180    11    17 -0.09375000
11     L910180    13    17 -0.06250000
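As a quick arithmetic check on Table 2, rows 1 and 6 can be reproduced by hand from the pland values in the reproducible data. This is only an illustrative sketch; the variable names are made up for the example.

```r
# pland values copied from the reproducible data in this post
p2010_lc8 <- 0.0967741935483871   # L452817, landcover 8, year 2010
p2019_lc8 <- 0.0645161290322581   # L452817, landcover 8, year 2019

# Class present in both years: plain difference, 2019 minus 2010
diff_lc8 <- p2019_lc8 - p2010_lc8
round(diff_lc8, 4)                # -0.0323, row 1 of Table 2

# L910180, landcover 0 exists only in 2010, so 2019 is filled with 17
# and the 2010 pland value is kept with a negative sign
p2010_lc0 <- 0.4375
lost_lc0 <- -p2010_lc0
lost_lc0                          # -0.4375, row 6 of Table 2
```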

What I have tried:

# copy the values of t into another variable
y <- t
#remove pland from the new variable
y <- y[, -4]

#set from long to wide providing the pland differences from t as another column
y %>%
    group_by(year) %>%
    mutate(row = row_number()) %>%
    tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
    select(-row) %>%
    mutate(across(`2010`:`2019`, ~ if (cur_column() == '2019')
        replace_na(.x, 17) else replace_na(.x, 16))) %>%
    mutate(t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland)

# A tibble: 11 x 4
   locality_id `2010` `2019` `t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland`
   <chr>        <dbl>  <dbl>                                                       <dbl>
 1 L452817          8      8                                                    -0.0323 
 2 L452817          9      9                                                     0.0323 
 3 L452817         12     12                                                    -0.0968 
 4 L452817         13     13                                                     0      
 5 L452817         14     14                                                     0.0968 
 6 L910180          0     17                                                    -0.373  
 7 L910180          8     17                                                    -0.279  
 8 L910180          9     17                                                     0.485  
 9 L910180         10     17                                                     0.162  
10 L910180         11     17                                                     0.0675 
11 L910180         13     17                                                     0.00202

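The problem with my code above is that it always computes the difference. It should not compute a difference for the values that were introduced because of missing data, i.e. when there is a 16 or a 17 on either side.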
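The odd numbers in rows 6 to 11 of that output are consistent with R's vector recycling: the subtraction pairs 5 values for 2019 with 11 values for 2010, so the shorter vector is reused from the start. A minimal sketch, using pland values rounded from the data in this post (the names p2019 and p2010 are only for illustration):

```r
# 2019 has 5 rows (L452817 only); 2010 has 11 rows (both localities)
p2019 <- c(0.0645, 0.0645, 0.5161, 0.1935, 0.1613)
p2010 <- c(0.0968, 0.0323, 0.6129, 0.1935, 0.0645,
           0.4375, 0.34375, 0.03125, 0.03125, 0.09375, 0.0625)

# R recycles the shorter vector and warns because 11 is not a multiple of 5;
# the warning is suppressed here only to keep the example quiet
res <- suppressWarnings(p2019 - p2010)

length(res)        # 11
# Element 6 pairs the FIRST 2019 value with the SIXTH 2010 value
res[6] == p2019[1] - p2010[6]
round(res[6:8], 3) # -0.373 -0.279  0.485, as in rows 6 to 8 above
```

So the values for L910180 are differences against unrelated L452817 rows rather than anything meaningful.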


Reproducible code:

structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L, 
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817", 
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180", 
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L, 
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129, 
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375, 
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581, 
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))

Managed to figure it out, although better suggestions are welcome, especially ones that run without warnings!

# copy the values of t into another variable
y <- t
#remove pland from the new variable
y <- y[, -4]

#set from long to wide providing the pland differences from t as another column
y %>%
    group_by(year) %>%
    mutate(row = row_number()) %>%
    tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
    select(-row) %>%
    mutate(across(`2010`:`2019`, ~ if (cur_column() == '2019')
        replace_na(.x, 17) else replace_na(.x, 16))) %>%
    mutate(ifelse(`2019` == `2010`,
                  t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland,
                  -t$pland))

Warning messages:
1: Problem with mutate() input ..1.
i longer object length is not a multiple of shorter object length
i Input ..1 is ifelse(...).
2: In t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland :
longer object length is not a multiple of shorter object length

# A tibble: 11 x 4
   locality_id `2010` `2019` `ifelse(...)`
   <chr>        <dbl>  <dbl>         <dbl>
 1 L452817          8      8       -0.0323
 2 L452817          9      9        0.0323
 3 L452817         12     12       -0.0968
 4 L452817         13     13        0     
 5 L452817         14     14        0.0968
 6 L910180          0     17       -0.438 
 7 L910180          8     17       -0.344 
 8 L910180          9     17       -0.0312
 9 L910180         10     17       -0.0312
10 L910180         11     17       -0.0938
11 L910180         13     17       -0.0625

Breakdown:

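First, the row-id step, taken from a suggested snippet:

  • This creates an id column relative to the grouped column, restarting at 1 for each unique value of the group_by() column.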
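A minimal sketch of that row-id step on a toy data frame (the toy values and the name with_id are made up for illustration):

```r
library(dplyr)

toy <- tibble(
  year      = c(2010L, 2010L, 2010L, 2019L, 2019L),
  landcover = c(8L, 9L, 12L, 8L, 9L)
)

# row_number() restarts at 1 inside each year group
with_id <- toy %>%
  group_by(year) %>%
  mutate(row = row_number()) %>%
  ungroup()

with_id$row   # 1 2 3 1 2
```

The id only records position within each year, so pivot_wider() afterwards lines rows up by position, not by landcover class.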

Then the next piece of code, also taken from a suggestion:

  • This replaces the NAs in 2010 with 16 and the NAs in 2019 with 17.
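As a possible simplification of the across()/cur_column() construct, tidyr's replace_na() also accepts a named list with one fill value per column. A small sketch on made-up data (this is a swapped-in alternative, not the code used above):

```r
library(tidyr)

# Toy wide table; check.names = FALSE keeps the numeric column names
wide <- data.frame(
  `2010` = c(8L, 0L, NA),
  `2019` = c(8L, NA, 12L),
  check.names = FALSE
)

# One fill value per column: 16 for missing 2010, 17 for missing 2019
filled <- replace_na(wide, list(`2010` = 16L, `2019` = 17L))

filled$`2010`   # 8  0 16
filled$`2019`   # 8 17 12
```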

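Finally, the ifelse() statement, which I tried on a whim thinking it might work, and it did!

  • It selects the rows where the landcover values in 2019 and 2010 are equal and takes the difference of their pland values. The rows where they differ are filled with the remaining pland values, negated.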
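It may help to see why this works: ifelse() returns a vector as long as its test and picks elements of yes and no by position. A shorter yes is recycled, and only the leading elements of a longer no are used. A toy sketch with made-up numbers:

```r
test <- c(TRUE, TRUE, FALSE, FALSE)
yes  <- c(10, 20)    # shorter than test: recycled to 10 20 10 20
no   <- -(1:6)       # longer than test: only the first 4 elements matter

out <- ifelse(test, yes, no)
out                  # 10 20 -3 -4
```

In the pipeline above, -t$pland has 16 elements against 11 rows, so only the first 11 are considered. Those happen to be the 2010 rows in the right order, which means the result depends on the row order of t.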

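However, I have not yet worked out how to handle the case where 16 appears in 2010, so that the 2019 pland value stays positive, given that the code currently always sets it to negative!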
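One way to cover both directions without positional tricks might be to join the two years on locality_id and landcover, so that an NA on either side marks the missing year. The following is only a sketch on made-up toy data; the names toy, wide, p2010, p2019 and result are hypothetical.

```r
library(dplyr)

# Toy data: class 8 in both years, class 0 only in 2010, class 12 only in 2019
toy <- tibble(
  year        = c(2010L, 2010L, 2019L, 2019L),
  locality_id = "L1",
  landcover   = c(8L, 0L, 8L, 12L),
  pland       = c(0.6, 0.4, 0.7, 0.3)
)

# Full join keeps classes that exist in only one of the two years
wide <- full_join(
  toy %>% filter(year == 2010L) %>% select(locality_id, landcover, p2010 = pland),
  toy %>% filter(year == 2019L) %>% select(locality_id, landcover, p2019 = pland),
  by = c("locality_id", "landcover")
)

result <- wide %>%
  mutate(
    X2010 = if_else(is.na(p2010), 16L, landcover),
    X2019 = if_else(is.na(p2019), 17L, landcover),
    pland = case_when(
      is.na(p2019) ~ -p2010,        # gone by 2019: negative
      is.na(p2010) ~ p2019,         # new in 2019: stays positive
      TRUE         ~ p2019 - p2010  # present in both: difference
    )
  ) %>%
  select(locality_id, X2010, X2019, pland)
```

On the toy data this gives one row per class: a difference of about 0.1 for class 8, -0.4 for the class lost by 2019, and 0.3 for the class that only appears in 2019.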

Instead of using dummy values to identify missingness, here is a different approach using complete(), where df is your original data structure.

library(dplyr)
library(tidyr)

df %>%
  # fill in the data with missing year so we can compute while data in long format
  complete(year, nesting(locality_id, landcover), fill = list(pland = 0)) %>%
  arrange(desc(year)) %>%
  group_by(locality_id, landcover) %>%
  # a pland of 0 marks a year that complete() had to add
  summarize(
    X2010 = if_else(pland[year == 2010] == 0, 16L, first(landcover)),
    X2019 = if_else(pland[year == 2019] == 0, 17L, first(landcover)),
    pland = pland[year == 2019] - pland[year == 2010],
    .groups = "drop") %>%
  arrange(locality_id, landcover)

Here is the output:

   locality_id landcover X2010 X2019   pland
   <chr>           <int> <int> <int>   <dbl>
 1 L452817             8     8     8 -0.0323
 2 L452817             9     9     9  0.0323
 3 L452817            12    12    12 -0.0968
 4 L452817            13    13    13  0     
 5 L452817            14    14    14  0.0968
 6 L910180             0     0    17 -0.438 
 7 L910180             8     8    17 -0.344 
 8 L910180             9     9    17 -0.0312
 9 L910180            10    10    17 -0.0312
10 L910180            11    11    17 -0.0938
11 L910180            13    13    17 -0.0625
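To see what the complete() step contributes on its own, here is a small sketch on made-up data (the toy values and the name filled are only for illustration). It adds one row for every year and (locality_id, landcover) pair that is absent, with pland set to 0.

```r
library(tidyr)

# Class 8 exists in both years; class 0 exists only in 2010
toy <- data.frame(
  year        = c(2010L, 2010L, 2019L),
  locality_id = "L1",
  landcover   = c(8L, 0L, 8L),
  pland       = c(0.6, 0.4, 0.7)
)

# nesting() restricts the expansion to locality/landcover pairs seen in the data
filled <- complete(toy, year, nesting(locality_id, landcover),
                   fill = list(pland = 0))

nrow(filled)                                            # 4
filled$pland[filled$year == 2019 & filled$landcover == 0]  # 0
```

One caveat worth checking on real data: because absence is encoded as pland == 0, a class with a genuine pland of exactly 0 would also be labelled 16 or 17.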