Separating list-col values into singular values relative to a condition
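Simplified explanation
Convert from long to wide while filling missing values with 17 for 2019 and 16 for 2010. Where a landcover value in 2010 matches one in 2019, subtract their pland values (i.e. 2019 - 2010). If there is no value in 2019 and it is filled with 17, give that pland value a negative sign. Conversely, if 16 fills a missing value in 2010, keep the original pland value, positive.
The result should look like Table 2.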
Table 1: Example of the long-format data frame
# A tibble: 10 x 4
year locality_id landcover pland
<chr> <chr> <int> <dbl>
1 2010 L452817 8 0.0968
2 2010 L452817 9 0.0323
3 2010 L452817 12 0.613
4 2010 L452817 13 0.194
5 2010 L452817 14 0.0645
6 2019 L452817 8 0.0645
7 2019 L452817 9 0.0645
8 2019 L452817 12 0.516
9 2019 L452817 13 0.194
10 2019 L452817 14 0.161
Table 2: Expected format
locality_id X2010 X2019 pland
1 L452817 8 8 -0.03225806
2 L452817 9 9 0.03225807
3 L452817 12 12 -0.09677420
4 L452817 13 13 0.00000000
5 L452817 14 14 0.09677419
6 L910180 0 17 -0.43750000
7 L910180 8 17 -0.34375000
8 L910180 9 17 -0.03125000
9 L910180 10 17 -0.03125000
10 L910180 11 17 -0.09375000
11 L910180 13 17 -0.06250000
What I have tried:
library(dplyr)
library(tidyr)
# copy the values of t into another variable
y <- t
# remove pland from the new variable
y <- y[, -4]
# pivot from long to wide, adding the pland differences from t as another column
y %>%
  group_by(year) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
  select(-row) %>%
  mutate(across(`2010`:`2019`, ~ if (cur_column() == '2019') replace_na(.x, 17) else replace_na(.x, 16))) %>%
  mutate(t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland)
# A tibble: 11 x 4
locality_id `2010` `2019` `t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland`
<chr> <dbl> <dbl> <dbl>
1 L452817 8 8 -0.0323
2 L452817 9 9 0.0323
3 L452817 12 12 -0.0968
4 L452817 13 13 0
5 L452817 14 14 0.0968
6 L910180 0 17 -0.373
7 L910180 8 17 -0.279
8 L910180 9 17 0.485
9 L910180 10 17 0.162
10 L910180 11 17 0.0675
11 L910180 13 17 0.00202
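The problem with my code above is that it always computes the difference. It should not compute a difference for the values that were introduced by filling in missing values, that is, when 16 or 17 is present on either side.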
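The wrong values in rows 6 to 11 come from vector recycling: the two pland vectors are subtracted by position, not by landcover. Here is a minimal sketch of that effect, using made-up toy vectors (not the question's data) and base R's `match()` as one possible way to align by key instead:

```r
# Toy stand-ins for the 2019 and 2010 pland columns (made-up values).
pland_2019 <- c(0.1, 0.2)
pland_2010 <- c(0.5, 0.4, 0.3)
lc_2019 <- c(8L, 9L)        # landcover codes present in 2019
lc_2010 <- c(8L, 9L, 13L)   # landcover codes present in 2010

# Positional subtraction: the shorter vector is recycled (0.1, 0.2, 0.1),
# and R emits a warning because 3 is not a multiple of 2.
msg <- tryCatch(pland_2019 - pland_2010,
                warning = function(w) conditionMessage(w))
diff_recycled <- suppressWarnings(pland_2019 - pland_2010)

# Key-based subtraction: look up each 2010 landcover in the 2019 codes.
# Codes with no 2019 counterpart give NA instead of a recycled value.
idx <- match(lc_2010, lc_2019)
diff_matched <- pland_2019[idx] - pland_2010

print(diff_recycled)
print(diff_matched)
```

The third element differs: recycling silently reuses the first 2019 value, while the key-based version leaves an NA that can then be handled explicitly.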
Reproducible code:
structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L,
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817",
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180",
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L,
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129,
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375,
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581,
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
I managed to figure it out, although better suggestions are welcome, especially ones that run without warnings!
# copy the values of t into another variable
y <- t
# remove pland from the new variable
y <- y[, -4]
# pivot from long to wide, adding the pland differences from t as another column
y %>%
  group_by(year) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
  select(-row) %>%
  mutate(across(`2010`:`2019`, ~ if (cur_column() == '2019') replace_na(.x, 17) else replace_na(.x, 16))) %>%
  mutate(ifelse(`2019` == `2010`, t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland, -t$pland))
Warning messages:
1: Problem with `mutate()` input `..1`.
i longer object length is not a multiple of shorter object length
i Input `..1` is `ifelse(...)`.
2: In t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland :
  longer object length is not a multiple of shorter object length
# A tibble: 11 x 4
locality_id `2010` `2019` `ifelse(...)`
<chr> <dbl> <dbl> <dbl>
1 L452817 8 8 -0.0323
2 L452817 9 9 0.0323
3 L452817 12 12 -0.0968
4 L452817 13 13 0
5 L452817 14 14 0.0968
6 L910180 0 17 -0.438
7 L910180 8 17 -0.344
8 L910180 9 17 -0.0312
9 L910180 10 17 -0.0312
10 L910180 11 17 -0.0938
11 L910180 13 17 -0.0625
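Breakdown:
First, the group_by(year) and mutate(row = row_number()) step, borrowed from a suggestion elsewhere
- this creates an id column relative to the grouping column, restarting for each unique value in group_by()
Then the across() and replace_na() step
- this replaces the NAs in 2010 with 16 and the NAs in 2019 with 17
Finally, the ifelse() statement, which I was unsure about but expected to work, and it did!
- it picks the landcover values that are equal in 2019 and 2010 and takes the difference of their pland values. The rows where the values differ are filled with the remaining pland values, negated.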
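As a side note on the NA-filling step, `tidyr::replace_na()` also accepts a whole data frame plus a named list of per-column replacements, which avoids the `across()` / `cur_column()` branching. A minimal sketch on a made-up wide table (the `wide` object and its values are assumptions for illustration):

```r
library(tidyr)
library(tibble)

# Made-up wide table with one gap per year column.
wide <- tibble(
  locality_id = c("A", "A", "B"),
  `2010` = c(8L, NA, 9L),
  `2019` = c(8L, 12L, NA)
)

# One replacement value per column: 16 for 2010, 17 for 2019.
filled <- replace_na(wide, list(`2010` = 16L, `2019` = 17L))
print(filled)
```

The replacement values are written as integers here so that they match the integer type of the toy columns.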
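However, I have not yet figured out how to handle the rows where 16 appears in 2010 so that the 2019 pland value stays positive, given that the fallback is always set to negative!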
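One possible way to cover both directions is to pivot `pland` (rather than `landcover`) to wide, so that each row is keyed by `locality_id` and `landcover` and a missing year shows up as an NA. The sketch below is only an illustration under that assumption, on made-up toy data where landcover 9 exists only in 2010 and landcover 12 only in 2019; the names `toy`, `pland_2010`, and `pland_2019` are introduced here and are not from the question.

```r
library(dplyr)
library(tidyr)

# Made-up toy data: landcover 9 is 2010-only, landcover 12 is 2019-only.
toy <- tibble(
  year        = c(2010L, 2010L, 2019L, 2019L),
  locality_id = c("A", "A", "A", "A"),
  landcover   = c(8L, 9L, 8L, 12L),
  pland       = c(0.6, 0.4, 0.5, 0.5)
)

result <- toy %>%
  # One row per (locality_id, landcover); one pland column per year.
  pivot_wider(names_from = year, values_from = pland,
              names_prefix = "pland_") %>%
  mutate(
    # An NA pland means that landcover was absent in that year.
    X2010 = if_else(is.na(pland_2010), 16L, landcover),
    X2019 = if_else(is.na(pland_2019), 17L, landcover),
    # Treat an absent year as 0, so a 2019-only row stays positive
    # and a 2010-only row becomes negative.
    pland = coalesce(pland_2019, 0) - coalesce(pland_2010, 0)
  ) %>%
  select(locality_id, X2010, X2019, pland)

print(result)
```

Because rows are aligned by key, no vector recycling is involved and no length warning should appear.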
Instead of using dummy variables to identify missingness, here is a different approach using complete(), where df is your original data structure.
library(dplyr)
library(tidyr)
df %>%
  # fill in the rows for the missing year so we can compute while the data is in long format
  complete(year, nesting(locality_id, landcover), fill = list(pland = 0)) %>%
  arrange(desc(year)) %>%
  group_by(locality_id, landcover) %>%
  summarize(
    X2010 = if_else(pland[year == 2010] == 0, 16L, first(landcover)),
    X2019 = if_else(pland[year == 2019] == 0, 17L, first(landcover)),
    pland = pland[year == 2019] - pland[year == 2010]) %>%
  arrange(locality_id, landcover)
This is the output:
locality_id landcover X2010 X2019 pland
<chr> <int> <int> <int> <dbl>
1 L452817 8 8 8 -0.0323
2 L452817 9 9 9 0.0323
3 L452817 12 12 12 -0.0968
4 L452817 13 13 13 0
5 L452817 14 14 14 0.0968
6 L910180 0 0 17 -0.438
7 L910180 8 8 17 -0.344
8 L910180 9 9 17 -0.0312
9 L910180 10 10 17 -0.0312
10 L910180 11 11 17 -0.0938
11 L910180 13 13 17 -0.0625
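To see what the `complete()` call contributes on its own, here is a small sketch on made-up toy data (the `toy` object is an assumption for illustration). `nesting()` restricts the expansion to the `locality_id` / `landcover` pairs that actually occur, and `fill` supplies the value for the rows that get added:

```r
library(tidyr)
library(tibble)

# Made-up toy data: landcover 9 has no 2019 row.
toy <- tibble(
  year        = c(2010L, 2010L, 2019L),
  locality_id = c("A", "A", "A"),
  landcover   = c(8L, 9L, 8L),
  pland       = c(0.6, 0.4, 1.0)
)

# Every observed (locality_id, landcover) pair gets a row for every year;
# the newly created row receives pland = 0.
completed <- complete(toy, year, nesting(locality_id, landcover),
                      fill = list(pland = 0))
print(completed)

# The row that complete() added for 2019 / landcover 9.
added <- completed[completed$year == 2019L & completed$landcover == 9L, ]
```

One thing to keep in mind with this design: the answer's `pland == 0` test cannot tell a filled-in row apart from a landcover whose real pland is exactly 0, so it relies on genuine zeros not occurring in the data.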