mutate to merge minimum value from a different df in R
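I have two datasets: one lists the species in my study and the number of times I observed each, and the other, larger one is a broader database of occurrence records.
I want to mutate a column onto my short dataset, such as "lowest latitude observed" (or the highest, or the median), based on the values in the other dataset, but I can't work out how to match the two inside a mutate.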
library(dplyr)
set.seed(1)
# my dataset. sightings isn't important for this, just important that the solution doesn't mess up existing columns.
fake_spp_df <- data.frame(
  species = c("a","b","c","d",'e'),
  sightings = c(5,1,2,6,3)
)
# broader occurrence dataset
fake_spp_occurrences <- data.frame(
  species = rep(c("a","b","c","d",'f'),each=20), # notice spp "f" - not all species are the same between datasets
  latitude = runif(100, min = 0, max = 80),
  longitude = runif(100, min=-90, max = -55)
)
# so I know that to find one species' min, I could do this:
min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == "a"])
# but I want to do that in a mutate()
# this was my failed attempt:
fake_spp_df %>%
  mutate(lowest_lat = min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == species])
  )
Desired result:
> fake_spp_df
  species sightings lowest_lat max_lat median_lat
1       a         5        1.7 etc...
2       b         1        5.3
3       c         2        2.2
4       d         6        4.3
5       e         3         NA
I think this could also be done with some kind of join or merge, but I'm not sure.
Thanks!
You can summarise the fake_spp_occurrences dataset by species and then join the result to fake_spp_df.
library(dplyr)
fake_spp_occurrences %>%
  group_by(species) %>%
  summarise(lowest_lat = min(latitude),
            max_lat = max(latitude),
            median_lat = median(latitude)) %>%
  right_join(fake_spp_df, by = 'species')
#   species lowest_lat max_lat median_lat sightings
#   <chr>        <dbl>   <dbl>      <dbl>     <dbl>
# 1 a             4.94    79.4       48.1         5
# 2 b             1.07    74.8       35.7         1
# 3 c             1.87    68.9       41.9         2
# 4 d             6.74    76.8       38.2         6
# 5 e            NA       NA         NA           3
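If you want the columns in the order shown in the desired result (species and sightings first), one option is to start from fake_spp_df and use left_join instead. The sketch below also shows a mutate-only variant. On why the original attempt did not work: inside mutate(), species refers to the whole column rather than to one row's value, so the == comparison is not done species by species. The names lat_summary and lat_of are made up for this illustration and are not part of dplyr.

```r
library(dplyr)

set.seed(1)
fake_spp_df <- data.frame(
  species   = c("a", "b", "c", "d", "e"),
  sightings = c(5, 1, 2, 6, 3)
)
fake_spp_occurrences <- data.frame(
  species   = rep(c("a", "b", "c", "d", "f"), each = 20),
  latitude  = runif(100, min = 0, max = 80),
  longitude = runif(100, min = -90, max = -55)
)

# Per-species summary of the occurrence data (lat_summary is an illustrative name)
lat_summary <- fake_spp_occurrences %>%
  group_by(species) %>%
  summarise(
    lowest_lat = min(latitude),
    max_lat    = max(latitude),
    median_lat = median(latitude)
  )

# left_join keeps every row of fake_spp_df, in its original order,
# with the existing columns first; species "e" gets NA, species "f" is dropped
joined <- left_join(fake_spp_df, lat_summary, by = "species")

# mutate-only variant: look up one species at a time.
# lat_of is a hypothetical helper; f is the summary function to apply.
lat_of <- function(sp, f) {
  vapply(sp, function(s) {
    lat <- fake_spp_occurrences$latitude[fake_spp_occurrences$species == s]
    if (length(lat) == 0) NA_real_ else f(lat)  # no records for this species
  }, numeric(1), USE.NAMES = FALSE)
}

mutated <- fake_spp_df %>%
  mutate(lowest_lat = lat_of(species, min))
```

The join version scans the occurrence data once, while the mutate version filters it again for every species, so the join is usually the better choice for a large occurrence table.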