Calculate the median date based on two grouping conditions
I have the following data frame:
> head(df)
# A tibble: 6 x 6
# Groups: lat, decade [2]
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 55 18 1952-02-03 1952 1950-1959 02-03
2 55 18 1958-02-08 1958 1950-1959 02-08
3 55 18 1958-02-08 1958 1950-1959 02-08
4 55 18 1958-02-08 1958 1950-1959 02-08
5 55 18 1965-02-07 1965 1960-1969 02-07
6 55 18 1966-03-03 1966 1960-1969 03-03
> summary(df)
lat long date year decade
Min. :55.00 Min. :18 Min. :1951-03-22 Length:1414 Length:1414
1st Qu.:56.00 1st Qu.:18 1st Qu.:1987-01-01 Class :character Class :character
Median :58.00 Median :18 Median :2004-04-02 Mode :character Mode :character
Mean :59.07 Mean :18 Mean :1999-02-16
3rd Qu.:62.00 3rd Qu.:18 3rd Qu.:2014-01-01
Max. :68.00 Max. :18 Max. :2021-03-28
month_day
Length:1414
Class :character
Mode :character
I want to get the median month_day grouped by latitude (lat) and by decade.
I tried the following, but I can't get past this error:
df = df %>%
group_by(lat, decade) %>%
summarise(across(month_day, median)) %>%
ungroup
Error in `summarise()`:
! Problem while computing `..1 = across(month_day, median)`.
Caused by error:
! `month_day` must return compatible vectors across groups.
i Result type for group 1 (lat = 55, decade = "1950-1959"): <double>.
i Result type for group 2 (lat = 55, decade = "1960-1969"): <character>.
I don't know how to fix this. Any help is much appreciated.
Edit:
> ds_filtered_median[ds_filtered_median$lat == '57', ]
# A tibble: 124 x 6
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 57 18 1955-04-08 1955 1950-1959 04-08
2 57 18 1957-02-19 1957 1950-1959 02-19
3 57 18 1958-04-06 1958 1950-1959 04-06
4 57 18 1959-01-01 1959 1950-1959 01-01
5 57 18 1960-01-03 1960 1960-1969 01-03
6 57 18 1961-01-02 1961 1960-1969 01-02
7 57 18 1962-01-02 1962 1960-1969 01-02
8 57 18 1963-01-01 1963 1960-1969 01-01
9 57 18 1964-01-19 1964 1960-1969 01-19
10 57 18 1965-01-12 1965 1960-1969 01-12
# ... with 114 more rows
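You have to convert month_day to a numeric value to get a median. across is only needed when you compute something separately for several columns, e.g. data %>% summarise(across(any_of(c("lat", "long")), median)) to get the medians of long and lat.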
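The <double> vs <character> mismatch in the error most likely comes from how median() treats text. A small base R illustration, using month_day values from the question's sample: with an odd number of values, median() returns the middle string itself, but with an even number it averages the two middle values via mean(), which for character input warns and returns NA (a double). Groups of different sizes therefore return different types, which summarise() refuses to combine.

```r
# month_day strings taken from the question's sample
odd_group  <- c("02-03", "02-08", "02-08")           # 3 values
even_group <- c("02-03", "02-08", "02-08", "02-08")  # 4 values

# Odd length: median() picks the middle element of the sorted strings,
# so the result stays a character string.
m_odd <- median(odd_group)

# Even length: median() calls mean() on the two middle strings; mean()
# cannot average text, so it warns and returns NA_real_ (a double).
m_even <- suppressWarnings(median(even_group))

typeof(m_odd)   # "character"
typeof(m_even)  # "double"
```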
library(tidyverse)

data <- tribble(
  ~lat, ~long, ~date, ~year, ~decade, ~month_day,
  55, 18, "1952-02-03", 1952, "1950-1959", "02-03",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1965-02-07", 1965, "1960-1969", "02-07",
  55, 18, "1966-03-03", 1966, "1960-1969", "03-03"
)

data %>%
  mutate(
    # keep the digits after the dash and make them numeric ("02-08" -> 8)
    month_day_num = month_day %>% str_extract("[0-9]+$") %>% as.numeric()
  ) %>%
  group_by(lat, decade) %>%
  summarise(
    median_month_day = median(month_day_num)
  )
#> `summarise()` has grouped output by 'lat'. You can override using the `.groups`
#> argument.
#> # A tibble: 2 × 3
#> # Groups: lat [1]
#> lat decade median_month_day
#> <dbl> <chr> <dbl>
#> 1 55 1950-1959 8
#> 2 55 1960-1969 5
Created on 2022-04-05 by the reprex package (v2.0.0)
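The snippet above takes the median of the day of the month only: the regex [0-9]+$ keeps just the digits after the dash, so "02-08" and "03-08" would count as the same value. A minimal base R sketch that keeps both month and day, assuming month_day is always "MM-DD" text: paste every value onto the same year (2000, chosen because it is a leap year, so "02-29" still parses), take the median of the underlying day numbers per lat/decade, and format the result back. The md_num column and the reference year are illustrative choices, not part of either answer.

```r
# Sample rows from the question; lat and decade are the grouping keys.
df <- data.frame(
  lat       = rep(55, 6),
  decade    = c(rep("1950-1959", 4), rep("1960-1969", 2)),
  month_day = c("02-03", "02-08", "02-08", "02-08", "02-07", "03-03")
)

# Put every "MM-DD" on the same (leap) year and store it as a day number,
# so both the month and the day enter the median.
df$md_num <- as.numeric(as.Date(paste0("2000-", df$month_day)))

# Median day number within each lat/decade group.
res <- aggregate(md_num ~ lat + decade, data = df, FUN = median)

# Floor half-day medians and turn the day number back into "MM-DD".
res$median_month_day <- format(as.Date("1970-01-01") + floor(res$md_num), "%m-%d")
res[, c("lat", "decade", "median_month_day")]
```

Because each value is mapped by its calendar label rather than by days elapsed, "03-03" lands on the same day whether it came from a leap year or not. Dates that straddle New Year (December vs January) are not handled specially.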
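You can convert the dates to the number of days since the start of the year. From that number you can easily compute your median. Then convert the result back to a date, using 1 January of any year as the reference. Watch out for leap years, though... For the date manipulation I used lubridate.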
library(lubridate)

data %>%
  mutate(
    date = ymd(date),
    # days elapsed since 1 January of the same year (1 January -> 0)
    days_since_january = as.numeric(date - ymd(paste(year(date), 1, 1, sep = "-")))
  ) %>%
  group_by(lat, decade) %>%
  summarise(across(days_since_january, median), .groups = "keep") %>%
  # add the floored median to 1 January of a reference year (1960) and keep "MM-DD"
  mutate(median_month_date = format(ymd("1960-01-01") + days(floor(days_since_january)), "%m-%d"))
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<dbl> <chr> <dbl> <chr>
1 55 1950-1959 38 02-08
2 55 1960-1969 49 02-19
Applied to the lat = 57 rows shown in the question's edit:
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<int> <chr> <dbl> <chr>
1 57 1950-1959 72 03-13
2 57 1960-1969 1.5 01-02
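The leap-year caveat can be made concrete with a short base R sketch. as.POSIXlt(d)$yday gives the same zero-based count as the date minus 1 January subtraction in the answer's code; yday0 is a hypothetical helper name used only here. A date after 28 February in a non-leap year lands one day early when mapped back onto 1960, because 1960 has a 29 February.

```r
# Hypothetical helper: zero-based day of year (1 January -> 0).
yday0 <- function(d) as.POSIXlt(d)$yday

d <- as.Date("1966-03-03")   # 1966 is not a leap year
n <- yday0(d)                # 31 (January) + 28 (February) + 2 = 61

# Mapping back onto 1960 (leap year) shifts the label by one day.
back_1960 <- format(as.Date("1960-01-01") + n, "%m-%d")  # "03-02"

# A non-leap reference year keeps the label for this particular date.
back_1961 <- format(as.Date("1961-01-01") + n, "%m-%d")  # "03-03"
```

Neither reference year is right for every row: with a non-leap reference, dates after 28 February in leap years shift one day late instead. The error is at most one day, which may or may not matter for the analysis.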