Row mean in a sequence across multiple columns
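I have monthly unemployment-rate data by state for 2017 through 2019 and for 2022. I'd like to get each state's average unemployment rate for each year.
Is there a way to use a sequence or a for loop to take the mean of every 12 columns, starting from column 2? For 2022, it would then be the mean of only three columns (January to March).
I currently have the following, but it is very inefficient, especially once I start working with larger datasets.
Edit
# Edit: the output below shows unemployment rates for 2017 only (2018-2022 not shown)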
> df[2:13]
Jan.2017 Feb.2017 Mar.2017 Apr.2017 May.2017 Jun.2017 Jul.2017 Aug.2017 Sep.2017 Oct.2017 Nov.2017 Dec.2017
1 5.5 5.2 5.0 4.8 4.6 4.4 4.3 4.2 4.1 4.0 4.0 4.0
2 6.6 6.6 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.4
3 5.2 5.2 5.1 5.0 5.0 4.9 4.9 4.8 4.9 4.9 4.9 4.9
4 3.8 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.8 3.8 3.8 3.8
# I am using the following to get the averages for Jan 2017-Dec 2017, Jan 2018-Dec 2018, etc.
df$x2017 <- rowMeans(df[ , c(2:13)], na.rm=TRUE)
df$x2018 <- rowMeans(df[ , c(14:25)], na.rm=TRUE)
df$x2019 <- rowMeans(df[ , c(26:37)], na.rm=TRUE)
df$x2021 <- rowMeans(df[ , c(38:49)], na.rm=TRUE)
df$x2022 <- rowMeans(df[ , c(50:52)], na.rm=TRUE)
# output
State x2017 x2018 x2019 x2021 x2022
1 Alabama 8.0 7.2 6.6 6.1 5.9
2 Alaska 7.2 7.0 6.6 6.3 6.5
3 Arizona 8.3 7.7 6.7 6.0 5.6
4 Arkansas 7.2 6.9 5.7 4.7 4.0
I'm just looking for a way to reduce the chance of error when telling it which columns to average.
Sample df, if needed:
state = c("Alabama", "Alaska", "Arizona")
Jan2017 = c(1:3)
Feb2017 = c(4:6)
Jan2018 = c(7:9)
Feb2018 = c(10:12)
Jan2019 = c(13:15)
Feb2019 = c(16:18)
df3=data.frame(state,Jan2017,Feb2017,Jan2018,Feb2018,Jan2019,Feb2019)
> df3
state Jan2017 Feb2017 Jan2018 Feb2018 Jan2019 Feb2019
1 Alabama 1 4 7 10 13 16
2 Alaska 2 5 8 11 14 17
3 Arizona 3 6 9 12 15 18
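The sequence/for-loop idea from the question can be sketched in base R as below. This is a minimal sketch, assuming the monthly columns are contiguous, start at column 2, and come in blocks of `block` columns per year (12 in the real data, 2 in `df3`), with only the last block allowed to be shorter. The helper name `block_means` is hypothetical. Year labels are read from the first column name in each block instead of being computed, so a gap in the years does not shift the labels.

```r
# Hypothetical helper: average every `block` consecutive columns, starting at
# column `first_col`, for each row. The last block may be shorter.
block_means <- function(df, first_col = 2, block = 12) {
  starts <- seq(first_col, ncol(df), by = block)  # first column of each year
  out <- df[seq_len(first_col - 1)]               # keep the id column(s), e.g. state
  for (s in starts) {
    cols <- s:min(s + block - 1, ncol(df))        # truncate the final partial block
    year <- sub("\\D+", "", names(df)[s])         # label from the name, e.g. "2017"
    out[[paste0("x", year)]] <- rowMeans(df[cols], na.rm = TRUE)
  }
  out
}

state <- c("Alabama", "Alaska", "Arizona")
df3 <- data.frame(state,
                  Jan2017 = 1:3, Feb2017 = 4:6,
                  Jan2018 = 7:9, Feb2018 = 10:12,
                  Jan2019 = 13:15, Feb2019 = 16:18)

block_means(df3, first_col = 2, block = 2)
```

With the real data this would presumably be `block_means(df, first_col = 2, block = 12)`. If `df` has 52 columns, `seq(2, 52, by = 12)` gives block starts 2, 14, 26, 38 and 50, so the last block shrinks to columns 50:52 (Jan to Mar 2022), matching the question's last `rowMeans()` call. It still relies on column positions, so the name-based answers below are less fragile if a month is ever missing.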
Here is a tidyverse solution that pivots and then summarises:
library(dplyr)
library(tidyr)
library(readr)  # parse_number() comes from readr, not dplyr or tidyr
df3 %>%
  pivot_longer(-state) %>%
  mutate(helper = parse_number(name)) %>%
  group_by(state, helper) %>%
  mutate(mean = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = helper,
              values_from = mean) %>%
  group_by(state) %>%
  summarise(across(-c(name, value), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')
state `2017` `2018` `2019`
<chr> <dbl> <dbl> <dbl>
1 Alabama 2.5 8.5 14.5
2 Alaska 3.5 9.5 15.5
3 Arizona 4.5 10.5 16.5
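The same result can probably be reached in fewer steps by pivoting long, averaging once per state and year, and pivoting back. This is a sketch rather than the answer's own code; it assumes dplyr >= 1.0 (for `.groups`) and tidyr >= 1.0. It extracts the year with base `sub()` instead of `parse_number()`, so readr is not needed, and `names_prefix = "x"` is only there to reproduce the question's `x2017`-style names.

```r
library(dplyr)
library(tidyr)

df3 <- data.frame(state = c("Alabama", "Alaska", "Arizona"),
                  Jan2017 = 1:3, Feb2017 = 4:6,
                  Jan2018 = 7:9, Feb2018 = 10:12,
                  Jan2019 = 13:15, Feb2019 = 16:18)

yearly <- df3 %>%
  pivot_longer(-state, names_to = "month", values_to = "rate") %>%
  mutate(year = sub("\\D+", "", month)) %>%   # "Jan2017" or "Jan.2017" -> "2017"
  group_by(state, year) %>%
  summarise(rate = mean(rate, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = rate, names_prefix = "x")

yearly
```

Because every month of a year lands in the same group, a partial year such as Jan to Mar 2022 is simply averaged over the months that exist, with no column ranges to maintain.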
Or, in base R:
cbind(df3[1], sapply(split.default(df3[-1], sub("\\D+", "", names(df3)[-1])), rowMeans))
state 2017 2018 2019
1 Alabama 2.5 8.5 14.5
2 Alaska 3.5 9.5 15.5
3 Arizona 4.5 10.5 16.5
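This one-liner calls `rowMeans` without `na.rm = TRUE`, whereas the question's code uses it, so a state with a missing month would get `NA` for that year. Below is a sketch that keeps the same `split.default()` idea but passes `na.rm = TRUE` through `sapply()` and reuses the question's `x` prefix. The toy data frame `df4` is made up purely to show one missing value and a year with fewer columns.

```r
# Illustrative data only: one missing value and a "2022" with fewer months,
# mirroring the shape (not the values) of the question's data.
df4 <- data.frame(state = c("Alabama", "Alaska"),
                  Jan2021 = c(1, NA), Feb2021 = c(3, 4), Mar2021 = c(5, 6),
                  Jan2022 = c(7, 8))

years  <- sub("\\D+", "", names(df4)[-1])         # "2021" "2021" "2021" "2022"
groups <- split.default(df4[-1], years)           # one data frame of columns per year
means  <- sapply(groups, rowMeans, na.rm = TRUE)  # extra args are passed on to rowMeans
colnames(means) <- paste0("x", colnames(means))   # match the question's x2021 naming

cbind(df4[1], means)
```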
Here is another, slightly different tidyverse option:
library(tidyverse)
df3 %>%
  pivot_longer(
    cols = -state,
    names_to = c(NA, ".value"),
    names_pattern = "(.*)(\\d{4})"
  ) %>%
  group_by(state) %>%
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE)))
Output
state `2017` `2018` `2019`
<chr> <dbl> <dbl> <dbl>
1 Alabama 2.5 8.5 14.5
2 Alaska 3.5 9.5 15.5
3 Arizona 4.5 10.5 16.5
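To see why the final `summarize()` works, it can help to stop right after `pivot_longer()`. With `names_to = c(NA, ".value")`, the month part of each column name is discarded and the year part becomes an output column, so each row holds one state-month with one column per year. A minimal sketch on `df3` (rebuilt here so the block runs on its own); note that the backslash in the regex must be doubled inside an R string.

```r
library(dplyr)
library(tidyr)

df3 <- data.frame(state = c("Alabama", "Alaska", "Arizona"),
                  Jan2017 = 1:3, Feb2017 = 4:6,
                  Jan2018 = 7:9, Feb2018 = 10:12,
                  Jan2019 = 13:15, Feb2019 = 16:18)

long <- df3 %>%
  pivot_longer(
    cols = -state,
    names_to = c(NA, ".value"),     # drop the month part, spread the year part
    names_pattern = "(.*)(\\d{4})"  # "Jan2017" -> "Jan" + "2017"
  )

long  # one row per state and month, with columns state, 2017, 2018, 2019
```

Grouping `long` by `state` and taking the mean of every year column then yields one row per state, as in the output above.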