Create loop to calculate percentage under conditions and fill in results
I tried to find some information but didn't really find what I was looking for.
This is my data frame full.data (excerpt):
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....
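I am trying to fill full.data$PercentageDifference with the emissions of sector=ETS expressed as a percentage of the emissions of sector=Regulated (that is, emissions for sector=ETS are xx.y% of emissions for sector=Regulated). This percentage should be entered in PercentageDifference in the rows where sector=ETS, and this should happen for every year and country. I assume I need a loop. I have read that dplyr is useful for this, but I haven't really figured out how to do it. If there is a better way than dplyr, that is fine with me too.
The result would look like this: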
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%
I haven't posted what I have done so far because I haven't done much....
Thanks for your help.
北海
Here is a solution based on tidyverse/dplyr.
The file stack.txt contains the text pasted from the example above.
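library(tidyverse)

# Assumes stack.txt holds only the data rows (no header line, no "...." lines)
full_data <- read.table("stack.txt", quote = "\"", comment.char = "")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

# Drop the placeholder column of zeros
full_data <- full_data %>%
  select(-perc)

full_data %>%
  select(-iso) %>%
  spread(sector, emission) %>%                            # one column per sector
  mutate(percentage = ETS / Regulated) %>%                # ETS as a fraction of Regulated
  select(country, year, percentage) %>%
  right_join(full_data, by = c("country", "year")) %>%    # back to the long format
  select(country, year, sector, emission, iso, percentage) %>%
  mutate(percentage = ifelse(sector == "ETS", percentage, 0))  # keep the value on ETS rows only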
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more of an explanation of how it works, I suggest breaking up the pipeline and looking at the intermediate results, i.e.
full_data %>%
  select(-iso) %>%
  spread(sector, emission) %>%
  mutate(percentage = ETS / Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
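A side note on the reshaping step: in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same intermediate step, assuming a recent tidyr. sample_data is a hypothetical four-row excerpt built here only for illustration; it is not the full data set.

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt in the same long format as full_data (Austria 2011 only)
sample_data <- tibble(
  country  = c("Austria", "Austria", "Austria", "Austria"),
  year     = c(2011L, 2011L, 2011L, 2011L),
  sector   = c("Total", "Regulated", "Unregulated", "ETS"),
  emission = c(7.011567e+07, 4.214836e+07, 2.796732e+07, 3.059942e+07)
)

# One row per country/year, one column per sector, then the ETS/Regulated ratio
wide <- sample_data %>%
  pivot_wider(names_from = sector, values_from = emission) %>%
  mutate(percentage = ETS / Regulated)

print(wide)
```

The percentage column here should agree with the first row of the intermediate output above (about 0.726 for Austria 2011).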
Regards
Pavel
How about:
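library(tidyverse)
library(zoo)

# df stands for the data frame from the question (full.data)
df %>%
  group_by(country, year) %>%
  mutate(
    # Keep emissions only on the ETS and Regulated rows
    PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
    # Carry the last non-missing value forward over the NA rows
    PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
    # ETS divided by the previous row's (Regulated) value, in percent
    PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100, 1), NA_real_),
    # Append the % sign; this turns the column into character
    PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
  )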
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
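It is important to know that the percentage column will then be of type character, since, as you specified, you want to see the % sign.
If you want to keep it numeric, you can simply drop the last step in mutate, i.e. you can do this: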
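library(tidyverse)
library(zoo)

df %>%
  group_by(country, year) %>%
  mutate(
    # Keep emissions only on the ETS and Regulated rows
    PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
    # Carry the last non-missing value forward over the NA rows
    PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
    # ETS divided by the previous row's (Regulated) value, in percent
    PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100, 1), NA_real_)
  )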
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
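One caveat: the na.locf()/lag() approach relies on the Regulated row appearing before the ETS row within each country and year. If the row order is not guaranteed, something along these lines may be more robust, since it looks the Regulated value up by name with match(). This is only a sketch: the small df below is hypothetical sample data with the rows deliberately shuffled, and regulated is a helper column introduced just for this example.

```r
library(dplyr)

# Hypothetical sample: ETS listed first in 2011, and 2017 has no Regulated row
df <- tibble(
  country   = "Austria",
  year      = c(2011L, 2011L, 2011L, 2011L, 2017L),
  sector    = c("ETS", "Total", "Regulated", "Unregulated", "ETS"),
  emissions = c(3.059942e+07, 7.011567e+07, 4.214836e+07, 2.796732e+07, 3.055523e+07)
)

result <- df %>%
  group_by(country, year) %>%
  mutate(
    # Regulated emissions of this country/year; NA when the group has no such row
    regulated = emissions[match("Regulated", sector)],
    # Percentage on ETS rows only, NA everywhere else
    PercentageDifference = if_else(sector == "ETS",
                                   round(emissions / regulated * 100, 1),
                                   NA_real_)
  ) %>%
  ungroup() %>%
  select(-regulated)

print(result)
```

Groups without a Regulated row (such as Austria 2017 in the excerpt) end up with NA rather than an error.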
If you want to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much slower.
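For reference, here is a minimal sketch of what the fill() variant might look like, assuming tidyr's fill() is applied to the grouped data frame (where it fills within each group). The df here is a hypothetical excerpt, and tmp is a helper column used only in this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt in the question's row order, plus a year with only ETS
df <- tibble(
  country   = "Austria",
  year      = c(2011L, 2011L, 2011L, 2011L, 2017L),
  sector    = c("Total", "Regulated", "Unregulated", "ETS", "ETS"),
  emissions = c(7.011567e+07, 4.214836e+07, 2.796732e+07, 3.059942e+07, 3.055523e+07)
)

result <- df %>%
  group_by(country, year) %>%
  # Keep emissions only on the ETS and Regulated rows
  mutate(tmp = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_)) %>%
  # Separate step: carry the last non-missing value down within each group
  fill(tmp, .direction = "down") %>%
  # ETS divided by the previous row's (Regulated) value, in percent
  mutate(PercentageDifference = if_else(sector == "ETS",
                                        round(tmp / dplyr::lag(tmp) * 100, 1),
                                        NA_real_)) %>%
  ungroup() %>%
  select(-tmp)

print(result)
```

Like the na.locf() version, this still depends on Regulated preceding ETS within each group.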