创建循环以计算条件下的百分比并填写结果

Create loop to calculate percentage under conditions and fill in results

我试图找到一些信息,但没有真正找到我要找的东西。

这是我的数据框full.data(摘录)

country year      sector    emissions iso2 PercentageDifference
....
Austria 2011       Total 7.011567e+07   AT                    0
Austria 2011   Regulated 4.214836e+07   AT                    0
Austria 2011 Unregulated 2.796732e+07   AT                    0
Austria 2011         ETS 3.059942e+07   AT                    0
Austria 2012       Total 6.766140e+07   AT                    0
Austria 2012   Regulated 3.949445e+07   AT                    0
Austria 2012 Unregulated 2.816695e+07   AT                    0
Austria 2012         ETS 2.838706e+07   AT                    0
Austria 2013       Total 6.800123e+07   AT                    0
Austria 2013   Regulated 3.857396e+07   AT                    0
Austria 2013 Unregulated 2.942727e+07   AT                    0
Austria 2013         ETS 2.980441e+07   AT                    0
Austria 2014       Total 6.425333e+07   AT                    0
Austria 2014   Regulated 3.631107e+07   AT                    0
Austria 2014 Unregulated 2.794225e+07   AT                    0
Austria 2014         ETS 2.805597e+07   AT                    0
Austria 2015       Total 6.670398e+07   AT                    0
Austria 2015   Regulated 3.800309e+07   AT                    0
Austria 2015 Unregulated 2.870090e+07   AT                    0
Austria 2015         ETS 2.949206e+07   AT                    0
Austria 2016       Total 6.740209e+07   AT                    0
Austria 2016   Regulated 3.765177e+07   AT                    0
Austria 2016 Unregulated 2.975031e+07   AT                    0
Austria 2016         ETS 2.900012e+07   AT                    0
Austria 2017         ETS 3.055523e+07   AT                    0
Belgium 1990       Total 1.204844e+08   BE                    0
Belgium 1990   Regulated 7.861411e+07   BE                    0
Belgium 1990 Unregulated 4.187029e+07   BE                    0
Belgium 1991       Total 1.235447e+08   BE                    0
Belgium 1991   Regulated 7.981152e+07   BE                    0
Belgium 1991 Unregulated 4.373319e+07   BE                    0
Belgium 1992       Total 1.226578e+08   BE                    0
Belgium 1992   Regulated 7.828396e+07   BE                    0
Belgium 1992 Unregulated 4.437385e+07   BE                    0
Belgium 1993       Total 1.215573e+08   BE                    0
Belgium 1993   Regulated 7.675229e+07   BE                    0
Belgium 1993 Unregulated 4.480499e+07   BE                    0
Belgium 1994       Total 1.249382e+08   BE                    0
Belgium 1994   Regulated 8.064799e+07   BE                    0
Belgium 1994 Unregulated 4.429020e+07   BE                    0
....

我正在尝试填写 full.data$PercentageDifference emissions 的百分比,其中 sector=ETSsector=Regulated(排放部门=ETS 是 xx.y%行业=监管)。此百分比值应填入 PercentageDifference,其中 sector=ETS。这应该发生在每一年和国家。我假设我需要一个循环。我读过 dplyr 对此很有用,但并没有真正弄清楚该怎么做。但是,如果有比 dplyr 更好的方法,那对我来说很好。

结果会像这样

country year      sector    emissions iso2 PercentageDifference
....
Austria 2011       Total 7.011567e+07   AT                    0
Austria 2011   Regulated 4.214836e+07   AT                    0
Austria 2011 Unregulated 2.796732e+07   AT                    0
Austria 2011         ETS 3.059942e+07   AT                72.6%
Austria 2012       Total 6.766140e+07   AT                    0
Austria 2012   Regulated 3.949445e+07   AT                    0
Austria 2012 Unregulated 2.816695e+07   AT                    0
Austria 2012         ETS 2.838706e+07   AT                71.9%
Austria 2013       Total 6.800123e+07   AT                    0
Austria 2013   Regulated 3.857396e+07   AT                    0
Austria 2013 Unregulated 2.942727e+07   AT                    0
Austria 2013         ETS 2.980441e+07   AT                77.3%
Austria 2014       Total 6.425333e+07   AT                    0
Austria 2014   Regulated 3.631107e+07   AT                    0
Austria 2014 Unregulated 2.794225e+07   AT                    0
Austria 2014         ETS 2.805597e+07   AT                77.3%

到目前为止我还没有发布我所做的事情,因为我没有做很多事情....

感谢您的帮助。

北海

基于tydiverse/dplyr的解决方案来了。

文件 stack.txt 包含上面示例中的粘贴文本。

library(tidyverse)
full_data <- read.table("stack.txt", quote="\"", comment.char="") 
names(full_data) <-  c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>% 
  select(-perc)
full_data %>% 
  select(-iso) %>% 
  spread(sector, emission) %>% 
  mutate(percentage = ETS/Regulated) %>% 
  select(country, year, percentage) %>%
  right_join(full_data) %>%
  select(country, year, sector, emission, iso, percentage) %>% 
  mutate(percentage = ifelse(sector == "ETS", percentage, 0))

结果:

   country year      sector  emission iso percentage
1  Austria 2011       Total  70115670  AT  0.0000000
2  Austria 2011   Regulated  42148360  AT  0.0000000
3  Austria 2011 Unregulated  27967320  AT  0.0000000
4  Austria 2011         ETS  30599420  AT  0.7259931
5  Austria 2012       Total  67661400  AT  0.0000000
6  Austria 2012   Regulated  39494450  AT  0.0000000
7  Austria 2012 Unregulated  28166950  AT  0.0000000
8  Austria 2012         ETS  28387060  AT  0.7187607
9  Austria 2013       Total  68001230  AT  0.0000000
10 Austria 2013   Regulated  38573960  AT  0.0000000
11 Austria 2013 Unregulated  29427270  AT  0.0000000

如果你想要更多的解释,它是如何工作的,我建议打破管道并查看中间结果,即

full_data %>% 
  select(-iso) %>% 
  spread(sector, emission) %>%
  mutate(percentage = ETS/Regulated)

   country year      ETS Regulated     Total Unregulated percentage
1  Austria 2011 30599420  42148360  70115670    27967320  0.7259931
2  Austria 2012 28387060  39494450  67661400    28166950  0.7187607
3  Austria 2013 29804410  38573960  68001230    29427270  0.7726562
4  Austria 2014 28055970  36311070  64253330    27942250  0.7726561
5  Austria 2015 29492060  38003090  66703980    28700900  0.7760437
6  Austria 2016 29000120  37651770  67402090    29750310  0.7702193

问候 帕维尔

怎么样:

library(tidyverse)
library(zoo)

df %>%
  group_by(country, year) %>%
  mutate(
    PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
    PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
    PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
    PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
  )

前 10 行:

  country  year sector      emissions iso2  PercentageDifference
   <chr>   <int> <chr>           <dbl> <chr> <chr>               
 1 Austria  2011 Total        70115670 AT    NA                  
 2 Austria  2011 Regulated    42148360 AT    NA                  
 3 Austria  2011 Unregulated  27967320 AT    NA                  
 4 Austria  2011 ETS          30599420 AT    72.6%               
 5 Austria  2012 Total        67661400 AT    NA                  
 6 Austria  2012 Regulated    39494450 AT    NA                  
 7 Austria  2012 Unregulated  28166950 AT    NA                  
 8 Austria  2012 ETS          28387060 AT    71.9%               
 9 Austria  2013 Total        68001230 AT    NA                  
10 Austria  2013 Regulated    38573960 AT    NA 

重要的是要知道百分比列将是 character 类型,然后,正如您指定的那样,您希望看到 % 登录。

如果你想保留它numeric,你可以只删除mutate中的最后一步,即你可以这样做:

library(tidyverse)
library(zoo)

df %>%
  group_by(country, year) %>%
  mutate(
    PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
    PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
    PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
  )

输出(前 10 行):

   country  year sector      emissions iso2  PercentageDifference
   <chr>   <int> <chr>           <dbl> <chr>                <dbl>
 1 Austria  2011 Total        70115670 AT                    NA  
 2 Austria  2011 Regulated    42148360 AT                    NA  
 3 Austria  2011 Unregulated  27967320 AT                    NA  
 4 Austria  2011 ETS          30599420 AT                    72.6
 5 Austria  2012 Total        67661400 AT                    NA  
 6 Austria  2012 Regulated    39494450 AT                    NA  
 7 Austria  2012 Unregulated  28166950 AT                    NA  
 8 Austria  2012 ETS          28387060 AT                    71.9
 9 Austria  2013 Total        68001230 AT                    NA  
10 Austria  2013 Regulated    38573960 AT                    NA  

如果您想避免加载 zoo 包,您也可以在单独的步骤中使用 tidyverse 中的 fill,但它 太多了 较慢。