How to index multiple objects / variables within a for loop

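My data consists of three columns:

Household ID, product ID (H14aq2), value.

I have roughly 7000 rows (household IDs), which can be divided into 12 districts and 160 products. A household ID can appear multiple times, because households use several products. My goal is to sum the household values for each product, so that I end up with the total product value for each district. I know how to do this manually, but I want to use a loop, because I will be doing this for multiple datasets.

Here is my current code. It actually runs without an error, showing 156 iterations, but when I look at the total_values_05 object, only one additional vector, val_i, has been appended.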

for(i in 105:161){
  
  total_val_i <- cons_05 %>% 
    filter(H14aq2 == i) %>% 
    group_by(Districtn05) %>% 
    summarise(val_i = sum(total_val_yr)) %>% 
    ungroup()
  
  total_values_05 <- total_values_05 %>% 
    left_join(total_val_i)
  rm(total_val_i)
  
}

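There are 161 products (indexed by the variable H14aq2, running from 101 to 161). Before this loop I created the object total_values_05, in which I handle products 101 to 104 for other reasons.

In each iteration, I want to filter for a single product, sum the total_val_yr variable that contains the values, and then append the new vector val_i to the existing object total_values_05. In the end I want an object structured like this: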

District val_101 val_102 val_103
First row row row
Second row row row

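(up to val_161 and the 12th district)

It seems to me that I am missing one small thing to make this really work, since the code runs and a variable named val_i has in fact been appended. I think the problem is with indexing multiple things with i.

This is my first attempt at a loop! Any help is much appreciated :)

Here is sample data (containing only the 4 variables needed for my question):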

structure(list(Hhid = structure(c("1033000301", "1033000301", 
"1033000301", "1033000301", "1033000301", "1033000301"), label = "Unique hh identifier across panel waves", format.stata = "%-10s"), 
    Districtn05 = structure(c("Kiboga", "Kiboga", "Kiboga", "Kiboga", 
    "Kiboga", "Kiboga"), label = "District name as in 2005/06", format.stata = "%-13s"), 
    H14aq2 = structure(c(150, 135, 140, 136, 112, 103), label = "Consumption item code", format.stata = "%16.0g", labels = c(Matooke = 101, 
    Matooke = 102, Matooke = 103, Matooke = 104, `Sweet potatoes fresh` = 105, 
    `Sweet potatoes dry` = 106, `Cassava fresh` = 107, `Cassava dry/flour` = 108, 
    `Irish potatoes` = 109, Rice = 110, `Maize grains` = 111, 
    `Maize cobs` = 112, `Maize flour` = 113, Bread = 114, Millet = 115, 
    Sorghum = 116, Beef = 117, Pork = 118, `Goat meat` = 119, 
    `Other meat` = 120, Chicken = 121, `Fresh fish` = 122, `Dry/smoked fish` = 123, 
    Eggs = 124, `Fresh milk` = 125, `Infant formula foods` = 126, 
    `Cooking oil` = 127, Ghee = 128, `Margarine,butter` = 129, 
    `Passion fruits` = 130, `Sweet bananas` = 131, Mangoes = 132, 
    Oranges = 133, `Other fruits` = 134, Onions = 135, Tomatoes = 136, 
    Cabbages = 137, Dodo = 138, `Other vegetables` = 139, `Beans fresh` = 140, 
    `Beans dry` = 141, `Ground nuts in shell` = 142, `Ground nuts shelled` = 143, 
    `Ground nuts pounded` = 144, Peas = 145, Simsim = 146, Sugar = 147, 
    Coffee = 148, Tea = 149, Salt = 150, Soda = 151, Beer = 152, 
    `Other alcoholic drinks` = 153, `Other drinks` = 154, Cigarettes = 155, 
    `Other tobbaco` = 156, `Expenditure in restaurants on food` = 157, 
    `Expenditure in restaurants on soda` = 158, `Expenditure in restaurants on beer` = 159, 
    `Other juice` = 160, `Other foods` = 161), class = c("haven_labelled", 
    "vctrs_vctr", "double")), total_val_yr = c(3250, 10400, 156000, 
    10400, 260000, 312000)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))



You can group by multiple columns and then pivot the summarised result to wide format, like this:

library(tidyverse)

data <- structure(list(
  Hhid = structure(c(
    "1033000301", "1033000301",
    "1033000301", "1033000301", "1033000301", "1033000301"
  ), label = "Unique hh identifier across panel waves", format.stata = "%-10s"),
  Districtn05 = structure(c(
    "Kiboga", "Kiboga", "Kiboga", "Kiboga",
    "Kiboga", "Kiboga"
  ), label = "District name as in 2005/06", format.stata = "%-13s"),
  H14aq2 = structure(c(150, 135, 140, 136, 112, 103), label = "Consumption item code", format.stata = "%16.0g", labels = c(
    Matooke = 101,
    Matooke = 102, Matooke = 103, Matooke = 104, `Sweet potatoes fresh` = 105,
    `Sweet potatoes dry` = 106, `Cassava fresh` = 107, `Cassava dry/flour` = 108,
    `Irish potatoes` = 109, Rice = 110, `Maize grains` = 111,
    `Maize cobs` = 112, `Maize flour` = 113, Bread = 114, Millet = 115,
    Sorghum = 116, Beef = 117, Pork = 118, `Goat meat` = 119,
    `Other meat` = 120, Chicken = 121, `Fresh fish` = 122, `Dry/smoked fish` = 123,
    Eggs = 124, `Fresh milk` = 125, `Infant formula foods` = 126,
    `Cooking oil` = 127, Ghee = 128, `Margarine,butter` = 129,
    `Passion fruits` = 130, `Sweet bananas` = 131, Mangoes = 132,
    Oranges = 133, `Other fruits` = 134, Onions = 135, Tomatoes = 136,
    Cabbages = 137, Dodo = 138, `Other vegetables` = 139, `Beans fresh` = 140,
    `Beans dry` = 141, `Ground nuts in shell` = 142, `Ground nuts shelled` = 143,
    `Ground nuts pounded` = 144, Peas = 145, Simsim = 146, Sugar = 147,
    Coffee = 148, Tea = 149, Salt = 150, Soda = 151, Beer = 152,
    `Other alcoholic drinks` = 153, `Other drinks` = 154, Cigarettes = 155,
    `Other tobbaco` = 156, `Expenditure in restaurants on food` = 157,
    `Expenditure in restaurants on soda` = 158, `Expenditure in restaurants on beer` = 159,
    `Other juice` = 160, `Other foods` = 161
  ), class = c(
    "haven_labelled",
    "vctrs_vctr", "double"
  )), total_val_yr = c(
    3250, 10400, 156000,
    10400, 260000, 312000
  )
), row.names = c(NA, -6L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))


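# Sum the values per district and product, then spread the products into columns
data %>%
  group_by(Districtn05, H14aq2) %>%
  summarise(total_val_yr = sum(total_val_yr)) %>%
  # Districtn05 is still a grouping variable here, so dplyr adds it back (see message below)
  select(total_val_yr, H14aq2) %>%
  pivot_wider(names_from = H14aq2, values_from = total_val_yr, names_prefix = "val_")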
#> `summarise()` has grouped output by 'Districtn05'. You can override using the
#> `.groups` argument.
#> Adding missing grouping variables: `Districtn05`
#> # A tibble: 1 × 7
#> # Groups:   Districtn05 [1]
#>   Districtn05 val_103 val_112 val_135 val_136 val_140 val_150
#>   <chr>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 Kiboga       312000  260000   10400   10400  156000    3250

Created on 2022-05-25 by the reprex package (v2.0.0)
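
As for why the loop in the question ends up with a single val_i column: inside summarise(), val_i is taken literally as the column name, so the value of i is never substituted into it. From the second iteration on, left_join() without a by argument then matches on both Districtn05 and val_i, so no new column is added. If you would rather keep the loop, here is a minimal sketch of one possible fix. It assumes dplyr 1.0.0 or later (for the "val_{i}" := naming syntax). The toy cons_05 below is made-up data with plain numeric item codes, not the real survey, and the starting total_values_05 is only a stand-in for the object built before the loop.

```r
library(dplyr)

# Hypothetical toy data standing in for cons_05 (not the real survey data)
cons_05 <- tibble(
  Districtn05  = c("Kiboga", "Kiboga", "Kiboga", "Kampala", "Kampala"),
  H14aq2       = c(105, 105, 106, 105, 107),
  total_val_yr = c(100, 200, 50, 10, 5)
)

# Stand-in for the existing object: one row per district
total_values_05 <- distinct(cons_05, Districtn05)

for (i in 105:107) {
  total_val_i <- cons_05 %>%
    filter(H14aq2 == i) %>%
    group_by(Districtn05) %>%
    # "val_{i}" := ... builds the column name from the current value of i
    summarise("val_{i}" := sum(total_val_yr), .groups = "drop")

  # Join on the district only, so earlier val_* columns are never used as keys
  total_values_05 <- left_join(total_values_05, total_val_i, by = "Districtn05")
}

total_values_05
```

Districts that never bought a given product get NA in that column. The group_by() plus pivot_wider() approach above produces all val_ columns in one pass, so it needs neither the loop nor the name construction.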