Mathematical operations between grouped data and a data frame in R
Here is a simplified pair of data frames that illustrates the problem:
ss <- structure(list(country = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "k", "v"), class = "factor"), year = c(1961L,
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L,
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L,
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L,
1962L, 1963L), x = c(19L, 4L, 3L, 23L, 24L, 16L, 28L, 9L, 29L,
20L, 14L, 21L, 30L, 1L, 12L, 17L, 25L, 26L, 13L, 8L, 2L, 7L,
10L, 11L, 6L, 22L, 27L, 5L, 15L, 18L), y = c(23L, 20L, 28L, 7L,
4L, 25L, 5L, 8L, 10L, 13L, 9L, 1L, 21L, 11L, 26L, 16L, 27L, 2L,
29L, 24L, 3L, 15L, 6L, 19L, 14L, 22L, 12L, 18L, 17L, 30L), z = c(22L,
4L, 23L, 16L, 29L, 14L, 11L, 13L, 27L, 26L, 5L, 12L, 2L, 9L,
10L, 25L, 7L, 21L, 6L, 20L, 3L, 30L, 18L, 8L, 1L, 24L, 17L, 15L,
28L, 19L)), class = "data.frame", row.names = c(NA, -30L))
and
zz <- structure(list(country = structure(c(1L, 1L, 1L), .Label = "w", class = "factor"),
year = 1961:1963, x = c(2L, 1L, 3L), y = c(3L, 1L, 2L), z = 1:3), class = "data.frame", row.names = c(NA,
-3L))
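The data frame ss holds data for 10 countries over 3 years, and the data frame zz holds the world data for the corresponding years.
Is there a way to apply something like ss (for each country group) / zz, so that each country's values are expressed as a ratio to the world data for the same year? The first two columns of ss (country and year) should be kept unchanged.
Can this be done without reshaping the data with dplyr and the tidyverse, which only adds more lines of code?
Thanks.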
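Use match to look up, for each row of ss, the row of zz with the same year, then divide the value columns element-wise: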
cbind(ss[1:2], ss[-(1:2)] / zz[match(ss$year, zz$year), -(1:2)])
# country year x y z
# 1 a 1961 9.5000000 7.666667 22.000000
# 2 b 1962 4.0000000 20.000000 2.000000
# 3 c 1963 1.0000000 14.000000 7.666667
# 4 d 1961 11.5000000 2.333333 16.000000
# 5 e 1962 24.0000000 4.000000 14.500000
# 6 f 1963 5.3333333 12.500000 4.666667
# 7 g 1961 14.0000000 1.666667 11.000000
# 8 h 1962 9.0000000 8.000000 6.500000
# 9 k 1963 9.6666667 5.000000 9.000000
# 10 v 1961 10.0000000 4.333333 26.000000
# 11 a 1962 14.0000000 9.000000 2.500000
# 12 b 1963 7.0000000 0.500000 4.000000
# 13 c 1961 15.0000000 7.000000 2.000000
# 14 d 1962 1.0000000 11.000000 4.500000
# 15 e 1963 4.0000000 13.000000 3.333333
# 16 f 1961 8.5000000 5.333333 25.000000
# 17 g 1962 25.0000000 27.000000 3.500000
# 18 h 1963 8.6666667 1.000000 7.000000
# 19 k 1961 6.5000000 9.666667 6.000000
# 20 v 1962 8.0000000 24.000000 10.000000
# 21 a 1963 0.6666667 1.500000 1.000000
# 22 b 1961 3.5000000 5.000000 30.000000
# 23 c 1962 10.0000000 6.000000 9.000000
# 24 d 1963 3.6666667 9.500000 2.666667
# 25 e 1961 3.0000000 4.666667 1.000000
# 26 f 1962 22.0000000 22.000000 12.000000
# 27 g 1963 9.0000000 6.000000 5.666667
# 28 h 1961 2.5000000 6.000000 15.000000
# 29 k 1962 15.0000000 17.000000 14.000000
# 30 v 1963 6.0000000 15.000000 6.333333
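To make the role of match in the one-liner above more visible, here is a minimal sketch that splits it into steps. It runs on a four-row subset of ss (called ss_small here) together with zz rebuilt as a plain data frame; the names ss_small, idx, vals and ratio_small are introduced only for this illustration, and selecting the value columns by name instead of by position is an assumption made for readability.

```r
# Toy subset of the question's data: four rows of ss and all of zz.
ss_small <- data.frame(
  country = c("a", "b", "c", "a"),
  year    = c(1961L, 1962L, 1963L, 1962L),
  x       = c(19L, 4L, 3L, 14L),
  y       = c(23L, 20L, 28L, 9L),
  z       = c(22L, 4L, 23L, 5L)
)
zz <- data.frame(
  country = "w",
  year    = 1961:1963,
  x       = c(2L, 1L, 3L),
  y       = c(3L, 1L, 2L),
  z       = 1:3
)

# For every row of ss_small, find the position of the same year in zz.
idx <- match(ss_small$year, zz$year)   # 1 2 3 2

# Value columns picked by name (illustrative; the answer uses -(1:2)).
vals <- c("x", "y", "z")

# Indexing zz by idx repeats the world rows so that there is exactly one
# world row per country row; the division is then element-wise.
ratio_small <- cbind(ss_small[c("country", "year")],
                     ss_small[vals] / zz[idx, vals])
print(ratio_small)
```

If a year in ss had no counterpart in zz, match would return NA for that row and the corresponding ratios would come out as NA, which may be worth checking on real data.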
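This can also be done in a single line with the data.table package (after loading it with library(data.table)), by joining ss to zz on year; columns prefixed with i. come from zz: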
as.data.table(ss)[zz, .(country, year, x = x/i.x, y = y/i.y, z = z/i.z), on = .(year)]
# country year x y z
# 1: a 1961 9.5000000 7.666667 22.000000
# 2: d 1961 11.5000000 2.333333 16.000000
# 3: g 1961 14.0000000 1.666667 11.000000
# 4: v 1961 10.0000000 4.333333 26.000000
# 5: c 1961 15.0000000 7.000000 2.000000
# 6: f 1961 8.5000000 5.333333 25.000000
# 7: k 1961 6.5000000 9.666667 6.000000
# 8: b 1961 3.5000000 5.000000 30.000000
# 9: e 1961 3.0000000 4.666667 1.000000
# 10: h 1961 2.5000000 6.000000 15.000000
# 11: b 1962 4.0000000 20.000000 2.000000
# 12: e 1962 24.0000000 4.000000 14.500000
# 13: h 1962 9.0000000 8.000000 6.500000
# 14: a 1962 14.0000000 9.000000 2.500000
# 15: d 1962 1.0000000 11.000000 4.500000
# 16: g 1962 25.0000000 27.000000 3.500000
# 17: v 1962 8.0000000 24.000000 10.000000
# 18: c 1962 10.0000000 6.000000 9.000000
# 19: f 1962 22.0000000 22.000000 12.000000
# 20: k 1962 15.0000000 17.000000 14.000000
# 21: c 1963 1.0000000 14.000000 7.666667
# 22: f 1963 5.3333333 12.500000 4.666667
# 23: k 1963 9.6666667 5.000000 9.000000
# 24: b 1963 7.0000000 0.500000 4.000000
# 25: e 1963 4.0000000 13.000000 3.333333
# 26: h 1963 8.6666667 1.000000 7.000000
# 27: a 1963 0.6666667 1.500000 1.000000
# 28: d 1963 3.6666667 9.500000 2.666667
# 29: g 1963 9.0000000 6.000000 5.666667
# 30: v 1963 6.0000000 15.000000 6.333333
# country year x y z
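The same join idea can be sketched in base R with merge, for readers who prefer not to load a package. This is an alternative written for illustration, not part of the answers above; it uses a four-row subset of ss (ss_small), and the names m, vals, ratio_merge and the ".w" suffix are assumptions of this sketch.

```r
# Toy subset of the question's data: four rows of ss and all of zz.
ss_small <- data.frame(
  country = c("a", "b", "c", "a"),
  year    = c(1961L, 1962L, 1963L, 1962L),
  x       = c(19L, 4L, 3L, 14L),
  y       = c(23L, 20L, 28L, 9L),
  z       = c(22L, 4L, 23L, 5L)
)
zz <- data.frame(
  country = "w",
  year    = 1961:1963,
  x       = c(2L, 1L, 3L),
  y       = c(3L, 1L, 2L),
  z       = 1:3
)

vals <- c("x", "y", "z")

# Join the world values onto the country rows by year. The world columns
# get the suffix ".w"; the country columns keep their original names.
m <- merge(ss_small, zz[c("year", vals)], by = "year", suffixes = c("", ".w"))

# Divide each country column by its world counterpart.
m[vals] <- m[vals] / m[paste0(vals, ".w")]

# Keep only the columns of the original layout.
ratio_merge <- m[c("country", "year", vals)]
print(ratio_merge)
```

Like the data.table join, merge returns the rows ordered by year rather than in the original order of ss, so the match-based version is the simpler choice when the row order has to be preserved.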