Including NAs in summary tables
For the example data frame:
library(data.table)

migration <- structure(list(area.old = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L,
3L, NA, NA, NA), .Label = c("leeds", "london", "plymouth"), class = "factor"),
area.new = structure(c(7L, 13L, 3L, 2L, 4L, 7L, 6L, 7L, 6L,
13L, 5L, 8L, 7L, 11L, 12L, 9L, 1L, 10L, 11L, NA, NA, NA,
NA, 7L, 6L, 6L), .Label = c("bath", "bristol", "cambridge",
"glasgow", "harrogate", "leeds", "london", "manchester",
"newcastle", "oxford", "plymouth", "poole", "york"), class = "factor"),
persons = c(6L, 3L, 2L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 3L, 4L,
1L, 1L, 2L, 3L, 4L, 9L, 4L, 5L, 7L, 9L, 10L, 15L, 4L, 7L)), .Names = c("area.old",
"area.new", "persons"), class = c("data.table", "data.frame"), row.names = c(NA,
-26L))
# The .internal.selfref pointer printed by dput() cannot be parsed, so it is
# dropped above; setDT() restores the data.table internals.
setDT(migration)
I want to summarise the data into a couple of data frames using this code:
moved.from <- migration[as.character(area.old)!=as.character(area.new),
.(persons = sum(persons)),
by=.(moved.from = area.old)]
moved.to <- migration[as.character(area.old)!=as.character(area.new),
.(persons = sum(persons)),
by=.(moved.to = area.new)]
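This produces two summary tables. The first gives the total number of people who moved away from each area in 'area.old'. The second lists the destinations people moved to (in 'area.new'). This code was kindly recommended to me elsewhere.
A problem came up when I tried this on my own data, because I had not told R how to handle the NAs in the 'area.old' or 'area.new' columns. How can I modify this code to include all the NAs (i.e. add a row at the bottom of the moved.from and moved.to data frames giving the total number of persons with NA)?
Any help would be much appreciated.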
Just add an | is.na() test as an additional condition in each filter:
migration[as.character(area.old) !=
as.character(area.new) |
is.na(area.old),
.(persons = sum(persons)),
by = .(moved.from = area.old)]
# moved.from persons
# 1: london 24
# 2: leeds 17
# 3: plymouth 19
# 4: NA 26
and
migration[as.character(area.old) !=
as.character(area.new) |
is.na(area.new),
.(persons = sum(persons)),
by = .(moved.to = area.new)]
# moved.to persons
# 1: york 9
# 2: cambridge 2
# 3: bristol 5
# 4: glasgow 6
# 5: leeds 8
# 6: london 5
# 7: harrogate 3
# 8: manchester 4
# 9: poole 2
# 10: newcastle 3
# 11: bath 4
# 12: oxford 9
# 13: NA 31
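To see why the extra condition is enough, it may help to look at R's three-valued logic on its own. The sketch below uses two short made-up character vectors (old and new are illustrative names, not the question's data). A comparison involving NA yields NA, NA | TRUE is TRUE, and NA | FALSE stays NA. data.table treats an NA in a logical i as FALSE, so rows left as NA are dropped. That is why the NA origins only show up in moved.from and the NA destinations only in moved.to.

```r
# Three-valued logic behind the fix (base R only, hypothetical toy vectors).
old <- c("london", "leeds", NA, "plymouth")
new <- c("york", "leeds", "london", NA)

moved <- old != new              # any comparison with NA yields NA
keep_from <- moved | is.na(old)  # NA | TRUE is TRUE, NA | FALSE stays NA
keep_to   <- moved | is.na(new)

print(moved)
print(keep_from)
print(keep_to)
```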
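As a side note, I would suggest converting your two area columns to character class so that you do not have to call as.character in every operation. The following should do it: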
migration[, names(migration)[-3L] := lapply(.SD, as.character), .SDcols = -"persons"]
Now you can compare area.old and area.new without calling as.character.
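As a rough illustration of what the filters look like once the columns are character, here is a minimal sketch on a small hypothetical table (toy, toy.from and toy.to are made-up names, and the four rows are not the question's data). It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table whose area columns are already character.
toy <- data.table(
  area.old = c("london", "leeds", NA, "plymouth"),
  area.new = c("york", "leeds", "london", NA),
  persons  = c(3L, 5L, 15L, 10L)
)

# Same filters as in the answer, minus the as.character() calls.
toy.from <- toy[area.old != area.new | is.na(area.old),
                .(persons = sum(persons)),
                by = .(moved.from = area.old)]

toy.to <- toy[area.old != area.new | is.na(area.new),
              .(persons = sum(persons)),
              by = .(moved.to = area.new)]

print(toy.from)
print(toy.to)
```

The leeds to leeds row is excluded from both summaries because nobody moved, while each NA group is kept in the table whose filter tests that column.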