基于两列值的交叉制表
Crosstabulation based on values in two columns
我有交叉制表问题。我的数据有 9600 行,这是一个简化的示例,其中我删除了几个参数:
Year Sex Area
1938 F 2
1938 M 2
1939 M 2
1939 M 5
1955 F 4
1957 F 4
1955 M 4
1943 M 6
1988 F 1
1988 M 2
1987 M 2
1955 M 3
1984 M 4
1984 F 5
1966 M 6
1955 F 8
1984 F 9
1955 M 3
1981 M 6
1938 F 6
我正在寻找像这样交叉制表我的数据的 table,我需要的是地区〜年份的案例统计,而不是参数值的总和。
Year Area=1 Area=2 Area=3 Area=3 Area=5 Area=6 Area=7 Area=8 Area=9
1938 7 0 1 9 2 8 1 14 4
1939 4 4 12 66 3 42 87 24 54
1940 9 33 1 1 6 22 1 5 15
1941 12 2 8 77 11 23 4 14 6
...
我知道有几种方法可以做到这一点,它们可能很简单,但我找不到正确的步骤。
使用dcast函数
library(reshape2)
dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
结果:
Year 1 2 3 4 5 6 8 9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0
您可以进一步重命名列。
ct <- dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
colnames(ct) <- c(colnames(ct[1]), paste0("Area=", colnames(ct[-1])))
ct
结果:
Year Area=1 Area=2 Area=3 Area=4 Area=5 Area=6 Area=8 Area=9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0
您可以使用 table
:
table(df$Year, df$Area)
# 1 2 3 4 5 6 8 9
# 1938 0 2 0 0 0 1 0 0
# 1939 0 1 0 0 1 0 0 0
# 1943 0 0 0 0 0 1 0 0
# 1955 0 0 2 2 0 0 1 0
# 1957 0 0 0 1 0 0 0 0
# 1966 0 0 0 0 0 1 0 0
# 1981 0 0 0 0 0 1 0 0
# 1984 0 0 0 1 1 0 0 1
# 1987 0 1 0 0 0 0 0 0
# 1988 1 1 0 0 0 0 0 0
或在tidyverse
library(dplyr)
library(tidyr)
df %>%
count(Year, Area) %>%
pivot_wider(names_from = Area, values_from = n,
values_fill = list(n = 0), names_prefix = 'Area')
数据
df <- structure(list(Year = c(1938L, 1938L, 1939L, 1939L, 1955L, 1957L,
1955L, 1943L, 1988L, 1988L, 1987L, 1955L, 1984L, 1984L, 1966L,
1955L, 1984L, 1955L, 1981L, 1938L), Sex = c("F", "M", "M", "M",
"F", "F", "M", "M", "F", "M", "M", "M", "M", "F", "M", "F", "F",
"M", "M", "F"), Area = c(2L, 2L, 2L, 5L, 4L, 4L, 4L, 6L, 1L,
2L, 2L, 3L, 4L, 5L, 6L, 8L, 9L, 3L, 6L, 6L)),
class = "data.frame", row.names = c(NA, -20L))
我有交叉制表问题。我的数据有 9600 行,这是一个简化的示例,其中我删除了几个参数:
Year Sex Area
1938 F 2
1938 M 2
1939 M 2
1939 M 5
1955 F 4
1957 F 4
1955 M 4
1943 M 6
1988 F 1
1988 M 2
1987 M 2
1955 M 3
1984 M 4
1984 F 5
1966 M 6
1955 F 8
1984 F 9
1955 M 3
1981 M 6
1938 F 6
我正在寻找像这样交叉制表我的数据的 table,我需要的是地区〜年份的案例统计,而不是参数值的总和。
Year Area=1 Area=2 Area=3 Area=3 Area=5 Area=6 Area=7 Area=8 Area=9
1938 7 0 1 9 2 8 1 14 4
1939 4 4 12 66 3 42 87 24 54
1940 9 33 1 1 6 22 1 5 15
1941 12 2 8 77 11 23 4 14 6
...
我知道有几种方法可以做到这一点,它们可能很简单,但我找不到正确的步骤。
使用dcast函数
library(reshape2)
dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
结果:
Year 1 2 3 4 5 6 8 9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0
您可以进一步重命名列。
ct <- dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
colnames(ct) <- c(colnames(ct[1]), paste0("Area=", colnames(ct[-1])))
ct
结果:
Year Area=1 Area=2 Area=3 Area=4 Area=5 Area=6 Area=8 Area=9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0
您可以使用 table
:
table(df$Year, df$Area)
# 1 2 3 4 5 6 8 9
# 1938 0 2 0 0 0 1 0 0
# 1939 0 1 0 0 1 0 0 0
# 1943 0 0 0 0 0 1 0 0
# 1955 0 0 2 2 0 0 1 0
# 1957 0 0 0 1 0 0 0 0
# 1966 0 0 0 0 0 1 0 0
# 1981 0 0 0 0 0 1 0 0
# 1984 0 0 0 1 1 0 0 1
# 1987 0 1 0 0 0 0 0 0
# 1988 1 1 0 0 0 0 0 0
或在tidyverse
library(dplyr)
library(tidyr)
df %>%
count(Year, Area) %>%
pivot_wider(names_from = Area, values_from = n,
values_fill = list(n = 0), names_prefix = 'Area')
数据
df <- structure(list(Year = c(1938L, 1938L, 1939L, 1939L, 1955L, 1957L,
1955L, 1943L, 1988L, 1988L, 1987L, 1955L, 1984L, 1984L, 1966L,
1955L, 1984L, 1955L, 1981L, 1938L), Sex = c("F", "M", "M", "M",
"F", "F", "M", "M", "F", "M", "M", "M", "M", "F", "M", "F", "F",
"M", "M", "F"), Area = c(2L, 2L, 2L, 5L, 4L, 4L, 4L, 6L, 1L,
2L, 2L, 3L, 4L, 5L, 6L, 8L, 9L, 3L, 6L, 6L)),
class = "data.frame", row.names = c(NA, -20L))