Testing over a vector of variables and summing over a table, creating new columns in R
I have a table like this:
df <- read.table(text =
" Day city gender week
'day1' 'city1' 'M' 'one'
'day2' 'city2' 'M' 'two'
'day1' 'city3' 'F' 'two'
'day2' 'city4' 'F' 'two'",
header = TRUE, stringsAsFactors = FALSE)
I am computing a summary table like this:
library(data.table)  # provides setDT() and the [i, j, by] syntax
daily_table <- setDT(df)[, .(Daily_Freq = .N,
men = sum(gender == 'M'),
women = sum(gender == 'F'),
city1 = sum(city == 'city1'),
city2 = sum(city == 'city2'),
city3 = sum(city == 'city3'),
city4 = sum(city == 'city4'),
city5 = sum(city == 'city5'))
, by = .(week,Day)]
which produces this table:
week Day Daily_Freq men women city1 city2 city3 city4 city5
one day1 1 1 0 1 0 0 0 0
two day2 2 1 1 0 1 0 1 0
two day1 1 0 1 0 0 1 0 0
But since I have several cities, I would like to use a vector holding their names instead:
cities <- c("city1","city2","city3","city4","city5")
Note that my vector has 5 cities. Even if one of them has zero occurrences, I want it to appear in my final table.
How can I do this?
To make sure R shows city5 even when there are no observations with that value, add it as a factor level:
setDT(df)
df[, city := factor(city,
levels = c("city1","city2","city3","city4","city5"))]
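As a quick illustration of why the factor levels matter, here is a minimal base-R sketch. The names city_chr and city_fct are made up for this example and are not part of the original data: a plain character vector only reports values that occur, while a factor keeps every declared level, including unused ones.

```r
# Hypothetical stand-alone vectors mirroring the city column above.
city_chr <- c("city1", "city2", "city3", "city4")
cities   <- c("city1", "city2", "city3", "city4", "city5")

# Declaring the levels explicitly keeps city5 even though it never occurs.
city_fct <- factor(city_chr, levels = cities)

table(city_chr)  # counts only the four observed cities
table(city_fct)  # counts all five levels; city5 is reported with a count of 0
```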
To avoid writing a test for each level of city, you can loop over the levels of city, like this:
daily_table <- df[, c(.(Daily_Freq = .N,
men = sum(gender == 'M'),
women = sum(gender == 'F')),
lapply(setNames(levels(city), levels(city)),
function(x) sum(city == x))),
by = .(week,Day)]
daily_table
## week Day Daily_Freq men women city1 city2 city3 city4 city5
## 1: one day1 1 1 0 1 0 0 0 0
## 2: two day2 2 1 1 0 1 0 1 0
## 3: two day1 1 0 1 0 0 1 0 0
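If you would rather not convert city to a factor, a variation on the same idea is to loop over the cities vector from the question directly. The following is a sketch under that assumption, not the only way to do it; it rebuilds the example data with data.table() so that it runs on its own, and uses list() where the code above uses the .() alias.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table.
df <- data.table(
  Day    = c("day1", "day2", "day1", "day2"),
  city   = c("city1", "city2", "city3", "city4"),
  gender = c("M", "M", "F", "F"),
  week   = c("one", "two", "two", "two")
)

# The vector of city names from the question; city5 never occurs in df.
cities <- c("city1", "city2", "city3", "city4", "city5")

daily_table <- df[, c(
  # Fixed summary columns.
  list(Daily_Freq = .N,
       men   = sum(gender == "M"),
       women = sum(gender == "F")),
  # One count column per entry of `cities`, named after the city.
  lapply(setNames(cities, cities), function(x) sum(city == x))
), by = .(week, Day)]

daily_table
```

Because the column names come from cities rather than from the data, city5 gets a column of zeros without city having to be a factor.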