How to divide one column into multiple columns by other variables in R
I have a data set:
data
Choice Length Gender
1 I subadults M
2 F subadults M
3 F subadults M
4 F subadults M
5 I subadults M
6 F subadults M
7 I subadults M
8 F subadults M
9 I subadults M
10 I subadults M
11 I subadults M
12 O subadults M
13 O subadults M
14 I subadults M
15 F subadults M
16 F subadults M
17 I subadults M
18 O subadults M
19 F subadults M
20 O subadults M
21 F subadults M
22 F adults M
23 I adults M
24 F adults M
25 I adults M
26 F adults M
27 F adults M
28 F adults M
29 F adults M
30 F adults M
31 O adults M
32 O adults M
33 F adults F
34 F adults F
35 F adults F
36 F adults F
37 O adults F
38 F adults F
39 F adults F
40 I subadults F
41 I subadults F
42 I subadults F
43 O subadults F
44 I subadults F
45 I subadults F
46 I subadults F
47 F subadults F
48 I subadults F
49 O subadults F
50 I subadults F
51 I adults F
52 F adults F
53 F adults F
54 F adults F
55 F adults F
Now I want to split the Choice column into three columns, so that the data set looks like this:
F I O Length Gender
1 0 20 subadults F
0 10 0 adults F
12 0 11 subadults M
0 10 0 adults M
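where F, I, and O are the totals for each combination of Length and Gender.
I can't find an R command that does this. Can anyone help?
Thanks a lot! Yan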
You can try:
reshape(as.data.frame(table(df)),
idvar=c("Length","Gender"),
timevar="Choice",direction="wide")
# Length Gender Freq.F Freq.I Freq.O
#1 adults F 10 1 1
#4 subadults F 1 8 2
#7 adults M 7 2 2
#10 subadults M 9 8 4
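The table function counts how many times each combination of Choice, Gender, and Length occurs and returns the counts as a multidimensional array. You then coerce that array to a data.frame with 4 columns (the three above plus a column named Freq that holds the number of occurrences of each case), and finally reshape the result as needed.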
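The three steps described above can be traced one at a time. The following is a minimal sketch in base R. The data frame df is rebuilt here from the group counts shown in the output above, so its row order differs from the posted data, and the intermediate names tab, long, and wide are only illustrative.

```r
# Rebuild a data frame with the same group counts as the posted data.
# Assumption: row order does not matter for counting.
df <- data.frame(
  Choice = factor(c(rep(c("F", "I", "O"), times = c(9, 8, 4)),    # subadults, M
                    rep(c("F", "I", "O"), times = c(7, 2, 2)),    # adults, M
                    rep(c("F", "I", "O"), times = c(10, 1, 1)),   # adults, F
                    rep(c("F", "I", "O"), times = c(1, 8, 2)))),  # subadults, F
  Length = rep(c("subadults", "adults", "adults", "subadults"),
               times = c(21, 11, 12, 11)),
  Gender = rep(c("M", "F"), times = c(32, 23))
)

# Step 1: a 3 x 2 x 2 array of counts (Choice x Length x Gender)
tab <- table(df)

# Step 2: one row per cell of the array, with the count in Freq
long <- as.data.frame(tab)

# Step 3: one row per Length/Gender pair, one Freq.* column per Choice
wide <- reshape(long, idvar = c("Length", "Gender"),
                timevar = "Choice", direction = "wide")
```

Printing long before the reshape step is a quick way to check that the counts are what you expect.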
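Edit
I now realize that I did not understand the values in your expected output. Here I counted the number of times each case occurs. Are your values correct? If so, how did you arrive at them?
Try: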
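library(reshape2)
# Small example data (note the lower-case column names)
data <- data.frame(choice = c('I', 'F', 'I', 'O', 'F', 'O'),
                   length = c('subadults', 'subadults', 'subadults', 'adults', 'adults', 'adults'),
                   gender = c('M', 'M', 'F', 'F', 'M', 'F'))
# Melt to long form: the choice column ends up in "value"
melt_data <- melt(data, value.name = "value", id.vars = c("length", "gender"))
# Count the rows in each gender/length group for every choice
dcast(melt_data, gender + length ~ value, fun.aggregate = length)
#   gender    length F I O
# 1      F    adults 0 0 2
# 2      F subadults 0 1 0
# 3      M    adults 1 0 0
# 4      M subadults 1 1 0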
In base R, two approaches to consider are ftable and aggregate.
Here is ftable:
> ftable(mydf, col.vars = "Choice")
Choice F I O
Length Gender
adults F 10 1 1
M 7 2 2
subadults F 1 8 2
M 9 8 4
And here is aggregate:
> aggregate(Choice ~ Length + Gender, mydf, table)
Length Gender Choice.F Choice.I Choice.O
1 adults F 10 1 1
2 subadults F 1 8 2
3 adults M 7 2 2
4 subadults M 9 8 4
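One detail worth knowing about the aggregate result: although it prints as five columns, it holds only three, because the counts are stored in a single matrix column named Choice. A minimal sketch of how to flatten it follows. Here mydf is rebuilt from the group counts shown above (so row order differs from the posted data), Choice is made a factor so that every group reports all three levels, and the names res and flat are only illustrative.

```r
# Rebuild a data frame with the same group counts as the posted data.
# Assumption: row order does not matter for counting.
mydf <- data.frame(
  Choice = factor(c(rep(c("F", "I", "O"), times = c(9, 8, 4)),    # subadults, M
                    rep(c("F", "I", "O"), times = c(7, 2, 2)),    # adults, M
                    rep(c("F", "I", "O"), times = c(10, 1, 1)),   # adults, F
                    rep(c("F", "I", "O"), times = c(1, 8, 2)))),  # subadults, F
  Length = rep(c("subadults", "adults", "adults", "subadults"),
               times = c(21, 11, 12, 11)),
  Gender = rep(c("M", "F"), times = c(32, 23))
)

res <- aggregate(Choice ~ Length + Gender, mydf, table)

# Three columns: Length, Gender, and a matrix column Choice
ncol(res)
is.matrix(res$Choice)

# Expand the matrix column into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, columns such as flat$Choice.F can be used like any other numeric column.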
With "data.table", you could also try the following:
as.data.table(mydf)[, as.list(table(Choice)), by = list(Length, Gender)]
# Length Gender F I O
# 1: subadults M 9 8 4
# 2: adults M 7 2 2
# 3: adults F 10 1 1
# 4: subadults F 1 8 2
However, dcast.data.table would be the more common approach:
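# fun.aggregate = length makes the counting explicit instead of relying on the default
dcast.data.table(as.data.table(mydf), Length + Gender ~ Choice, value.var = "Choice", fun.aggregate = length)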
With "dplyr" and "tidyr", you could try:
library(dplyr)
library(tidyr)
mydf %>%
group_by(Length, Gender, Choice) %>%
summarise(Count = n()) %>%
spread(Choice, Count)
# Source: local data frame [4 x 5]
#
# Length Gender F I O
# 1 adults F 10 1 1
# 2 adults M 7 2 2
# 3 subadults F 1 8 2
# 4 subadults M 9 8 4
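In tidyr 1.0.0 and later, spread is superseded by pivot_wider, and dplyr's count can stand in for the group_by/summarise pair. Below is a sketch of the same table under those assumptions: a reasonably recent dplyr and tidyr are installed, and mydf is rebuilt from the group counts shown above, so row order differs from the posted data.

```r
library(dplyr)
library(tidyr)

# Rebuild a data frame with the same group counts as the posted data.
# Assumption: row order does not matter for counting.
mydf <- data.frame(
  Choice = factor(c(rep(c("F", "I", "O"), times = c(9, 8, 4)),    # subadults, M
                    rep(c("F", "I", "O"), times = c(7, 2, 2)),    # adults, M
                    rep(c("F", "I", "O"), times = c(10, 1, 1)),   # adults, F
                    rep(c("F", "I", "O"), times = c(1, 8, 2)))),  # subadults, F
  Length = rep(c("subadults", "adults", "adults", "subadults"),
               times = c(21, 11, 12, 11)),
  Gender = rep(c("M", "F"), times = c(32, 23))
)

wide <- mydf %>%
  count(Length, Gender, Choice) %>%              # one row per combination, count in n
  pivot_wider(names_from = Choice,               # one column per Choice level
              values_from = n,
              values_fill = list(n = 0L))        # combinations that never occur become 0
```

The values_fill argument only matters when some Length/Gender group lacks one of the choices. Without it, those cells would be NA.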