Releveling a factor to facilitate its use as a nested factor in a DESeq2 model in R
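I am fitting a GLM with the DESeq2 package and have a situation where individuals (RatID) are nested within treatment (Diet). The package authors recommend re-leveling the individuals to 1:N within each Diet (where N is the number of RatIDs in that Diet) rather than keeping their original ID/factor levels (DESeq2 vignette, page 35).
The data look like this (there are actually more columns and rows, omitted here for simplicity):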
Diet Extraction RatID
199 HAMSP 8 65
74 HAMS 9 108
308 HAMS 18 100
41 HAMSA 3 83
88 HAMSP 12 11
221 HAMSP 14 66
200 HAMSA 8 57
155 HAMSB 1 105
245 HAMSB 19 50
254 HAMS 21 90
182 HAMSB 4 4
283 HAMSA 23 59
180 HAMSP 4 22
71 HAMSP 9 112
212 HAMS 12 63
220 HAMSP 14 54
56 HAMS 7 81
274 HAMSP 1 11
114 HAMS 17 102
143 HAMSP 22 93
Here is the dput() output of the structure:
data = structure(list(Diet = structure(c(4L, 1L, 1L, 2L, 4L, 4L, 2L,
3L, 3L, 1L, 3L, 2L, 4L, 4L, 1L, 4L, 1L, 4L, 1L, 4L), .Label = c("HAMS",
"HAMSA", "HAMSB", "HAMSP", "LAMS"), class = "factor"), Extraction = c(8L,
9L, 18L, 3L, 12L, 14L, 8L, 1L, 19L, 21L, 4L, 23L, 4L, 9L, 12L,
14L, 7L, 1L, 17L, 22L), RatID = structure(c(61L, 7L, 3L, 76L,
9L, 62L, 52L, 6L, 46L, 81L, 37L, 54L, 20L, 12L, 59L, 50L, 74L,
9L, 4L, 84L), .Label = c("1", "10", "100", "102", "103", "105",
"108", "109", "11", "110", "111", "112", "113", "13", "14", "16",
"17", "18", "20", "22", "23", "24", "25", "26", "27", "28", "29",
"3", "30", "31", "32", "34", "35", "36", "37", "39", "4", "40",
"42", "43", "45", "46", "48", "49", "5", "50", "51", "52", "53",
"54", "55", "57", "58", "59", "6", "60", "61", "62", "63", "64",
"65", "66", "67", "68", "69", "70", "71", "73", "77", "78", "79",
"8", "80", "81", "82", "83", "85", "86", "88", "89", "90", "91",
"92", "93", "94", "95", "96", "98", "99"), class = "factor")), .Names = c("Diet",
"Extraction", "RatID"), row.names = c(199L, 74L, 308L, 41L, 88L,
221L, 200L, 155L, 245L, 254L, 182L, 283L, 180L, 71L, 212L, 220L,
56L, 274L, 114L, 143L), class = "data.frame")
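Could someone suggest an elegant way to generate new factor levels for RatID within Diet, as an additional column in the data.frame above?
Can this be done with data.table's roll function?
Desired output (done by hand):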
Diet Extraction RatID newCol
1 HAMSP 8 65 1
2 HAMS 9 108 1
3 HAMS 18 100 2
4 HAMSA 3 83 1
5 HAMSP 12 11 2
6 HAMSP 14 66 3
7 HAMSA 8 57 2
8 HAMSB 1 105 1
9 HAMSB 19 50 2
10 HAMS 21 90 3
11 HAMSB 4 4 3
12 HAMSA 23 59 3
13 HAMSP 4 22 4
14 HAMSP 9 112 5
15 HAMS 12 63 4
16 HAMSP 14 54 6
17 HAMS 7 81 5
18 HAMSP 1 11 2
19 HAMS 17 102 6
20 HAMSP 22 93 7
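Note: the number of rats differs between treatments. I would also prefer a solution that does not reorder the rows of the data (if possible).
Edit: RatID has no 'natural' order; any 1:1 mapping within a Diet is fine.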
Here is the as.numeric(factor(.)) trick implemented in dplyr:
library(dplyr)
data %>% group_by(Diet) %>% mutate(RatIDByDiet=as.numeric(factor(RatID)))
## Source: local data frame [20 x 4]
## Groups: Diet
##
## Diet Extraction RatID RatIDByDiet
## 1 HAMSP 8 65 5
## 2 HAMS 9 108 3
## 3 HAMS 18 100 1
## 4 HAMSA 3 83 3
## 5 HAMSP 12 11 1
## 6 HAMSP 14 66 6
## 7 HAMSA 8 57 1
## 8 HAMSB 1 105 1
## 9 HAMSB 19 50 3
## 10 HAMS 21 90 6
## 11 HAMSB 4 4 2
## 12 HAMSA 23 59 2
## 13 HAMSP 4 22 3
## 14 HAMSP 9 112 2
## 15 HAMS 12 63 4
## 16 HAMSP 14 54 4
## 17 HAMS 7 81 5
## 18 HAMSP 1 11 1
## 19 HAMS 17 102 2
## 20 HAMSP 22 93 7
Here is a solution that avoids going through factor(), in case you want more control over how the numbering is assigned:
data %>% group_by(Diet) %>% mutate(RatIDByDiet=match(RatID, unique(RatID)))
## Source: local data frame [20 x 4]
## Groups: Diet
##
## Diet Extraction RatID RatIDByDiet
## 1 HAMSP 8 65 1
## 2 HAMS 9 108 1
## 3 HAMS 18 100 2
## 4 HAMSA 3 83 1
## 5 HAMSP 12 11 2
## 6 HAMSP 14 66 3
## 7 HAMSA 8 57 2
## 8 HAMSB 1 105 1
## 9 HAMSB 19 50 2
## 10 HAMS 21 90 3
## 11 HAMSB 4 4 3
## 12 HAMSA 23 59 3
## 13 HAMSP 4 22 4
## 14 HAMSP 9 112 5
## 15 HAMS 12 63 4
## 16 HAMSP 14 54 6
## 17 HAMS 7 81 5
## 18 HAMSP 1 11 2
## 19 HAMS 17 102 6
## 20 HAMSP 22 93 7
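To see why the two versions above number the rats differently, here is a minimal sketch on a made-up vector (`ids` is illustrative only, not one of the real diet groups). Calling factor() on an existing factor keeps its level order, which for these string labels is alphabetical, while match(x, unique(x)) numbers values in order of first appearance:

```r
# Illustrative IDs only; stored as a factor, like RatID in the question
ids <- factor(c("65", "11", "66", "11", "112"))

# Codes follow the level order: "11" < "112" < "65" < "66"
by_level <- as.numeric(factor(ids))

# Codes follow the order in which each ID first appears
by_first_seen <- match(ids, unique(ids))

print(by_level)       # 3 1 4 1 2
print(by_first_seen)  # 1 2 3 2 4
```

Both give a valid 1:1 mapping; only the second reproduces the numbering in the desired output.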
You can convert 'RatID' to a 'factor' whose levels are in order of appearance and coerce it back to 'numeric':
library(data.table)#v1.9.4+
setDT(data)[, newCol:=as.numeric(factor(RatID,
levels=unique(RatID))), Diet]
# Diet Extraction RatID newCol
# 1: HAMSP 8 65 1
# 2: HAMS 9 108 1
# 3: HAMS 18 100 2
# 4: HAMSA 3 83 1
# 5: HAMSP 12 11 2
# 6: HAMSP 14 66 3
# 7: HAMSA 8 57 2
# 8: HAMSB 1 105 1
# 9: HAMSB 19 50 2
#10: HAMS 21 90 3
#11: HAMSB 4 4 3
#12: HAMSA 23 59 3
#13: HAMSP 4 22 4
#14: HAMSP 9 112 5
#15: HAMS 12 63 4
#16: HAMSP 14 54 6
#17: HAMS 7 81 5
#18: HAMSP 1 11 2
#19: HAMS 17 102 6
#20: HAMSP 22 93 7
Or using match:
setDT(data)[, newCol:=match(RatID, unique(RatID)), Diet]
Or a similar option with base R:
data$newCol <- with(data, ave(as.numeric(levels(RatID))[RatID],
Diet, FUN=function(x) match(x, unique(x))))
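Whichever variant you use, it may be worth checking the result before putting it in a design formula. The following is a sketch in base R on a small made-up data frame (`toy` and its values are hypothetical; only the column names come from the question). It checks that the mapping is 1:1 within each Diet and that the new IDs run from 1 to N, then stores the column as a factor, since R model formulas treat numeric columns as continuous covariates:

```r
# Toy data: hypothetical values with the same column names as the question
toy <- data.frame(
  Diet  = factor(c("HAMSP", "HAMS", "HAMS", "HAMSP", "HAMSP", "HAMS")),
  RatID = factor(c(65, 108, 100, 11, 65, 108))
)

# Number the rats within each Diet in order of first appearance;
# the integer factor codes are enough to tell the IDs apart
toy$newCol <- ave(as.integer(toy$RatID), toy$Diet,
                  FUN = function(x) match(x, unique(x)))

# Each (Diet, RatID) pair maps to exactly one newCol value, and vice versa
pairs <- unique(toy[c("Diet", "RatID", "newCol")])
one_to_one <- !anyDuplicated(pairs[c("Diet", "RatID")]) &&
  !anyDuplicated(pairs[c("Diet", "newCol")])

# Within each Diet the new IDs are exactly 1..N
runs_1_to_n <- all(tapply(toy$newCol, toy$Diet,
                          function(x) setequal(x, seq_len(max(x)))))

# Store as a factor so a model formula treats it as categorical
toy$newCol <- factor(toy$newCol)

print(toy)
print(c(one_to_one = one_to_one, runs_1_to_n = runs_1_to_n))
```

The row order of `toy` is untouched, because ave() writes each group's result back into the original positions.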