Convert space-separated text file into named vectors to calculate HWE
I am working with text files and vectors.
I have a space-separated text file in the following format:
id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10
...
I need code that loops over each line, puts the id into a variable and the remaining values into a vector. Example of the vector for the first line:
x <- "id1"
y <- c(AA=44,AG=20,GG=36)
Edit: I need to use the HWChisq function from the HardyWeinberg package to exclude SNPs with a p-value < 0.001. The function needs a vector of counts with a name for each genotype.
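If the columns alternate between labels and counts (assuming the object was created in R by reading the file with read.csv/read.table), split the data by row with asplit, excluding the first 'id' column, and use setNames to create a named vector for each row: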
lst1 <- Map(setNames, asplit(df1[-1][c(FALSE, TRUE)], 1),
asplit(df1[-1][c(TRUE, FALSE)], 1))
names(lst1) <- df1[[1]]
lst1$id1
# AA AG GG
# 44 20 36
Data
df1 <- structure(list(id = c("id1", "id2", "id3"), v1 = c("AA", "CC",
"AA"), v2 = c(44L, 30L, 60L), v3 = c("AG", "CT", "AG"), v4 = c(20L,
22L, 30L), v5 = c("GG", "TT", "GG"), v6 = c(36L, 48L, 10L)),
class = "data.frame", row.names = c(NA,
-3L))
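If the file is read as raw lines instead of a data frame, the id and the named vector the question asks for can be built directly. This is a minimal base-R sketch: the helper name `parse_line` is hypothetical, and the input lines are hard-coded here so that the snippet runs on its own (in practice they would presumably come from `readLines()` on the actual file).

```r
# Sample lines copied from the question; replace with readLines(<your file>)
lines <- c("id1 AA 44 AG 20 GG 36",
           "id2 CC 30 CT 22 TT 48",
           "id3 CT 60 CC 30 TT 10")

# Hypothetical helper: split one line into an id and a named count vector
parse_line <- function(line) {
  f <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  labels <- f[seq(2, length(f), by = 2)]              # genotype labels
  counts <- as.integer(f[seq(3, length(f), by = 2)])  # matching counts
  list(id = f[1], counts = setNames(counts, labels))
}

parsed <- lapply(lines, parse_line)
x <- parsed[[1]]$id      # id of the first line
y <- parsed[[1]]$counts  # named counts of the first line
```

Here `x` and `y` correspond to the two variables in the question's example for the first line.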
Loop over the rows, then apply the HWE function:
library("HardyWeinberg")
# data
df1 <- read.table(text = "
id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10", header = FALSE, stringsAsFactors = FALSE)
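# Columns 3, 5, 7 hold the counts; they are labelled AA, AB, BB by position,
# so the heterozygote count is assumed to sit in the middle column.
out <- apply(df1[, c(3, 5, 7)], 1, function(i){
  x <- HWChisq(setNames(i, c("AA", "AB", "BB")), verbose = FALSE)
  x$pval
})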
# [1] 5.774374e-09 1.182236e-07 7.434226e-02
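Note that the positional labelling presumes each row lists its counts as homozygote, heterozygote, homozygote. In the sample data, row id3 lists the heterozygote (CT) first, so its 60 is treated as a homozygote count. One possible way to guard against this, sketched in base R, is to reorder the counts by their genotype labels before the test. The helper name `to_AA_AB_BB` is hypothetical, and it assumes exactly three two-letter genotype labels per SNP.

```r
# Hypothetical helper: put the heterozygote count in the middle and
# relabel the counts as AA, AB, BB
to_AA_AB_BB <- function(counts) {
  g <- names(counts)
  het <- substr(g, 1, 1) != substr(g, 2, 2)  # TRUE where the two alleles differ
  stopifnot(length(counts) == 3, sum(het) == 1)
  hom <- counts[!het]                        # the two homozygote counts
  setNames(c(hom[1], counts[het], hom[2]), c("AA", "AB", "BB"))
}

reordered <- to_AA_AB_BB(c(CT = 60, CC = 30, TT = 10))
reordered
```

For row id3 this yields AA = 30, AB = 60, BB = 10, which could then be passed on in place of the positionally labelled counts.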
Pretty output:
cbind(df1, HWE = out)
# V1 V2 V3 V4 V5 V6 V7 HWE
# 1 id1 AA 44 AG 20 GG 36 5.774374e-09
# 2 id2 CC 30 CT 22 TT 48 1.182236e-07
# 3 id3 CT 60 CC 30 TT 10 7.434226e-02
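The question's stated goal was to exclude SNPs with a p-value < 0.001. A minimal sketch of that last step follows. The p-values are copied from the output above and hard-coded so that the snippet runs without the HardyWeinberg package; the names `res`, `threshold` and `kept` are illustrative only.

```r
# p-values copied from the output above
res <- data.frame(
  id  = c("id1", "id2", "id3"),
  HWE = c(5.774374e-09, 1.182236e-07, 7.434226e-02),
  stringsAsFactors = FALSE
)

threshold <- 0.001
kept <- res[res$HWE >= threshold, ]  # drop SNPs with p < 0.001
kept$id
```

With these three values only id3 remains. Applied to the real table, the same subsetting would be `df1[out >= 0.001, ]`.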
To test HWE on the X chromosome, see the package vignette:
Recently, Graffelman and Weir (2016) have proposed specific tests for HWE for bi-allelic markers on the X-chromosome. These tests take both males and females into account. The X-chromosomal tests can be carried out by the same functions mentioned in the previous Section (HWChisq, HWLratio, HWExact, HWPerm) and adding the argument x.linked=TRUE to the function call.