Convert a space-separated text file into named vectors to calculate HWE

Question

I am working with text files and vectors.

I have a space-separated text file in the following format:

id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10
...

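I need code that loops over each line, puts the id into a variable, and puts the remaining values into a vector. Example of the result for the first line: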

x <- "id1"
y <- c(AA = 44, AG = 20, GG = 36)

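Edit: I need to use the HWChisq function from the HardyWeinberg package to exclude SNPs with a p-value < 0.001. The function requires a vector of counts whose names are the genotype labels.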

Answer 1

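If the columns alternate between label and count (assuming df1 is an object created in R by reading the file with read.csv/read.table), drop the first 'id' column, split the remaining columns by row with asplit, and use setNames to create a named vector for each row: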

lst1 <- Map(setNames, asplit(df1[-1][c(FALSE, TRUE)], 1), 
         asplit(df1[-1][c(TRUE, FALSE)], 1))
names(lst1) <- df1[[1]]
lst1$id1
# AA AG GG 
# 44 20 36

Data

df1 <- structure(list(id = c("id1", "id2", "id3"), v1 = c("AA", "CC", 
"AA"), v2 = c(44L, 30L, 60L), v3 = c("AG", "CT", "AG"), v4 = c(20L, 
22L, 30L), v5 = c("GG", "TT", "GG"), v6 = c(36L, 48L, 10L)), 
class = "data.frame", row.names = c(NA, 
-3L))
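
The answer above assumes df1 already exists. For readers starting from the raw text, here is a minimal base-R sketch of the same idea using an explicit per-row loop, as the question asked. The names txt, snps and counts_by_id are illustrative and not from the answer; with a real file, the path would be passed to read.table instead of text =.

```r
# Raw lines as they appear in the question (supplied inline for the sketch).
txt <- "id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10"

# Fix the column types up front so labels stay character and counts integer.
snps <- read.table(
  text = txt, header = FALSE,
  colClasses = c("character", rep(c("character", "integer"), 3))
)

label_cols <- c(2, 4, 6)  # genotype labels
count_cols <- c(3, 5, 7)  # genotype counts

# One named integer vector per row, collected in a list keyed by id.
counts_by_id <- lapply(seq_len(nrow(snps)), function(r) {
  setNames(unlist(snps[r, count_cols], use.names = FALSE),
           unlist(snps[r, label_cols], use.names = FALSE))
})
names(counts_by_id) <- snps[[1]]

counts_by_id$id3
# CT CC TT 
# 60 30 10 
```

Keeping the vectors in a list keyed by id avoids creating one pair of variables (x, y) per line, and each element can still be passed on its own to a function that expects a named count vector.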

Answer 2

Loop row by row, then apply the HWE function:

library("HardyWeinberg")

# data
df1 <- read.table(text = "
id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10", header = FALSE, stringsAsFactors = FALSE)

out <- apply(df1[, c(3, 5, 7)], 1, function(i){
  x <- HWChisq(setNames(i, c("AA", "AB", "BB")), verbose = FALSE)
  x$pval
})

# [1] 5.774374e-09 1.182236e-07 7.434226e-02
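
Note that the call above labels columns 3, 5 and 7 as AA, AB, BB purely by position. In the sample data, row id3 lists the heterozygote first (CT 60 CC 30 TT 10), so its counts are labelled AA = 60, AB = 30, BB = 10. If the input does not always list genotypes in homozygote, heterozygote, homozygote order, one option is to reorder each row by its labels first. The helper to_AA_AB_BB below is hypothetical (not part of the HardyWeinberg package) and assumes two-character genotype labels with exactly one heterozygote per row.

```r
# Reorder a named genotype-count vector to (hom, het, hom) and relabel it
# AA, AB, BB. Hypothetical helper for illustration only.
to_AA_AB_BB <- function(counts) {
  labs <- names(counts)
  stopifnot(length(counts) == 3, !is.null(labs), all(nchar(labs) == 2))
  # A heterozygote label has two different characters, e.g. "CT".
  is_het <- substr(labs, 1, 1) != substr(labs, 2, 2)
  stopifnot(sum(is_het) == 1)
  hom <- counts[!is_het]
  setNames(c(hom[1], counts[is_het], hom[2]), c("AA", "AB", "BB"))
}

to_AA_AB_BB(c(CT = 60, CC = 30, TT = 10))
# AA AB BB 
# 30 60 10 
```

The reordered vector can then be passed to HWChisq in place of the positionally named one.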

Pretty output:

cbind(df1, HWE = out)
#    V1 V2 V3 V4 V5 V6 V7          HWE
# 1 id1 AA 44 AG 20 GG 36 5.774374e-09
# 2 id2 CC 30 CT 22 TT 48 1.182236e-07
# 3 id3 CT 60 CC 30 TT 10 7.434226e-02
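
The question also asks to exclude SNPs with p < 0.001. A minimal sketch of that filtering step follows; to stay self-contained it rebuilds a small table from the ids and p-values printed above instead of calling the package again. The names res, threshold and kept are illustrative.

```r
# Ids and p-values copied from the printed output above.
res <- data.frame(
  V1  = c("id1", "id2", "id3"),
  HWE = c(5.774374e-09, 1.182236e-07, 7.434226e-02),
  stringsAsFactors = FALSE
)

threshold <- 0.001  # cut-off stated in the question

# Keep SNPs whose HWE p-value is not below the threshold.
kept <- res[res$HWE >= threshold, ]
kept$V1
# [1] "id3"
```

With the real objects, the same subsetting would apply to the table from cbind(df1, HWE = out).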

To compute HWE for the X chromosome, see the vignette:

4. X-chromosomal tests for Hardy-Weinberg equilibrium

Recently, Graffelman and Weir (2016) have proposed specific tests for HWE for bi-allelic markers on the X-chromosome. These tests take both males and females into account. The X-chromosomal tests can be carried out by the same functions mentioned in the previous Section (HWChisq, HWLratio, HWExact, HWPerm) and adding the argument x.linked=TRUE to the function call.
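
As a rough sketch of such a call: as far as I understand the package documentation, the X-linked tests expect a five-element vector named A, B (male counts) and AA, AB, BB (female counts). The counts below are illustrative only and are not from the question's data; the call is guarded so the snippet still runs where the package is not installed.

```r
# Illustrative counts only (not from the question's data):
# A, B are male allele counts; AA, AB, BB are female genotype counts.
x_counts <- c(A = 399, B = 205, AA = 230, AB = 314, BB = 107)

if (requireNamespace("HardyWeinberg", quietly = TRUE)) {
  # Same function as in the autosomal case, with x.linked = TRUE added.
  res_x <- HardyWeinberg::HWChisq(x_counts, x.linked = TRUE, verbose = FALSE)
  print(res_x$pval)
}
```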
