How to apply a custom function with proxy::dist to create a distance matrix in R
I have defined a custom function and tested it to make sure it works, but I cannot apply it to a list to get a distance matrix.
My code is:
library(Biostrings)
library(proxy)
#import the sequences using Biostrings
indf<-readAAStringSet("C:/Users/jamie/OneDrive/Documents/Junk/SAMPLEFASTA.fasta")
#Assign the names and sequences to different variables
seqAAname<-names(indf)
seqz<-paste(indf)
#Put just the sequences into a dataframe
indf2<-data.frame(seqz)
#Convert the sequences into a list
indf3<-as.list(indf2)
#Define a custom function to return the alignment score between two sequences (pairwise)
customalnfunc <- function(X, Y){
pairwiseAlignment(X, Y,
substitutionMatrix = "BLOSUM45", gapOpening = 1, gapExtension = 3)
}
#Test the alignment call directly, outside the custom function (This works fine)
testfreefunc<- pairwiseAlignment(AAString("PEHQRSTVE"),AAString("PQHQRETVE"),
substitutionMatrix = "BLOSUM45", gapOpening = 1, gapExtension = 3)
print(testfreefunc@score)
#Test the custom function itself to make sure it works (This works fine)
testfuncout <- customalnfunc(AAString("PEHQRSTVE"),AAString("PQHQRETVE"))
print(testfuncout@score)
#Apply the custom function to all possible pairs using proxy::dist with the custom function (This does not work, it returns 0)
outalnmatrix <- proxy::dist(indf3, method = customalnfunc)
outalnmatrix
The SAMPLEFASTA.fasta file contains:
>SeqA
PEHQRSTVE
>SeqB
PQHQRETVE
>SeqC
RQHERSEVE
The desired output for outalnmatrix is:
I have tried passing the input data to proxy::dist both as a list and as a matrix.
How can I make this work?
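You don't need the proxy package, because proxy::dist is meant for comparing the rows of matrices or data frames. Since you want to compare strings, you can use outer instead. However, you need to adjust customalnfunc so that it returns only a single number per pair (scoreOnly = TRUE).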
library(Biostrings)
seqz <- c("PEHQRSTVE", "PQHQRETVE", "RQHERSEVE")
customalnfunc <- function(X, Y){
pairwiseAlignment(X, Y,
substitutionMatrix = "BLOSUM45",
gapOpening = 1,
gapExtension = 3,
scoreOnly = TRUE)
}
outer(seqz, seqz, customalnfunc)
#>      [,1] [,2] [,3]
#> [1,]   58   50   33
#> [2,]   50   60   33
#> [3,]   33   33   57
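One detail worth knowing about outer: it calls the function once, with both inputs expanded to full-length vectors, rather than once per pair, so the function has to be vectorized. If a scoring function only handles a single pair at a time, wrapping it in Vectorize is a common workaround. The sketch below illustrates this in base R only. Note that match_count is a hypothetical stand-in for an alignment score (it just counts identical positions) and is not part of Biostrings; the sequence names are taken from the sample FASTA file.

```r
# Hypothetical stand-in for an alignment score: number of positions at which
# two equal-length strings carry the same letter. Handles ONE pair per call.
match_count <- function(x, y) {
  sum(strsplit(x, "")[[1]] == strsplit(y, "")[[1]])
}

# Named character vector, so the result can be labelled by sequence name
seqz <- c(SeqA = "PEHQRSTVE", SeqB = "PQHQRETVE", SeqC = "RQHERSEVE")

# outer() passes whole vectors to FUN; Vectorize() turns the one-pair
# function into one that loops over the expanded vectors element by element
scores <- outer(seqz, seqz, Vectorize(match_count))

# Label rows and columns explicitly with the sequence names
dimnames(scores) <- list(names(seqz), names(seqz))

print(scores)
```

The same pattern should carry over to the alignment function, for example outer(seqz, seqz, Vectorize(customalnfunc)), at the cost of one alignment call per pair.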