Is there an R function that can find (like count) and give the position of a string?
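I'm looking for code to run in RStudio that searches a DNA sequence for recombination signal sequences (RSS) and reports their positions.
I used the count function, but it doesn't give me positions, and I can't pass it several RSSs at once, which makes the process very tedious.
For example: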
library(seqinr)  # provides read.fasta(), c2s() and count()

cd247 <- read.fasta("sequence.fasta")
# store the DNA sequence for cd247 in cd247seq
cd247seq <- cd247[[1]]
cd247seq[1:50]
# convert cd247seq from a vector of characters to a single string
c2s(cd247seq)
# tabulate every 7-character sequence in the cd247 gene
cd247table <- count(cd247seq, 7)
cd247table[["tactgtg"]]
This outputs:
cd247table[["tactgtg"]]
[1] 1
but not the position of the match in cd247seq.
I have posted my files on GitHub: https://github.com/opheelorraine/Map-RSS
Take a look at the Biostrings package, in particular the matchPDict function. See https://kasperdanielhansen.github.io/genbioconductor/html/Biostrings_Matching.html
Example:
suppressPackageStartupMessages(library(Biostrings))
dnaseq <- DNAString("ATAGCCATGATGATTTAACCAGGTCATTT") # the sequence
motif <- DNAStringSet(c("ATG", "TGA")) # the motifs
pos <- matchPDict(motif, dnaseq)
pos
#> MIndex object of length 2
#> [[1]]
#> IRanges object with 2 ranges and 0 metadata columns:
#>           start       end     width
#>       <integer> <integer> <integer>
#>   [1]         7         9         3
#>   [2]        10        12         3
#>
#> [[2]]
#> IRanges object with 2 ranges and 0 metadata columns:
#>           start       end     width
#>       <integer> <integer> <integer>
#>   [1]         8        10         3
#>   [2]        11        13         3
start(pos) # ATG starts at pos 7 and 10; TGA at 8 and 11
#> IntegerList of length 2
#> [[1]] 7 10
#> [[2]] 8 11
countPDict(motif, dnaseq)
#> [1] 2 2
# search reverse complement:
matchPDict(reverseComplement(motif), dnaseq)
#> MIndex object of length 2
#> [[1]]
#> IRanges object with 2 ranges and 0 metadata columns:
#>           start       end     width
#>       <integer> <integer> <integer>
#>   [1]         6         8         3
#>   [2]        25        27         3
#>
#> [[2]]
#> IRanges object with 1 range and 0 metadata columns:
#>           start       end     width
#>       <integer> <integer> <integer>
#>   [1]        24        26         3
Created on 2020-08-05 by the reprex package (v0.3.0)
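If installing Bioconductor is not an option, a similar position lookup can be sketched in base R. This is not matchPDict but a plain regular-expression alternative, and the helper name find_motifs and its arguments are made up for illustration. It assumes the motifs contain only plain bases (no IUPAC ambiguity codes) and accepts either a single string or a vector of single characters like the one seqinr's read.fasta() returns.

```r
# Hypothetical helper: report every start/end position of each motif.
# `seq_chars` is a single string or a vector of single characters.
find_motifs <- function(seq_chars, motifs) {
  # Collapse to one lower-case string so sequence and motifs share a case
  seq_string <- tolower(paste(seq_chars, collapse = ""))
  motifs <- tolower(motifs)

  hits <- lapply(motifs, function(m) {
    # A zero-width lookahead also reports overlapping occurrences
    found <- gregexpr(paste0("(?=", m, ")"), seq_string, perl = TRUE)[[1]]
    starts <- as.integer(found)
    starts <- starts[starts > 0]  # gregexpr() returns -1 when nothing matches
    data.frame(motif = rep(m, length(starts)),
               start = starts,
               end = starts + nchar(m) - 1L,
               stringsAsFactors = FALSE)
  })
  # One row per hit, all motifs stacked into a single table
  do.call(rbind, hits)
}

dnaseq <- "ATAGCCATGATGATTTAACCAGGTCATTT"  # same sequence as in the example above
find_motifs(dnaseq, c("ATG", "TGA"))
# atg starts at 7 and 10; tga at 8 and 11, as with matchPDict
```

With the data from the question, the call would look like find_motifs(cd247seq, c("tactgtg", ...)), with every RSS of interest listed in the vector. Unlike the Biostrings example, this sketch searches only the given strand, so reverse-complement motifs have to be added to the vector by hand.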