Rosalind:SUBS 未通过给定案例
Rosalind: SUBS failing the given case
我写了一个 this challenge based on this answer 的解决方案。它成功处理了给出的示例案例,但不是实际案例。
挑战:
Given two strings s
and t
, t
is a substring of s
if t
is contained as a contiguous collection of symbols in s
(as a result, t
must be no longer than s
).
The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i
of s
is denoted by s[i]
.
A substring of s
can be represented as s[j:k]
, where j
and k
represent the starting and ending positions of the substring in s
; for example, if s = "AUGCUUCAGAAAGGUCUUACG"
, then s[2:5] = "UGCU"
.
The location of a substring s[j:k]
is its beginning position j
; note that t
will have multiple locations in s
if it occurs more than once as a substring of s
(see the Sample below).
给定:
Two DNA strings s
and t
(each of length at most 1 kbp).
Return:
All locations of t
as a substring of s
.
示例数据集:
GATATATGCATATACTT
ATAT
示例输出:
2 4 10
对于示例,它有效。当然,您需要手动 trim 格式化,但这只是几秒钟的工作。
不接受实际数据和我生成的输出。
实际数据集:
CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAATAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATAGTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAGTCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTTAATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAATAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAATAGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCAATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAATAGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGGGGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGTCTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGTCTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAATAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGTCACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTCAAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC
AATAGTCAA
实际输出(trimmed):
26 33 93 109 118 125 132
代码:
def find_substring_locations(long, short)
mpos = []
re = Regexp.new(short)
m = i = 0
m = re.match( long, i ) { |k| j = k.begin(0); i = j + 1; mpos << j } while m
return mpos
end
def plus_one(input)
arr = []
for i in input
arr << (i += 1)
end
arr
end
main_string = gets.chomp
sub_string = gets.chomp
plus_one(find_substring_locations(main_string, sub_string))
我哪里做错了?它似乎是有序的。
编辑:
原来问题出在环境中。重启后问题无法重现。
我认为你的代码是正确的,除了一个错误。 plus_one
解决方案可以简化。
m = re.match( long, i ) { |k| j = k.begin(0); i = j + 1; mpos << j + 1} while m
但是有一种更容易实现的方法。
您不需要正则表达式,有一种更简单的方法来搜索子字符串匹配的索引:
String#index
接受一个附加参数,起始索引。
input = "CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAATAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATAGTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAGTCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTTAATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAATAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAATAGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCAATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAATAGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGGGGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGTCTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGTCTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAATAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGTCACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTCAAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC"
query = "AATAGTCAA"
def substring_index(input, query)
last = 0
matches = []
while index = input.index(query, last) do
matches << index += 1
last = index
end
matches
end
p substring_index(input, query)
# => [26, 33, 93, 109, 118, 125, 132, 172, 188, 231, 273, 312, 337, 346, 400, 407, 424, 441, 448, 455, 463, 471, 478, 486, 494, 520, 557, 589, 597, 620, 646, 654, 718, 725, 749, 756, 778, 882, 890]
实际上不是答案,但您的代码按预期工作:
s = 'CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAA'\
'TAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATA'\
'GTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAG'\
'TCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTT'\
'AATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAA'\
'TAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAAT'\
'AGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCA'\
'ATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAAT'\
'AGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGG'\
'GGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGT'\
'CTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGT'\
'CTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAA'\
'TAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGT'\
'CACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTC'\
'AAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC'
t = 'AATAGTCAA'
plus_one(find_substring_locations(s, t))
#=> [26, 33, 93, 109, 118, 125, 132, 172, 188, 231, 273, 312, 337, 346,
# 400, 407, 424, 441, 448, 455, 463, 471, 478, 486, 494, 520, 557,
# 589, 597, 620, 646, 654, 718, 725, 749, 756, 778, 882, 890]
我写了一个 this challenge based on this answer 的解决方案。它成功处理了给出的示例案例,但不是实际案例。
挑战:
Given two strings
s
andt
,t
is a substring ofs
ift
is contained as a contiguous collection of symbols ins
(as a result,t
must be no longer thans
).The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position
i
ofs
is denoted bys[i]
.A substring of
s
can be represented ass[j:k]
, wherej
andk
represent the starting and ending positions of the substring ins
; for example, ifs = "AUGCUUCAGAAAGGUCUUACG"
, thens[2:5] = "UGCU"
.The location of a substring
s[j:k]
is its beginning positionj
; note thatt
will have multiple locations ins
if it occurs more than once as a substring ofs
(see the Sample below).
给定:
Two DNA strings
s
andt
(each of length at most 1 kbp).
Return:
All locations of
t
as a substring ofs
.
示例数据集:
GATATATGCATATACTT
ATAT
示例输出:
2 4 10
对于示例,它有效。当然,您需要手动 trim 格式化,但这只是几秒钟的工作。
不接受实际数据和我生成的输出。
实际数据集:
CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAATAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATAGTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAGTCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTTAATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAATAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAATAGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCAATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAATAGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGGGGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGTCTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGTCTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAATAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGTCACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTCAAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC
AATAGTCAA
实际输出(trimmed):
26 33 93 109 118 125 132
代码:
def find_substring_locations(long, short)
mpos = []
re = Regexp.new(short)
m = i = 0
m = re.match( long, i ) { |k| j = k.begin(0); i = j + 1; mpos << j } while m
return mpos
end
def plus_one(input)
arr = []
for i in input
arr << (i += 1)
end
arr
end
main_string = gets.chomp
sub_string = gets.chomp
plus_one(find_substring_locations(main_string, sub_string))
我哪里做错了?它似乎是有序的。
编辑: 原来问题出在环境中。重启后问题无法重现。
我认为你的代码是正确的,除了一个错误。 plus_one
解决方案可以简化。
m = re.match( long, i ) { |k| j = k.begin(0); i = j + 1; mpos << j + 1} while m
但是有一种更容易实现的方法。 您不需要正则表达式,有一种更简单的方法来搜索子字符串匹配的索引:
String#index
接受一个附加参数,起始索引。
input = "CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAATAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATAGTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAGTCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTTAATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAATAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAATAGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCAATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAATAGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGGGGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGTCTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGTCTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAATAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGTCACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTCAAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC"
query = "AATAGTCAA"
def substring_index(input, query)
last = 0
matches = []
while index = input.index(query, last) do
matches << index += 1
last = index
end
matches
end
p substring_index(input, query)
# => [26, 33, 93, 109, 118, 125, 132, 172, 188, 231, 273, 312, 337, 346, 400, 407, 424, 441, 448, 455, 463, 471, 478, 486, 494, 520, 557, 589, 597, 620, 646, 654, 718, 725, 749, 756, 778, 882, 890]
实际上不是答案,但您的代码按预期工作:
s = 'CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAA'\
'TAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATA'\
'GTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAG'\
'TCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTT'\
'AATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAA'\
'TAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAAT'\
'AGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCA'\
'ATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAAT'\
'AGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGG'\
'GGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGT'\
'CTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGT'\
'CTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAA'\
'TAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGT'\
'CACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTC'\
'AAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC'
t = 'AATAGTCAA'
plus_one(find_substring_locations(s, t))
#=> [26, 33, 93, 109, 118, 125, 132, 172, 188, 231, 273, 312, 337, 346,
# 400, 407, 424, 441, 448, 455, 463, 471, 478, 486, 494, 520, 557,
# 589, 597, 620, 646, 654, 718, 725, 749, 756, 778, 882, 890]