Randomly Sample a Fixed-Length Substring from a Larger String (R)
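I have a long string of roughly 1000 characters (call it SuperString), and I want to randomly sample 100 substrings from it.
Each substring should be 10 characters long, and the characters in each substring should be in the same order as they appear in SuperString.
Example: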
SuperString = "ADKFKDSLFSDHKENNCNEUNCIEOCIKEMNKSDFU...KJSDLJDFSKLDJSLJ"
substrings = ["FSDHKENNCN", "ADKFKDSLFS", ... ,"OCIKEMNKS"]
# Create a SuperString
set.seed(87)
SuperString = paste(sample(LETTERS, 1000, replace=TRUE), collapse="")
# Function to sample n characters in a row, starting at a random point
# in the string. The last valid start is nchar(string) - n + 1, which is
# 991 for a 1000-character string and n = 10.
sampleString = function(string, n = 10) {
  nStart = sample(nchar(string) - n + 1, 1)
  substr(string, nStart, nStart + n - 1)
}
# Run the function 100 times
substrings = replicate(100, sampleString(SuperString))
substrings
[1] "VEOUELBFTD" "OPTCIDDNXK" "SFHNKKGOWR" "RVJQYYUSAZ" "MQMBMKCTTI" "ZKLWETGMVR"
[7] "OOXFLGCGPX" "DXAVUMQMBM" "HOORFCFABC" "AMOYPOXXRA" "TGKWKKZUEK" "UYPRPYQCMU"
...
[91] "RZNSLOBFBK" "FKUKMDUQIK" "YGXDXAVUMQ" "SIRAMRBXSH" "TAILZPHZYS" "OEOUTGKWKK"
[97] "XFLGCGPXKZ" "EDRVJQYYUS" "RHUZLBFNQX" "MUWUODCCFT"
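The same idea can probably be written without `replicate()`, since `substring()` is vectorized over its start and end positions. The following is only a sketch under that assumption; the names `n_sub`, `len`, `starts`, and `substrings_vec` are illustrative and not part of the code above. Because it draws all start positions in a single call, it should not be expected to reproduce the exact output shown above.

```r
# Rebuild the same 1000-character string as in the example above
set.seed(87)
SuperString <- paste(sample(LETTERS, 1000, replace = TRUE), collapse = "")

n_sub <- 100  # number of substrings to draw (illustrative name)
len   <- 10   # length of each substring (illustrative name)

# Draw all start positions at once; the last valid start is nchar - len + 1.
# replace = TRUE allows the same start position to be drawn more than once.
starts <- sample(nchar(SuperString) - len + 1, n_sub, replace = TRUE)

# substring() recycles the string over the vectors of start and end positions
substrings_vec <- substring(SuperString, starts, starts + len - 1)
```

Drawing with `replace = TRUE` mirrors the behaviour of the loop above, where each draw is independent; setting it to `FALSE` would force 100 distinct start positions instead.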