Haskell 中的频率计数
Frequency Count in Haskell
所以我要完成的任务是从字符串 s 中获取长度为 n 的所有可能模式的频率列表。
Input (Text,Length)
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order
["A","B","C","D"]
其中 s 中的字符仅限于上述四个字符,因此我认为第一步是获取长度为 n 的所有可能的排列,并允许重复。
permutationsR k = sort(replicateM k ['A','B','C','D'])
例如。
permutationsR 2
会给出输出
["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]
然后为每个模式计算每个模式出现的次数
所以像
patternCount:: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)
但是我认为这将是非常低效的,因为我实际上是遍历整个列表来检查每个模式长度 (permutationsR n) 次,而不是我认为应该只在一次迭代中就可以做的事情。
有没有办法像我在命令式语言中那样生成频率图。
即在伪代码中
where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount
patterns = {"AA":0,"AB":0,"AC:0...}
for (i = 0; i < (length s - n); i++){
patterns[s[i:(i+n)]] += 1
}
基本上只是迭代一次,将字符串从 (i:i+n) 中拆分出来
并在每次出现时更新模式图。
示例输入,输出是这样的
s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
这是一个可能的解决方案:
import Data.List
groupsOf n str =
unfoldr (\s ->
if length s < n
then Nothing
else Just (take n s, tail s)) str
frequency :: (Ord t, Eq t) => [t] -> [(t, Int)]
frequency =
map (\s -> (head s, length s)) . group . sort
groupsOf
将输入字符串拆分为长度为 n 的重叠序列。例如,
groupsOf 3 "AABCABC"
会给
["AAB", "ABC", "BCA", "CAB", "ABC"].
frequency
然后会统计每个子序列出现的次数,所以
frequency $ groupsOf 3 "AABCABC"
应该给
[("AAB", 1), ("ABC", 2), ("BCA", 1), ("CAB", 1)].
任何未出现在结果中的子序列出现零次。
所以我要完成的任务是从字符串 s 中获取长度为 n 的所有可能模式的频率列表。
Input (Text,Length)
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order
["A","B","C","D"]
其中 s 中的字符仅限于上述四个字符,因此我认为第一步是获取长度为 n 的所有可能的排列,并允许重复。
permutationsR k = sort(replicateM k ['A','B','C','D'])
例如。
permutationsR 2
会给出输出
["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]
然后为每个模式计算每个模式出现的次数 所以像
patternCount:: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)
但是我认为这将是非常低效的,因为我实际上是遍历整个列表来检查每个模式长度 (permutationsR n) 次,而不是我认为应该只在一次迭代中就可以做的事情。
有没有办法像我在命令式语言中那样生成频率图。
即在伪代码中
where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount
patterns = {"AA":0,"AB":0,"AC:0...}
for (i = 0; i < (length s - n); i++){
patterns[s[i:(i+n)]] += 1
}
基本上只是迭代一次,将字符串从 (i:i+n) 中拆分出来 并在每次出现时更新模式图。
示例输入,输出是这样的
s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
这是一个可能的解决方案:
import Data.List
groupsOf n str =
unfoldr (\s ->
if length s < n
then Nothing
else Just (take n s, tail s)) str
frequency :: (Ord t, Eq t) => [t] -> [(t, Int)]
frequency =
map (\s -> (head s, length s)) . group . sort
groupsOf
将输入字符串拆分为长度为 n 的重叠序列。例如,
groupsOf 3 "AABCABC"
会给
["AAB", "ABC", "BCA", "CAB", "ABC"].
frequency
然后会统计每个子序列出现的次数,所以
frequency $ groupsOf 3 "AABCABC"
应该给
[("AAB", 1), ("ABC", 2), ("BCA", 1), ("CAB", 1)].
任何未出现在结果中的子序列出现零次。