Recursively search directories for all files matching name criteria in Haskell

Question

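I'm relatively inexperienced in Haskell and want to improve, so for one of my learning projects I have the following requirements:

I want to search starting from a specified top-level directory, not necessarily an absolute path.
I want to find all files with a given extension, say .md.
I don't want to search hidden directories, say toplevel/.excluded.
I want to be able to ignore hidden files such as .filename.md.swp, which gedit produces.
I want to end up with a complete list of files as the result of my function.

I have searched all over SO. Here is what I have so far: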

import qualified System.FilePath.Find as SFF
import qualified Filesystem.Path.CurrentOS as FP

srcFolderName = "src"
outFolderName = "output"
resFolderName = "res"

ffNotHidden :: SFF.FindClause Bool
ffNotHidden = SFF.fileName SFF./~? ".?*"

ffIsMD :: SFF.FindClause Bool
ffIsMD = SFF.extension SFF.==? ".md" SFF.&&? SFF.fileName SFF./~? ".?*"

findMarkdownSources :: FilePath -> IO [FilePath]
findMarkdownSources filePath = do
    paths <- SFF.find ffNotHidden ffIsMD filePath
    return paths

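This does not work. With printf-style debugging in "findMarkdownSources", I can verify that filePath is correct, e.g. "/home/user/testdata" (the print includes the quotes, in case that tells you something). The list paths is always empty. I am absolutely certain there are Markdown files in the directory I specified (find /path/to/dir -name "*.md" finds them).

I therefore have some specific questions.

Is there a reason (an incorrect filter, for example) why this code does not work?
There are a number of ways to do this in Haskell. It seems at least half a dozen packages (fileman, system.directory, system.filepath.find) are dedicated to it. Here are some questions that answer something similar:
1. Streaming recursive descent of a directory in Haskell
2. Is there some directory walker in Haskell?
Each one has about three unique ways to achieve what I want, so we're nearly at 10 ways to do it...
Is there a specific way I should be doing this? If so, why? If it helps, once I have the file list I will walk through all of it, opening and parsing each file.

If it helps, I'm fairly comfortable with basic Haskell, but I may struggle if we start getting too heavy with monads and applicative functors (I don't use Haskell enough for them to stay in my head). I also find the Haskell documentation on Hackage hard to understand.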

Answer 1

so, we're nearly at 10 ways to do it...

Here is yet another way, using functions from the directory, filepath and extra packages, but without too much monad magic:

import Control.Monad (foldM)
import System.Directory (doesDirectoryExist, listDirectory) -- from "directory"
import System.FilePath ((</>), FilePath) -- from "filepath"
import Control.Monad.Extra (partitionM) -- from the "extra" package

traverseDir :: (FilePath -> Bool) -> (b -> FilePath -> IO b) -> b -> FilePath -> IO b
traverseDir validDir transition =
    let go state dirPath =
            do names <- listDirectory dirPath
               let paths = map (dirPath </>) names
               (dirPaths, filePaths) <- partitionM doesDirectoryExist paths
               state' <- foldM transition state filePaths -- process current dir
               foldM go state' (filter validDir dirPaths) -- process subdirs
     in go

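The idea is that the caller passes a FilePath -> Bool function to filter out unwanted directories, together with an initial state b and a transition function b -> FilePath -> IO b that processes a file name, updates the b state and possibly has some side effects. Note that the type of the state is chosen by the caller, who might put useful things there.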

If we only want to print the file names as they are produced, we can do something like this:

traverseDir (\_ -> True) (\() path -> print path) () "/tmp/somedir"

We are using () as a dummy state because we don't really need it here.

If we want to accumulate the files into a list, we can do it like this:

traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] "/tmp/somedir"

What if we want to filter some files? We need to tweak the transition function we pass to traverseDir so that it ignores them.
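As a rough sketch of that last step, applied to the requirements in the question (visible .md files only, hidden directories skipped), something like the following should work. The names isHidden, isVisibleMd, keepIf and findMarkdownFiles are made up for this illustration. traverseDir is restated so that the block compiles on its own, and as an assumption of this sketch it splits entries with two filterM passes instead of partitionM, so it only needs the directory and filepath packages.

```haskell
import Control.Monad (filterM, foldM)
import Data.List (isPrefixOf)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath (takeExtension, takeFileName, (</>))

-- Same shape as traverseDir above, restated so this block is self-contained.
-- Assumption of this sketch: two filterM passes replace partitionM from "extra".
traverseDir :: (FilePath -> Bool) -> (b -> FilePath -> IO b) -> b -> FilePath -> IO b
traverseDir validDir transition = go
  where
    go state dirPath = do
      names <- listDirectory dirPath
      let paths = map (dirPath </>) names
      dirPaths <- filterM doesDirectoryExist paths
      filePaths <- filterM (fmap not . doesDirectoryExist) paths
      state' <- foldM transition state filePaths   -- process current dir
      foldM go state' (filter validDir dirPaths)   -- process subdirs

-- Hypothetical helper: a path is "hidden" when its last component starts with a dot.
isHidden :: FilePath -> Bool
isHidden = ("." `isPrefixOf`) . takeFileName

-- Hypothetical helper: visible files whose extension is exactly ".md".
-- ".filename.md.swp" fails twice: it is hidden and its extension is ".swp".
isVisibleMd :: FilePath -> Bool
isVisibleMd path = not (isHidden path) && takeExtension path == ".md"

-- Transition function: add the path to the accumulator only if it passes the predicate.
keepIf :: (FilePath -> Bool) -> [FilePath] -> FilePath -> IO [FilePath]
keepIf p acc path = pure (if p path then path : acc else acc)

-- Collect visible .md files below the given directory, skipping hidden subdirectories.
-- The starting directory itself is never tested against the directory filter.
findMarkdownFiles :: FilePath -> IO [FilePath]
findMarkdownFiles = traverseDir (not . isHidden) (keepIf isVisibleMd) []
```

Because the accumulator is built by consing, the resulting list is not in traversal order; sort or reverse it if the order matters.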

Answer 2

I tested your code on my machine and it seems to work fine. Here is some sample data:

$ find test/data
test/data
test/data/look-a-md-file.md
test/data/another-dir
test/data/another-dir/shown.md
test/data/.not-shown.md
test/data/also-not-shown.md.bkp
test/data/.hidden
test/data/some-dir
test/data/some-dir/shown.md
test/data/some-dir/.ahother-hidden
test/data/some-dir/.ahother-hidden/im-hidden.md

Running your function results in:

ghci> findMarkdownSources "test"
["test/data/another-dir/shown.md","test/data/look-a-md-file.md","test/data/some-dir/shown.md"]

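I tested with an absolute path and it works as well. Are you sure you passed a valid path? If the path is not valid, you will get an empty list (although you will also get a warning).

Note that your code can be simplified as follows: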
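One way to rule out a bad path before searching is to resolve and check it first. The following is only a sketch: checkRoot is a hypothetical helper name, not something provided by filemanip, and it relies on makeAbsolute and doesDirectoryExist from the directory package.

```haskell
import System.Directory (doesDirectoryExist, makeAbsolute)

-- Hypothetical helper: resolve a possibly relative path against the current
-- working directory and report whether it names an existing directory.
checkRoot :: FilePath -> IO (Either String FilePath)
checkRoot dir = do
  absDir <- makeAbsolute dir
  exists <- doesDirectoryExist absDir
  pure $
    if exists
      then Right absDir
      else Left ("not an existing directory: " ++ absDir)
```

Calling checkRoot on the argument you pass to findMarkdownSources shows which absolute directory a relative path actually resolves to, which is a common source of surprises when the program is started from a different working directory.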

module Traversals.FileManip where

import           System.FilePath.Find (extension, fileName, find, (&&?),
                                       (/~?), (==?))

findMdSources :: FilePath -> IO [FilePath]
findMdSources fp = find isVisible (isMdFile &&? isVisible) fp
    where
      isMdFile = extension ==? ".md"
      isVisible = fileName /~? ".?*"

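You could even drop the fp parameter, but I am keeping it here for clarity.

I prefer explicit imports so that I know where each function comes from (since I don't know of any Haskell IDE with advanced symbol navigation).

Note, however, that this solution uses unsafe interleaved IO, which is not recommended.

So regarding your questions 2 and 3, I would recommend a streaming solution, such as pipes or conduit. Sticking to this kind of solution will reduce your options (just like sticking to pure functional programming languages reduced my options for programming languages ;)). Here you have an example of how to use pipes to walk a directory.

Here is the code in case you want to try it.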
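Separately from the linked pipes example, here is a rough sketch of what the streaming approach could look like with conduit. It assumes the conduit package is installed and uses sourceDirectoryDeep, filterC and sinkList from its Conduit module. The names hasHiddenComponent, wanted and findMdStreaming are invented for this illustration.

```haskell
import Conduit (filterC, runConduitRes, sinkList, sourceDirectoryDeep, (.|))
import Data.List (isPrefixOf)
import System.FilePath (makeRelative, splitDirectories, takeExtension)

-- Hypothetical helper: True when any path component below the root starts
-- with a dot. The root itself is stripped first, so a root such as "." or
-- "./src" does not count as hidden.
hasHiddenComponent :: FilePath -> FilePath -> Bool
hasHiddenComponent root path =
  any ("." `isPrefixOf`) (splitDirectories (makeRelative root path))

-- Hypothetical predicate: .md files with no hidden component below the root.
wanted :: FilePath -> FilePath -> Bool
wanted root path =
  takeExtension path == ".md" && not (hasHiddenComponent root path)

-- Stream every file below root, keep the wanted ones, and collect them.
findMdStreaming :: FilePath -> IO [FilePath]
findMdStreaming root =
  runConduitRes $
    sourceDirectoryDeep False root   -- False: do not follow directory symlinks
      .| filterC (wanted root)
      .| sinkList
```

Two caveats with this sketch: sourceDirectoryDeep still descends into hidden directories and the filter only discards what it finds there, and sinkList collects everything into memory. To get the benefit of streaming, replace sinkList with a consumer that opens and parses each file as it arrives.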
