How to add tuples to list after reading from a text file in Haskell

Question

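I'm trying to create a program in Haskell that reads text from a text file and adds it to a list.

My idea is:

```
type X = [(String, Integer)]
```

where the String is each word in the text and the Integer is the number of times that word occurs in the text. So I want to create a tuple holding these values and add it to the list. Then I want to print the contents of the list.

I know how to read a text file in Haskell, but I'm not sure what to do next. I'm new to programming in Haskell; I have mostly programmed in Java, which is very different.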

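Edit:

Here is what I have so far, based on the suggestions. I'm able to write the text received from the file to an output text file and make it lowercase. The problem I'm having is with using the other functions, because it says:

```
Test.hs:14:59: Not in scope: ‘group’
```

Here is the code: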

```
import System.IO
import Data.Char(toLower)

main = do
       contents <- readFile "testFile.txt"
       let lowContents = map toLower contents
       let outStr = countWords (lowContents)
       let finalStr = sortOccurrences (outStr)
       print outStr

-- Counts all the words
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)

-- Split words
toWords :: String -> [String]
toWords s = words s

-- Counts, how often each string in the given list appears
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sortOccurrences xs

-- Sort list in order of occurrences.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences sort = sortBy sort (comparing snd)
```

Can anyone help me with this?

Answer 1

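Haskell has a fairly expressive type system (much richer than Java's), so it's a good idea to think about this problem purely in terms of types, working top-down. You mentioned that you already know how to read a text file in Haskell, so I'll assume you know how to get a String containing the file's contents.

The function you want to define is this one. For now, we set the definition to undefined so that the code type-checks (but throws an exception at runtime):

```
countWords :: String -> [(String, Int)]
countWords fileContents = undefined
```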

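Your function maps a String (the file contents) to a list of tuples, each of which associates a word with how often that word occurs in the input. This suggests that part of the solution will be a function that splits a string into a list of words, which you can then process to count the words. I.e., you'll want something like this:

```
-- Splits a string into a list of words
toWords :: String -> [String]
toWords s = undefined

-- Counts, how often each string in the given list appears
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = undefined
```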

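With these in place, you can actually define the original function:

```
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)
```

You have now neatly broken the problem down into two sub-problems.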

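Another nice thing about this type-driven style of programming is that Hoogle can be asked to look for functions of a given type. For example, consider the type of the toWords function we described earlier:

```
toWords :: String -> [String]
toWords s = undefined
```

Feeding this to Hoogle reveals a nice function, words, which seems to be exactly what we want! So we can define

```
toWords :: String -> [String]
toWords s = words s
```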
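One thing worth checking in ghci is what `words` actually does: it splits on whitespace only, so punctuation stays attached to the neighbouring word and "hat." and "hat" would be counted separately. The sketch below shows this on a made-up sample sentence; `cleanWords` is a hypothetical helper (not part of the answer above) showing one way to lower-case the text and drop punctuation before splitting, if that matters for your counts.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- toWords as defined above: split on whitespace only.
toWords :: String -> [String]
toWords s = words s

-- Hypothetical variant: lower-case letters, keep whitespace, and turn every
-- other character (punctuation, digits, ...) into a space before splitting.
cleanWords :: String -> [String]
cleanWords = words . map normalise
  where
    normalise c
      | isAlpha c = toLower c
      | isSpace c = c
      | otherwise = ' '

main :: IO ()
main = do
  print (toWords "The cat, the hat.")     -- ["The","cat,","the","hat."]
  print (cleanWords "The cat, the hat.")  -- ["the","cat","the","hat"]
```

Note that this simple rule also splits contractions such as "don't" into two words; adjust `normalise` if that is not what you want.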

The only thing missing is a suitable definition for countOccurrences. Alas, searching for this type on Hoogle doesn't show any ready-made solutions. However, there are three functions which will be useful for coming up with our own definition: sort, group and map.

The sort function does what its name suggests: it sorts a list of things:
```
λ: sort [1,1,1,2,2,1,1,3,3]
[1,1,1,1,1,2,2,3,3]
```
The group function groups consecutive (!) equal elements, producing a list of lists. For example
```
λ: group [1,1,1,1,1,2,2,3,3]
[[1,1,1,1,1],[2,2],[3,3]]
```
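The map function can be used to turn the list of lists produced by group into a list of tuples, pairing each group's element with the group's length: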
```
λ: map (\xs -> (head xs, length xs)) [[1,1,1,1,1],[2,2],[3,3]]
[(1,5),(2,2),(3,2)]
```

Combining these three functions, we can define

```
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
```
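Note that sort and group are not in the Prelude; they come from `Data.List`, so the module has to import them. A self-contained sketch of the definition above, tried on a made-up list of words (the lambda variable is renamed to `ws` only to avoid shadowing `xs`; `head` is safe here because `group` never produces an empty group):

```haskell
import Data.List (group, sort)

-- Count how often each string occurs: sort so equal words become adjacent,
-- group each run of equal words, then pair the run's word with its length.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\ws -> (head ws, length ws)) . group . sort $ xs

main :: IO ()
main = print (countOccurrences (words "b a c a b a"))
-- prints [("a",3),("b",2),("c",1)]
```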

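Now you have all the pieces. Your countWords is defined in terms of toWords and countOccurrences, each of which has a proper definition.

The nice thing about this type-driven approach is that writing down the function signatures helps both your thinking and the compiler (which catches you when you violate your assumptions). You also automatically break the problem down into smaller problems, each of which can be tested independently in ghci.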
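Applied to the code in the question's edit, the pieces fit together roughly as below. This is a sketch, not the answerer's code. The `Not in scope: ‘group’` error arises because group, sort and sortBy live in `Data.List` and comparing lives in `Data.Ord`, none of which the question imports. Two further fixes are needed: the question's countOccurrences applies sortOccurrences (which expects tuples) where sort belongs, and its sortOccurrences passes sortBy's arguments in the wrong order. Sorting most-frequent-first is an assumption about the intended order.

```haskell
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

main :: IO ()
main = do
  contents <- readFile "testFile.txt"  -- file name taken from the question
  let lowContents = map toLower contents
      finalList   = sortOccurrences (countWords lowContents)
  print finalList

-- Counts all the words
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)

-- Splits on whitespace only
toWords :: String -> [String]
toWords s = words s

-- Counts how often each string in the given list appears
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\ws -> (head ws, length ws)) . group . sort $ xs

-- Sorts by number of occurrences, most frequent first (assumed ordering;
-- use `sortBy (comparing snd)` for least frequent first). sortBy is stable,
-- so words with equal counts stay in alphabetical order.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences = sortBy (comparing (Down . snd))
```

With a testFile.txt containing `the cat the hat`, this prints `[("the",2),("cat",1),("hat",1)]`.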

Answer 2

Data.Map is the simplest way.

```
import qualified Data.Map as M

-- assuming you already have your list of words,
-- you can generate your list of tuples with this
listOfTuples :: [String] -> [(String, Integer)]
listOfTuples listOfWords = M.toList . M.fromListWith (+) $ zip listOfWords (repeat 1)
```
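To connect this to the file, a minimal sketch wraps the same expression in a function over the raw file contents; `countWordsMap` is a hypothetical name, and lower-casing is borrowed from the question's code. Each word is paired with 1, `M.fromListWith (+)` sums the 1s for equal keys, and `M.toList` returns the pairs in ascending key order, i.e. alphabetically.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Hypothetical helper: lower-case the text, split it into words, pair every
-- word with 1 and let fromListWith (+) add up the counts for equal words.
countWordsMap :: String -> [(String, Integer)]
countWordsMap text =
  M.toList . M.fromListWith (+) $ zip (words (map toLower text)) (repeat 1)

main :: IO ()
main = do
  contents <- readFile "testFile.txt"   -- file name taken from the question
  mapM_ print (countWordsMap contents)  -- one (word, count) pair per line
```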
