What might cause "commitAndReleaseBuffer: invalid argument (invalid character)" when using pandoc as a library?
I am using pandoc as a library. The relevant code snippet is:
    module Lib
        ( latexDirToTex, latexToTxt
        ) where

    import qualified Data.ByteString as BS
    import Data.List (isSuffixOf)
    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO
    import ForeignLib (chdir)
    import Path
    import System.Directory (getDirectoryContents)
    import Text.Pandoc
    import Text.Pandoc.UTF8 (toText)

    -- Read a LaTeX file as raw bytes, decode it as UTF-8, and render it as plain text.
    latexToTxt :: Path b File -> IO T.Text
    latexToTxt fPath = do
        fileBS <- BS.readFile $ toFilePath fPath
        result <- runIO $ do
            doc <- readLaTeX def $ toText fileBS
            writePlain def doc
        handleError result
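As you can see, I am basically just calling readLaTeX to read the LaTeX document.
However, when I try to run this code I hit a lot of trouble in practice, with errors like the one in the title: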
[WARNING] Could not convert TeX math '\begin{array}{ccccccccccc}
& & 1 & 2 & 4 & 7 & 11 & 15 & 15 & & \
\hline
0 & \vline & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \vline & 1 \
1 & \vline & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \vline & 3 \
2 & \vline & 1 & 2 & 1 & 0 & 0 & 0 & 0 & \vline & 9 \
3 & \vline & 1 & 3 & 3 & 1 & 0 & 0 & 0 & \vline & 26 \
4 & \vline & 1 & 4 & 6 & 4 & 1 & 0 & 0 & \vline & 72 \
5 & \vline & 1 & 5 & 10 & 10 & 5 & 1 & 0 & \vline & 191 \
6 & \vline & 0 & 6 & 15 & 20 & 15 & 6 & 1 & \vline & 482 \
7 & \vline & 0 & 0 & 21 & 35 & 35 & 21 & 7 & \vline & 1134 \
8 & \vline & 0 & 0 & 0 & 56 & 70 & 56 & 28 & \vline & 2422 \
9 & \vline & 0 & 0 & 0 & 0 & 126 & 126 & 34 & \vline & 4536 \
10 & \vline & 0 & 0 & 0 & 0 & 0 & 252 & 210 & \vline & 6930 \
11 & \vline & 0 & 0 & 0 & 0 & 0 & 0 & 462 & \vline & 6930
\end{array}', rendering as TeX:
0 & \vline & 1 & 0 & 0 & 0 & 0 & 0 &
^
unexpected "\"
expecting "&", "\\", white space or "\end"
arxiv-pandoc-static: <stdout>: commitAndReleaseBuffer: invalid argument (invalid character)
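By contrast, running the pandoc executable directly on the same input produces no such error, and I get very good output. I would like to configure the pandoc reader to be as lenient as possible and not give up on errors (or, better still, avoid the errors in the first place). How can I achieve this through the pandoc API?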
I believe this is not a pandoc problem but an issue with GHC or the text package. The answer can be found in the docs of a completely unrelated Haskell project, hledger:
Getting errors like "Illegal byte sequence" or "Invalid or
incomplete multibyte or wide character" or "commitAndReleaseBuffer:
invalid argument (invalid character)"
Programs compiled with GHC (hledger, haskell build tools, etc.) need to
have a UTF-8-aware locale configured in the environment, otherwise they
will fail with these kinds of errors when they encounter non-ascii
characters.
To fix it, set the LANG environment variable to some locale which
supports UTF-8. The locale you choose must be installed on your system.
So running export LANG=C.UTF-8 in your shell should fix the problem.
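If changing the environment is not an option, the encoding can also be set from inside the program. This is not part of the hledger advice quoted above; it is a minimal sketch that assumes the failure comes from writing non-ASCII text to stdout (the handle named in the error message). The helper name forceUtf8 is made up for illustration; setLocaleEncoding, hSetEncoding, hGetEncoding and utf8 are from GHC's base library.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import GHC.IO.Encoding (setLocaleEncoding)
import System.IO (hGetEncoding, hSetEncoding, stderr, stdout, utf8)

-- | Hypothetical helper: force UTF-8 for text I/O regardless of LANG / LC_*.
forceUtf8 :: IO ()
forceUtf8 = do
    setLocaleEncoding utf8    -- default encoding for handles opened from now on
    hSetEncoding stdout utf8  -- stdout and stderr already exist, so set them explicitly
    hSetEncoding stderr utf8

main :: IO ()
main = do
    forceUtf8
    -- Show which encoding stdout is using now.
    hGetEncoding stdout >>= print
    -- Non-ASCII text of the kind that can fail under a non-UTF-8 locale such as LANG=C.
    TIO.putStrLn (T.pack "α ≤ β")
```

Calling forceUtf8 at the start of main, before the result of latexToTxt is printed, should have the same effect as exporting a UTF-8 LANG, without requiring that locale to be installed on the system.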