Reading first row from a csv file with pipes-csv
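I am reading a csv file using the pipes-csv library. I want to read the first line first and then read the rest later. Unfortunately, after the Pipes.Prelude.head function returns, the pipe is somehow closed. Is there a way to read the head of the csv first and read the rest afterwards?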
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.Vector as V
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import qualified Pipes.ByteString as PB
import qualified Data.Text as Text
import qualified Pipes.Csv as PCsv
import Control.Monad (forever)

showPipe :: Proxy () (Either String (V.Vector Text.Text)) () String IO b
showPipe = forever $ do
    x::(Either String (V.Vector Text.Text)) <- await
    yield $ show x

main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                 let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                 headers <- P.head producer
                 putStrLn "Header"
                 putStrLn $ show headers
                 putStrLn $ "Rows"
                 runEffect ( producer>->
                             (showPipe) >->
                             P.stdoutLn)
              )
If we do not read the header first, we can read the whole csv without any problems:
main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                 let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                 putStrLn $ "Rows"
                 runEffect ( producer>->
                             (showPipe) >->
                             P.stdoutLn)
              )
Pipes.Csv has material for handling headers, but I think this question is really looking for a more sophisticated use of Pipes.await or Pipes.next. First, next:
>>> :t Pipes.next
Pipes.next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
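next is the basic way of inspecting a producer. It is a little like pattern matching on a list. With a list, the two possibilities are [] and x:xs; here they are Left () and Right (headers, rows). The latter pair is what you are looking for. Of course, an action (here in IO) is needed to get hold of it: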
main :: IO ()
main = do
  handle <- IO.openFile "./test.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle) >-> P.concat
  e <- next producer
  case e of
    Left () -> putStrLn "No lines!"
    Right (headers, rows) -> do
      putStrLn "Header"
      print headers
      putStrLn $ "Rows"
      runEffect ( rows >-> P.print)
  IO.hClose handle
Because the Either values are a distraction here, I eliminated the Left values (the lines that fail to parse) with P.concat.
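To try out the two cases of next without a CSV file at hand, here is a minimal sketch that runs on an in-memory producer. The names sampleRows and splitHeader are made up for illustration and are not part of pipes or pipes-csv, and plain lists of strings stand in for decoded records.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for decoded CSV rows: Right is a parsed row,
-- Left is a row that failed to parse.
sampleRows :: Producer (Either String [String]) IO ()
sampleRows = each
  [ Right ["name", "grade"]
  , Left "parse error on line 2"
  , Right ["alice", "90"]
  ]

-- Split a producer into its first element and the remaining producer,
-- or Nothing if the producer is already exhausted.
splitHeader :: Monad m => Producer a m () -> m (Maybe (a, Producer a m ()))
splitHeader p = either (const Nothing) Just <$> next p

main :: IO ()
main = do
  -- P.concat keeps the payload of each Right and drops every Left.
  r <- splitHeader (sampleRows >-> P.concat)
  case r of
    Nothing -> putStrLn "No lines!"
    Just (header, rest) -> do
      putStrLn ("Header: " ++ show header)
      runEffect (rest >-> P.print)
```

Note that the producer returned alongside the header is the one to keep running; starting again from the original producer would repeat its effects.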
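next does not act inside a pipeline but directly on the Producer, which it treats as a kind of "effectful list" with a final return value at the end. The particular effect we got above can of course also be achieved with await, which acts inside a pipeline. I can use it to intercept the first item that comes along in a pipeline, do some IO based on it, and then forward the remaining elements: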
main :: IO ()
main = do
  handle <- IO.openFile "./grades.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle) >-> P.concat
      handleHeader :: Pipe (V.Vector Text.Text) (V.Vector Text.Text) IO ()
      handleHeader = do
        headers <- await  -- intercept first value
        liftIO $ do       -- use it for IO
          putStrLn "Header"
          print headers
          putStrLn $ "Rows"
        cat               -- pass along all later values
  runEffect (producer >-> handleHeader >-> P.print)
  IO.hClose handle
The difference is that if producer is empty, I have no way of reporting that, as I did with No lines! in the previous program.
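That difference can be seen in a small sketch on in-memory producers. This handleHeader is a generalized version of the pipe above (any Show-able element rather than CSV records), and the sample strings are invented for illustration. On the empty producer the await never receives a value, so the pipeline just ends without running any of the header code.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Intercept the first value, report it, then forward everything after it.
handleHeader :: Show a => Pipe a a IO ()
handleHeader = do
  header <- await                                -- blocks until upstream yields
  liftIO (putStrLn ("Header: " ++ show header))
  cat                                            -- pass along all later values

main :: IO ()
main = do
  putStrLn "Non-empty input:"
  runEffect (each ["name,grade", "alice,90"] >-> handleHeader >-> P.print)
  putStrLn "Empty input:"
  -- Upstream finishes before await gets a value, so nothing is printed here.
  runEffect (each ([] :: [String]) >-> handleHeader >-> P.print)
```

If the empty case has to be reported, the next-based version is the one to reach for, since it hands back Left () explicitly.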
By the way, showPipe can be defined as P.map show, or simply as P.show (but with the specialized type you added).
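As a sketch of that last remark, both definitions below typecheck at the specialized type. The primed name showPipe' and the sample rows are only there for illustration.

```haskell
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.Text as Text
import qualified Data.Vector as V

-- Same element types as the showPipe in the question, written with P.map.
showPipe :: Pipe (Either String (V.Vector Text.Text)) String IO r
showPipe = P.map show

-- Equivalent definition using P.show directly.
showPipe' :: Pipe (Either String (V.Vector Text.Text)) String IO r
showPipe' = P.show

main :: IO ()
main = runEffect (each rows >-> showPipe >-> P.stdoutLn)
  where
    -- Invented sample data: one parsed row and one parse failure.
    rows =
      [ Right (V.fromList [Text.pack "a", Text.pack "b"])
      , Left "bad row"
      ]
```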