Convert Attoparsec parser to parse from another string type

Is there some "easy" way (e.g. something I am missing in Attoparsec or some other library) to convert an existing Attoparsec parser that parses from ByteString into one that parses from Text?

For example I have:

import Data.Attoparsec.ByteString.Char8
myTypeByteStringParser :: Parser MyType

What's the way to transform it into:

import Data.Attoparsec.Text
myTypeTextParser :: Parser MyType

It looks like contramap (going by a Hoogle search on the type signature), but is it perhaps not possible to define a Contravariant instance for Parser?

There are 2 answers

Answer by liyang (best answer)

This is possible in general, and you don't need to fork attoparsec. Inconsiderately, attoparsec doesn't expose enough of its internals, but don't let that stop us:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE QuasiQuotes #-}

module Parsers where

import qualified Data.Attoparsec.ByteString as AB
import qualified Data.Attoparsec.Internal.Types as AIT
import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import qualified Data.ByteString.Internal as BI
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import qualified Data.Text.Internal as TI
import Unsafe.TrueName

bsToTextState :: AIT.State ByteString -> AIT.State Text
bsToTextState = bufferText . decodeUtf8 . unbufferBS where
    unbufferBS :: AIT.State ByteString -> ByteString
    unbufferBS [truename| ''AIT.State
        Data.Attoparsec.ByteString.Buffer.Buffer
        Buf | fp off len _ _ |] = BI.PS fp off len
    bufferText :: Text -> AIT.State Text
    bufferText (TI.Text arr off len) = [truename| ''AIT.State
        Data.Attoparsec.Text.Buffer.Buffer
        Buf |] arr off len len 0

textToBSState :: AIT.State Text -> AIT.State ByteString
textToBSState = bufferBS . encodeUtf8 . unbufferText where
    unbufferText :: AIT.State Text -> Text
    unbufferText [truename| ''AIT.State
        Data.Attoparsec.Text.Buffer.Buffer
        Buf | arr off len _ _ |] = TI.Text arr off len
    bufferBS :: ByteString -> AIT.State ByteString
    bufferBS (BI.PS fp off len) = [truename| ''AIT.State
        Data.Attoparsec.ByteString.Buffer.Buffer
        Buf |] fp off len len 0

mapIResult :: (i -> j) -> (j -> i) -> AIT.IResult i a -> AIT.IResult j a
mapIResult f g = go where
    go = \case
        AIT.Fail i ctx msg -> AIT.Fail (f i) ctx msg
        AIT.Partial k -> AIT.Partial (go . k . g)
        AIT.Done i r -> AIT.Done (f i) r

mapFailure :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
    AIT.Failure i (AIT.State i) r -> AIT.Failure j (AIT.State j) r
mapFailure f g h k st p m ctx msg = mapIResult f g $ k (h st) p m ctx msg

mapSuccess :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
    AIT.Success i (AIT.State i) a r -> AIT.Success j (AIT.State j) a r
mapSuccess f g h k st p m a = mapIResult f g $ k (h st) p m a

bsToTextParser :: AB.Parser a -> AT.Parser a
bsToTextParser (AIT.Parser bsP) = AIT.Parser textP where
    textP st p m f s = mapIResult decodeUtf8 encodeUtf8 $ bsP
        (textToBSState st) p m
        (mapFailure encodeUtf8 decodeUtf8 bsToTextState f)
        (mapSuccess encodeUtf8 decodeUtf8 bsToTextState s)

textToBSParser :: AT.Parser a -> AB.Parser a
textToBSParser (AIT.Parser textP) = AIT.Parser bsP where
    bsP st p m f s = mapIResult encodeUtf8 decodeUtf8 $ textP
        (bsToTextState st) p m
        (mapFailure decodeUtf8 encodeUtf8 textToBSState f)
        (mapSuccess decodeUtf8 encodeUtf8 textToBSState s)

{,un}buffer{BS,Text} are adapted from the respective internal modules Data.Attoparsec.{ByteString,Text}.Buffer.
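To see why every helper above takes conversions in both directions (and why contramap alone cannot work), here is a toy model. ToyParser, invmapInput and digitsP are hypothetical names for illustration only, and the type is a deliberate simplification, not attoparsec's actual Parser. The input type appears both as an argument and inside the result (the leftover input), so changing it needs a function each way:

```haskell
import Data.Char (isDigit)

-- Toy parser: consumes input of type i, returns leftover input plus a value.
-- Simplified for illustration; attoparsec's real Parser is CPS-based.
newtype ToyParser i a = ToyParser { runToy :: i -> Either String (i, a) }

-- Changing the input type needs both directions:
-- 'from' converts the incoming input, 'to' converts the leftover back.
invmapInput :: (i -> j) -> (j -> i) -> ToyParser i a -> ToyParser j a
invmapInput to from (ToyParser run) = ToyParser $ \input ->
    case run (from input) of
        Left err        -> Left err
        Right (rest, a) -> Right (to rest, a)

-- A String parser that reads one or more leading digits.
digitsP :: ToyParser String Int
digitsP = ToyParser $ \s -> case span isDigit s of
    ("", _)    -> Left "expected digits"
    (ds, rest) -> Right (rest, read ds)

-- The same parser, now running over a list of code points.
codePointsP :: ToyParser [Int] Int
codePointsP = invmapInput (map fromEnum) (map toEnum) digitsP

main :: IO ()
main = print (runToy codePointsP (map fromEnum "123 "))
-- prints: Right ([32],123)
```

In the real code, mapIResult, mapFailure and mapSuccess play the role of invmapInput, with encodeUtf8 and decodeUtf8 as the two directions.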

This was a good excuse for me to update true-name to work with more recent GHC versions, though. Depending on how up-to-date your setup is, you may need the work-in-progress version from GitHub.

It's probably not terrible for performance, as long as you keep in mind that each time you use textToBSParser, the entire input gets fed through encodeUtf8 with any leftover converted back via decodeUtf8, and vice versa for bsToTextParser. If you only convert a Parser once at the top-level, it shouldn't be too different from simply converting the input as the other answer suggests.

PS: I haven't tested this beyond

$ ghci -XOverloadedStrings parsers.hs 
*Parsers> textToBSParser AT.scientific `AB.parseTest` "123 "
Done " " 123.0

PPS: for your own parsers, you might be able to leverage OverloadedStrings and write p :: IsString s => AIT.Parser s a instead, with {-# SPECIALISE p :: AT.Parser a #-} pragmas. I've not explored how workable this idea is.
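A related way to write a parser once for both input types is sketched below. This is not the IsString idea itself: it swaps in a small type class instead, and I have only reasoned about it on small cases. The class CharInput and the names charP, decimalP and pairP are hypothetical and not part of attoparsec. It relies on AIT.Parser i having Functor and Applicative instances for any input type i.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString.Char8 as ABC
import qualified Data.Attoparsec.Internal.Types as AIT
import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import Data.Text (Text)

-- Hypothetical class (not provided by attoparsec): the primitives this
-- parser needs, abstracted over the input type.
class CharInput i where
    charP    :: Char -> AIT.Parser i Char
    decimalP :: AIT.Parser i Int

instance CharInput ByteString where
    charP    = ABC.char
    decimalP = ABC.decimal

instance CharInput Text where
    charP    = AT.char
    decimalP = AT.decimal

-- Written once against the class, usable at either input type.
pairP :: CharInput i => AIT.Parser i (Int, Int)
pairP = (,) <$> decimalP <* charP ',' <*> decimalP

main :: IO ()
main = do
    print (ABC.parseOnly pairP ("12,34" :: ByteString))  -- Right (12,34)
    print (AT.parseOnly pairP ("12,34" :: Text))         -- Right (12,34)
```

The cost is that every primitive you use has to be added to the class by hand, so this probably only pays off for parsers built from a handful of primitives.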

Answer by DDub

I'm not sure this is possible in general. The Parser type defined in Attoparsec doesn't look like it plays nicely with modifying the input type. So, if you want to combine a Text parser with a ByteString parser, I'm afraid you may be out of luck.

That said, if what you want is to be able to run a ByteString parser on some input Text, you might be able to get around that by first converting the Text input into a ByteString. For instance:

import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)
import Data.Attoparsec.ByteString.Char8

-- parse :: Parser a -> ByteString -> Result a 
-- this is given by Attoparsec

parseText :: Parser a -> Text -> Result a
parseText p = parse p . encodeUtf8

Similarly, you can turn a Text parser into a ByteString one by using decodeUtf8 (or a different encoder/decoder as necessary).
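A minimal sketch of that reverse direction, assuming the input is meant to be UTF-8. The helper name parseOnlyBytes is made up for this example. It uses decodeUtf8' rather than decodeUtf8 because the latter throws an exception on invalid UTF-8, and it uses parseOnly rather than parse so that the whole input is decoded in one go (decoding incremental chunks separately could split a multi-byte character):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical helper: run a Text parser on a complete ByteString input.
-- Invalid UTF-8 is reported as a Left instead of a runtime exception.
parseOnlyBytes :: AT.Parser a -> ByteString -> Either String a
parseOnlyBytes p bytes =
    case decodeUtf8' bytes of
        Left err   -> Left ("invalid UTF-8: " ++ show err)
        Right text -> AT.parseOnly p text

main :: IO ()
main = do
    -- Valid input: decoded, then parsed.
    print (parseOnlyBytes (AT.decimal :: AT.Parser Int) "42")
    -- A lone 0xFF byte is not valid UTF-8, so this yields a Left.
    print (parseOnlyBytes (AT.decimal :: AT.Parser Int) "\xff")
```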