What's the simplest yet most elegant way to NOT short-circuit and instead collect errors until their values are used?
What's so hard in accumulating errors? Short circuit only if a function call receives an error value as value. But then return all errors accumulated since.
Insights:
- Monad short-circuits on any error because
>>=relies on there being an argument to apply the function to. - Applicative
<*>can gather up errors from both of its arguments.
The code does not compile (see compiling error below) due to missing Monad instance as I don’t know how it needs to be designed. The code is intended to show the desired behaviour that a Monad instance or any other implementation should provide.
This is a request for a specific code solution (be it Monad instance or an entirely different approach) and NOT for a package recommendation.
Still the approaches used in following language extensions and packages might give some inspiration (parts emerged in #haskell IRC):
- monad-validate
- validation
- these
- ApplicativeDo
- Writer Monad
The following code is inspired by: How is it possible to collect all error messages in the Either Monad? https://blog.ploeh.dk/2018/11/05/applicative-validation/
{-# LANGUAGE DeriveFunctor, RecordWildCards, OverloadedStrings #-}
import Data.Text (Text)
newtype Errors e r = Errors (Either e r) deriving (Show, Eq, Functor)
instance Semigroup m => Applicative (Errors m) where
pure = Errors . pure
Errors (Left x) <*> Errors (Left y) = Errors (Left (x <> y))
Errors f <*> Errors r = Errors (f <*> r)
data Result = Result {r1 :: !Int, rg :: !Int} deriving (Show)
f1 :: Errors [Text] Int
f1 = Errors $ Left ["f1 failed"]
f2 :: Errors [Text] Int
f2 = pure 2
f3 :: Errors [Text] Int
f3 = Errors $ Left ["f3 failed"]
f4 :: Errors [Text] Int
f4 = pure 4
f5 :: Errors [Text] Int
f5 = pure 5
g :: Int -> Int -> Errors [Text] Int
g a b | a + b <= 6 = Errors $ Left ["g: a + b NOT > 6"] -- we'll let `g` fail if sum is less than 6
| otherwise = pure $ a * b
-- | in `scenario1` `g` uses one erroneous and one non-erroneous result.
-- since `g` tries to consume one erroneous result `r3` `g` can't execute.
-- it short-circuits the computation.
-- all up till then collected errors are returned.
--
-- >>> scenario1
-- Errors (Left ["f1 failed", "f3 failed"])
scenario1 :: Errors [Text] Result
scenario1 = do
r1 <- f1 :: Errors [Text] Int -- fails, collected
r2 <- f2 :: Errors [Text] Int -- succeeds
r3 <- f3 :: Errors [Text] Int -- fails, collected
-- we haven’t short-circuited until here, instead collected errors
-- although `f1` failed, `f2` and `f3` still have been executed
-- but now we need to short circuit on `f4` because at least any of `r2` or `r3` has error value
rg <- g r2 r3 :: Errors [Text] Int
pure $ Result {..}
-- | `g` can execute (all values are non-errors) but `g` itself produces an error.
-- computation short-circuits only on construction of `Result`.
-- that is because `Result` only carries non-error values but `g` produced error value.
-- `scenario2` returns error values of `f1` and `g`.
--
-- >>> scenario2
-- Errors (Left ["f1 failed", "g: a + b NOT > 6"])
scenario2:: Errors [Text] Result
scenario2 = do
r1 <- f1 :: Errors [Text] Int -- fails, collected
r2 <- f2 :: Errors [Text] Int -- succeeds
r4 <- f4 :: Errors [Text] Int -- succeeds
-- we haven’t short-circuited until here, instead collected errors
-- although `f1` failed, `f2` and `f4` still have been executed
-- `g` receives non-error values `r2` and `r4` with values 2 and 4
-- now, g itself returns an error due to its logic
rg <- g r2 r4 :: Errors [Text] Int
-- we still don’t short-circuit `g`'s error being produced
-- we only short-circuit on the error value tried being used by `Result`:
pure $ Result {..}
-- | `g` does neither is fed with erroneous values nor
-- does `g` itself return an error. Instead construction of `Result` fails
-- since it tries to load value of `r1` which is erroneous but should be `Int`.
--
-- >>> scenario3
-- Errors (Left ["f1 failed"])
scenario3 :: Errors [Text] Result
scenario3 = do
r1 <- f1 :: Errors [Text] Int -- fails, collected
r2 <- f2 :: Errors [Text] Int -- succeeds
r5 <- f5 :: Errors [Text] Int -- succeeds
-- we haven’t short-circuited until here, instead collected errors
-- although `f1` failed, `f2` and `f4` still have been executed
-- `g` receives non-error values `r2` and `r5` with values 2 and 5
-- now, `g` itself succeeds, no error
rg <- g r2 r5 :: Errors [Text] Int
-- `Result` is constructed, since `f1` failed, `r1` is of error value now
-- hence `Result` cannot be constructed, failure "f1 failed" should be returned
pure $ Result {..}
-- | no error anywhere,
--
-- >>> scenario4
-- Errors (Right 7)
scenario4 :: Errors [Text] Result
scenario4 = do
r1 <- f4 :: Errors [Text] Int -- succeeds
r2 <- f2 :: Errors [Text] Int -- succeeds
r5 <- f5 :: Errors [Text] Int -- succeeds
-- now, `g` itself succeeds, no error
rg <- g r2 r5 :: Errors [Text] Int
-- `Result` is constructed successfully because it only takes in non-error values
pure $ Result {..}
And here comes the error:
rc/MyLib2.hs:42:3: error: [GHC-39999]
• No instance for ‘Monad (Errors [Text])’
arising from a do statement
• In a stmt of a 'do' block: r1 <- f1 :: Errors [Text] Int
In the expression:
do r1 <- f1 :: Errors [Text] Int
r2 <- f2 :: Errors [Text] Int
r3 <- f3 :: Errors [Text] Int
rg <- g r2 r3 :: Errors [Text] Int
....
In an equation for ‘scenario1’:
scenario1
= do r1 <- f1 :: Errors [Text] Int
r2 <- f2 :: Errors [Text] Int
r3 <- f3 :: Errors [Text] Int
....
|
42 | r1 <- f1 :: Errors [Text] Int -- fails, collected
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.
What I've been doing so far in situations like this isn't particularly sophisticated. It's possible that there's a more elegant way to do this with Haskell, but on the other hand, what follows here can be ported to other languages.
The way I address these sorts of problems is that I alternate between two representations of the same data; one which has validation-like behaviour, and one which is a proper monad.
For the monad it's obvious to use the built-in
Either. For the validation representation, I'll here useErrorsprovided in the OP.These two representations are isomorphic, but it's probably convenient to have explicit functions to go back and forth:
Furthermore, I'm also going to take advantage of this extension:
With these additions, we can implement each of the scenarios in the OP.
Scenario 1
Most of these scenarios have similar-looking code, so I'm going to mostly comment on the first one:
The outer
doexpression usesErrors, which is why we need theApplicativeDoextension. WhileErrorsis not aMonadinstance, that extension still enables thedosyntax.In order to compose
r2andr3together withg, you'll need some kind ofjoinfunctionality, so the innerdoworks in normalEither-monad mode by converting the values, and then converting the result back toErrors.It may not be the most elegant solution, but it works:
Scenario 2
This one, and the subsequent examples, follows the same template:
Demo:
Scenario 3
Code:
Demo:
Scenario 4
Code:
Demo: