Mapping functions over data in 'sum-types' in strongly typed programming languages

241 views Asked by At

Today I encountered an implementation challenge with a large DU (sum-type) in F#, where I basically wanted to be able to map a set of functions over the values of the DU without having to repeat the matched case on the right hand side. The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch. To illustrate the problem, here is one candidate solution 'pattern' I came up with for F# using reflection

open FSharp.Reflection
type MyData =
    | Foo of int
    | Bar of bool
    | Baz of string
    | Hello of int
    override this.ToString () =
        let (case, _ ) = FSharpValue.GetUnionFields(this, typeof<MyData>)
        case.Name
    static member map (f : int -> int) (g : bool -> bool) (h : string -> string) x : MyData =
        let f' = box << f
        let g' = box << g
        let h' = box << h
        let cases = FSharpType.GetUnionCases typeof<MyData>
        let matchCase = cases |> Array.find (fun case -> case.Name = x.ToString())
        let res =
            match x with
            | Foo v -> f' v
            | Bar v -> g' v
            | Baz v -> h' v
            | Hello v -> f' v
        FSharpValue.MakeUnion(matchCase, [| res |]) :?> MyData

let mapMyData : MyData -> MyData = MyData.map ((+) 1) not (fun s -> s.ToUpper())
let foo : MyData = Foo 42
let newFoo : MyData = mapMyData foo
let bar : MyData = Bar true
let newBar : MyData = mapMyData bar

And now I am curious about what other solutions exists for this problem, in F# as well as in other languages that has sum-types. I can easily see this being a general problem in for example Haskell, Purescript, Idris, Rust.

The problem can be distilled down to:

  1. Mapping over the values of a sum-type without repeating the case names on the right-hand side.
  2. Leverage the type checker as much as possible for safety.
4

There are 4 answers

1
eggyal On

In Rust at least (I can't speak for the other languages you mention), one can mutate the value in-place rather than taking & returning ownership:

enum MyData {
    Foo(i32),
    Bar(bool),
    Baz(String),
    Hello(i32),
}

impl MyData {
    fn map(&mut self, f: fn(&mut i32), g: fn(&mut bool), h: fn(&mut str)) {
        use MyData::*;
        match self {
            Foo(i) => f(i),
            Bar(b) => g(b),
            Baz(s) => h(s),
            Hello(i) => f(i),
        }
    }
}

Playgroud.

0
MrD at KookerellaLtd On

If you want each case in the union to be typesafe you can make it typesafe

type Foo = Foo' of int
type Bar = Bar' of bool

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -> int) (g : bool -> bool) x : MyData =
        match x with
        | Foo (Foo' v) -> Foo (Foo' (f v))
        | Bar (Bar' v) -> Bar (Bar' (g v))

let foo = Foo (Foo' 42)
let newFoo = MyData.bimapish ((+) 1) not foo

OO Visitors effectively do this, they encode each case of the union with a unique type, so you can probably do it that way too.

or have i missed the point? (I accept I have repeated the case name, but your motivation was type safety, not brevity)?


someone put a comment in and deleted it, I think they objected that I was allowing this error...fair enough...but there's a cost.

type Foo = Foo' of int
type Bar = Bar' of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -> int) (g : int -> int) x : MyData =
        match x with
        | Foo (Foo' v) -> Foo (Foo' (f v))
        // WRONG this is applying an 'f' to a Bar!!! but not type error
        | Bar (Bar' v) -> Bar (Bar' (f v)) 

let foo = Foo (Foo' 42)
let newFoo = MyData.bimapish ((+) 1) ((+) 1) foo

in which case you would have to effectively make the 'map' functions typesafe...but this is becoming verbose to the caller.

type Foo = Foo' of int
type Bar = Bar' of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : Foo -> Foo) (g : Bar -> Bar) x : MyData =
        match x with
        | Foo v -> Foo (f v)
        //| Bar v -> Bar (f v) // now get a type error!
        | Bar v -> Bar (g v) // this works though

let foo = Foo (Foo' 42)
let newFoo = 
    MyData.bimapish 
        (fun (Foo' x) -> Foo' (x + 1)) 
        (fun (Bar' x) -> Bar' (x + 2)) 
        foo

If you are happy for the type checker to give you a warning rather than an explicit error if someone gets it wrong, you can dispense with much of the boiler plate by simply making it generic

type MyData<'a,'b> =
    | Foo of 'a
    | Bar of 'b
    static member bimapish (f : 'a -> 'a) (g : 'b -> 'b) x : MyData<'a,'b> =
        match x with
        | Foo v -> Foo (f v)
        // this DOESNT cause an error but does cause a warning, 'a has to be the same as 'b...which is enough for me to know its wrong
        //| Bar v -> Bar (f v)
        | Bar v -> Bar (g v)

let foo = Foo 42
let newFoo = 
    MyData.bimapish 
        ((+) 1) 
        ((+) 2) 
        foo

ok, and the caller CAN potentially pass the wrong functions. but for me I'd probably go with this simply for simplicity.

3
Daniel Wagner On

The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch.

If you make things polymorphic enough, the type checker actually can catch such errors. I'll speak in Haskell because it's my native tongue, but I'm sure there's an F# analog.

{-# LANGUAGE LambdaCase #-}

data MyDataParameterized a b c d
  = Foo a
  | Bar b
  | Baz c
  | Hello d

-- there is only *one* well-typed implementation of this function that doesn't loop forever
quadramapParameterized :: (a -> a') -> (b -> b') -> (c -> c') -> (d -> d') ->
    MyDataParameterized a b c d -> MyDataParameterized a' b' c' d'
quadramapParameterized fa fb fc fd = \case
    Foo a -> Foo (fa a)
    Bar b -> Bar (fb b)
    Baz c -> Baz (fc c)
    Hello d -> Hello (fd d)

type MyData = MyDataParameterized Int Bool String Int

-- there are many, many "wrong" implementations of this type, but
-- the right one is just so obviously right
quadramap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) ->
    MyData -> MyData
quadramap fInt fBool fString = quadramapParameterized fInt fBool fString fInt

Of course, doing things this way is wildly painful, so practically speaking mostly people don't. They just take on the burden of manually ensuring the same constructor is used on both sides in one top-level fold, then use that fold to avoid manual pattern matching in as many places as that makes sense. Compared to other burdens facing the workaday programmer, "same constructor left and right" is very light -- it's a very local property of your code, and usually blatantly obvious when you screw up, to the point that in the rare situation where you intended to return a different constructor it warrants a comment explaining why to assuage the reader.

2
K. A. Buhr On

I think the usual approach for uniform processing of small parts of large data structures is to use some kind of generics. For your specific example, you can write a fairly straightforward Haskell solution using "Scrap Your Boilerplate" generics from the syb package:

import Data.Data               -- from `base`
import Data.Generics.Aliases   -- from `syb`

data MyData
  = Foo Int
  | Bar Bool
  | Baz String
  | Hello Int
  deriving (Show, Data)

myMap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) -> MyData -> MyData
myMap f g h x = gmapT (mkT f `extT` g `extT` h) x

which works as expected:

λ> map (myMap (*2) not ("hello "++)) [Foo 42, Bar True, Baz "world", Hello 5]
[Foo 84,Bar False,Baz "hello world",Hello 10]

Unlike "normal" functional language facilities, generics tend to have syntax and design that differs widely between languages, or even within a language, and a given solution may be very case-specific.

Related Questions in F#