Implementing an arithmetic system in R


I started to implement a kind of numbers in R. I have a function to add them, multiply them, etc. Now I want to provide a convenient interface for the arithmetic on these numbers. That is, I don't want the user to type multiply(x, add(y, z)), but x * (y + z) instead, etc. What is the best way to achieve this in terms of efficiency, S3 or S4? I already did such an arithmetic implementation in S4 for a package (lazyNumbers); it was a bit long and a bit "verbose". Is it more comfortable in S3? I don't know how to do this with S3 yet, but I'll learn if needed.

2

There are 2 answers

8
JDL On

The answer will depend on how your "numbers" operate, but I'll try to identify the strengths and weaknesses of each approach here so you can make up your own mind.

S3

  • for ordinary generics, only checks the class() of the first argument. Operators are the exception: members of the Ops group consider the class of both arguments, so if x is an object of your class and there is a +.myclass or Ops.myclass function, it is called for both x + 1 and 1 + x. However, for x + y where x and y have different classes with different methods, R warns about incompatible methods and falls back to the internal default, which will presumably fail.
  • I believe it's quicker as there are fewer checks, but I haven't actually tested it.

S4

  • checks the class() of all arguments
  • will take more time as it has to look up the whole methods table, rather than look for a function called generic.class
  • for internal generic functions, will only look for methods if at least one of the arguments is an S4 object (shouldn't be a problem if your class is S4).
  • checks the validity of objects it creates. By default, this only checks that the object and the slots therein have the correct class. You can supply your own validity function with setValidity (e.g. one that always returns TRUE if you want no checks beyond the default).
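
As a rough illustration of the validity point, here is a minimal sketch. The class name fraction and its slots num and den are hypothetical and only serve the example; setClass, setValidity and new are the standard functions from the methods package.

```r
library(methods)

# Hypothetical class: a fraction stored as numerator and denominator
setClass("fraction", slots = c(num = "numeric", den = "numeric"))

# A custom validity function; it runs in addition to the slot-class check
setValidity("fraction", function(object) {
  if (length(object@num) != length(object@den))
    "num and den must have the same length"
  else if (any(object@den == 0))
    "den must be non-zero"
  else
    TRUE
})

# Passes both the slot-class check and the custom check
ok <- new("fraction", num = 1, den = 2)

# Rejected by the default check: num is not numeric
bad_class <- try(new("fraction", num = "1", den = 2), silent = TRUE)

# Rejected by the custom validity function: zero denominator
bad_value <- try(new("fraction", num = 1, den = 0), silent = TRUE)
```

Both failing calls signal an error from new(), so methods that build their result with new() get this check on every operation. Assigning to a slot afterwards with @<- does not rerun the validity function; validObject() can be called explicitly for that.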

Also look into the group generics Ops, Math and so on. Even if you need to use S4, you may be able to just write methods for these. Remember, though, that + and - can be unary as well as binary, so make sure that the function works as intended when e1 is your S4 class and e2 is missing. Depending on what sort of object your class represents, "as intended" might mean throwing an error.

In terms of efficiency, if you are spending a long time in method dispatch rather than actual calculation then you are probably doing something wrong. In particular, consider having your class represent a vector (perhaps a list if you really need to) of whatever sort of number you are working with. Once a method has been chosen, the calculation will take the same amount of time regardless of whether we used S3 or S4, with the exception that S4 will check that the object is valid at the end. The check is typically faster than the method dispatch unless the class is very complex (i.e. has a lot of slots or a deep inheritance structure).

If by "efficiency" you simply meant not writing lots of code then group generics are the best time saver. They work with both S3 and S4.

Below is a simple example of a group generic. I've used the example of a class with two slots, x as an ordinary numeric and timestamp as the time it was calculated. We want operators to "act on the x slot" and we achieve that as follows:

## define simple class based on numeric
timestampedNum <- setClass(
  "timestampedNum",
  slots=c(timestamp="POSIXct",x="numeric"),
  prototype=prototype(timestamp=Sys.time())
)
## set methods for Ops group generic
## we need four of them:
## one for unary +, -
## one for our class [op] something else
## one for something else [op] our class
## one for our class [op] our class
setMethod(
  "Ops",
  signature = signature(e1="timestampedNum",e2="missing"),
  definition = function(e1) timestampedNum(
    x=callGeneric(e1@x),
    timestamp=Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1="timestampedNum",e2="ANY"),
  definition = function(e1,e2) timestampedNum(
    x=callGeneric(e1@x,e2),
    timestamp=Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1="ANY",e2="timestampedNum"),
  definition = function(e1,e2) timestampedNum(
    x=callGeneric(e1,e2@x),
    timestamp=Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1="timestampedNum",e2="timestampedNum"),
  definition = function(e1,e2) timestampedNum(
    x=callGeneric(e1@x,e2@x),
    timestamp=Sys.time()
  )
)

z <- timestampedNum(x=5)
z
+z
-z
z + 1
1 + z
z + z

which produces six objects of class timestampedNum with x slots 5, 5, -5, 6, 6 and 10 respectively.
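
One caveat with the Ops methods above: the Ops group also covers the comparison operators (==, <, and so on). Their results are logical, so wrapping them in an object whose x slot is numeric would most likely fail the validity check. A possible refinement, sketched here with a hypothetical class tsNum that mirrors timestampedNum, is to write methods for the Arith and Compare subgroups separately. Only the "our class [op] numeric" case is shown, for brevity.

```r
library(methods)

# Hypothetical class name "tsNum", mirroring timestampedNum above
tsNum <- setClass(
  "tsNum",
  slots = c(timestamp = "POSIXct", x = "numeric"),
  prototype = prototype(timestamp = Sys.time())
)

# Arithmetic operators (+, -, *, /, ^, %%, %/%) return a new tsNum
setMethod(
  "Arith",
  signature = signature(e1 = "tsNum", e2 = "numeric"),
  definition = function(e1, e2) tsNum(
    x = callGeneric(e1@x, e2),
    timestamp = Sys.time()
  )
)

# Comparison operators (==, !=, <, >, <=, >=) return a plain logical vector
setMethod(
  "Compare",
  signature = signature(e1 = "tsNum", e2 = "numeric"),
  definition = function(e1, e2) callGeneric(e1@x, e2)
)

z <- tsNum(x = 5)
doubled <- z * 2   # a tsNum whose x slot is 10
is_big <- z > 3    # a plain logical
```

The other three signatures (unary, numeric [op] our class, our class [op] our class) would follow the same pattern as in the Ops example.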

0
Mikael Jagan On

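Just to elaborate on my comment: a single S3 Ops method is dispatched in each of the cases below, and it can tell them apart with missing() and inherits().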

x <- structure(0, class = "zzz")

.S3method("Ops", "zzz",
          function(e1, e2) {
              if (missing(e1))
                  "A" # should never happen
              else if (missing(e2))
                  "B"
              else if (!inherits(e2, "zzz"))
                  "C"
              else if (!inherits(e1, "zzz"))
                  "D"
              else "E"
          })

+x
## [1] "B"
x + 1
## [1] "C"
1 + x
## [1] "D"
x + x
## [1] "E"
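
To connect this with the question, here is a minimal sketch of how such an Ops method might forward to existing add() and multiply() functions. Everything about the number type is an assumption made for illustration: the class root2 (numbers of the form a + b*sqrt(2), stored as two parallel numeric vectors), the stand-in bodies of add() and multiply(), and the helper as_root2().

```r
# Hypothetical number type: a + b*sqrt(2), stored as two parallel numeric vectors
root2 <- function(a, b = 0) structure(list(a = a, b = b), class = "root2")

# Stand-ins for the existing add() and multiply() functions
add <- function(x, y) root2(x$a + y$a, x$b + y$b)
multiply <- function(x, y) root2(x$a * y$a + 2 * x$b * y$b,
                                 x$a * y$b + x$b * y$a)

# Promote plain numerics so that 1 + x and x + 1 both work
as_root2 <- function(x) if (inherits(x, "root2")) x else root2(x, 0 * x)

Ops.root2 <- function(e1, e2) {
  if (missing(e2)) {
    # unary case: only + and - are given a meaning here
    return(switch(.Generic,
                  "+" = e1,
                  "-" = root2(-e1$a, -e1$b),
                  stop("unary '", .Generic, "' is not defined for root2")))
  }
  e1 <- as_root2(e1)
  e2 <- as_root2(e2)
  # .Generic holds the name of the operator that was called
  switch(.Generic,
         "+"  = add(e1, e2),
         "-"  = add(e1, -e2),
         "*"  = multiply(e1, e2),
         "==" = e1$a == e2$a & e1$b == e2$b,
         stop("'", .Generic, "' is not defined for root2"))
}

x <- root2(1, 1)   # 1 + sqrt(2)
y <- root2(0, 2)   # 2*sqrt(2)
z <- root2(3, 0)   # 3
w <- x * (y + z)   # same result as multiply(x, add(y, z))
```

Because each object holds whole vectors in a and b, one dispatch handles many numbers at once, which keeps the dispatch cost small relative to the arithmetic.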