Incorrect variable range behavior in RooFit

105 views Asked by At

The RooFit package allows me to import some TTree branches, but it constrains values included in those branches between the min and max value set by a RooRealVar. For instance:

RooRealVar t1("t1", "Some variable", 0.0, 1.0);
RooDataSet data("data", "My Dataset", RooArgSet(t1), Import(*myttree));

so long as the TTree myttree contains a branch called t1. This is all fine and good, until you start getting close to floating point values in the range. My particular problem occurs because I have a variable like t1 which maps to some variable with an exponential distribution. I'm trying to fit this distribution, but the fits fail for values of t1 ~ 0.0. My solution was to just change the range a bit to cut off potential events where the stored value of t1 in the tree is actually zero or close to it (all the following code is run in the ROOT interpreter, but I have confirmed it works in compiled code as well):

root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);

However, note the following annoying behavior:

root[1] t1 = 0.000998;
root[2] t1.getVal();
(double) 0.0010000000
// this is correct, as 0.000998 < 0.001 so RooFit set it as the lower limit
root[3] t1 = 0.000999;
root[4] t1.getVal();
(double) 0.00099900000
// this is incorrect.

Yes, the extra zero is printed in the second case, which I also don't understand, but I'm mostly concerned with the failure to recognize that 0.000999 < 0.001. When I compare these values later in an if-statement, I find that C++ can tell the difference. Everything appears to be double precision here, and I've been tracing through the code to see where the precision error seems to crop up. Correct me if I'm wrong, but a float should still hold these numbers up to comparison precision, right? What's going on here? If this is some floating point error problem, what's the best way to resolve it? I have several events with values like t1 = 0.000999874, and changing the bounds to something like 0.0001 doesn't really help either, there are still events which live on this edge.

Edit: I want to emphasize that while this is probably a floating point problem, it really shouldn't be. For instance, the following code works:

root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);
root[1] t1 = 0.000999;
root[2] t1.getVal() < 0.001;
(bool) true
1

There are 1 answers

0
Nathaniel D. Hoffman On

Well everyone, I found the answer (and I hate it). It actually has very little to do with floating point arithmetic, and I'm honestly not sure why the code was written this way. From the source code that determines if a value is "in range":

bool RooAbsRealLValue::inRange(double value, const char* rangeName, double* clippedValPtr) const
{
  // double range = getMax() - getMin() ; // ok for +/-INIFINITY
  double clippedValue(value);
  bool isInRange(true) ;
 
  const RooAbsBinning& binning = getBinning(rangeName) ;
  double min = binning.lowBound() ;
  double max = binning.highBound() ;
 
  // test this value against our upper fit limit
  if(!RooNumber::isInfinite(max) && value > (max+1e-6)) {
    clippedValue = max;
    isInRange = false ;
  }
  // test this value against our lower fit limit
  if(!RooNumber::isInfinite(min) && value < min-1e-6) {
    clippedValue = min ;
    isInRange = false ;
  }
 
  if (clippedValPtr) *clippedValPtr=clippedValue ;
 
  return isInRange ;
}

As we can see here, RooFit doesn't actually check if min < val < max or even min <= val <= max but rather min - 1e-6 < value < max + 1e-6! I couldn't find a single place where this was documented explicitly, but I'm even more concerned that there is a separate implementation of inRange which takes a variable name (or comma separated list of variable names) and returns a result which is incompatible with the prior implementation:

bool RooAbsRealLValue::inRange(const char* name) const
{
  const double val = getVal() ;
  const double epsilon = 1e-8 * fabs(val) ;
  if (!name || name[0] == '\0') {
    const auto minMax = getRange(nullptr);
    return minMax.first - epsilon <= val && val <= minMax.second + epsilon;
  }
 
  const auto& ranges = ROOT::Split(name, ",");
  return std::any_of(ranges.begin(), ranges.end(), [val,epsilon,this](const std::string& range){
    const auto minMax = this->getRange(range.c_str());
    return minMax.first - epsilon <= val && val <= minMax.second + epsilon;
  });
}

Here, we can see the creation of an epsilon = 1e-8 * fabs(val) rather than the arbitrary 1e-6 given in the first definition. This comparison uses a <= rather than a < also. It should be noted that the method used to filter trees when imported in this way uses the first implementation (source here).

Somewhere along the way (I'm not entirely sure where actually), some of these arbitrary comparisons lead to the following paradoxical behavior:

root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);
root[1] t1 = 0.001 - 1e-6;
(RooAbsArg &) RooRealVar::t1 = 0.000999  L(0.001 - 1) 
root[2] t1 = 0.001 - 1e-8 * 0.001;
(RooAbsArg &) RooRealVar::t1 = 0.001  L(0.001 - 1)
root[3] t1 = 0.00099999999;
(RooAbsArg &) RooRealVar::t1 = 0.001  L(0.001 - 1) 

I would classify this as a bug. Under no circumstances should 0.00099900000 be classified as within the range of (0.001 - 1) where 0.00099999999 is not!