How to re-index nodes based on (a) BFS, or (b) DFS using XSLT?


I have created a random forest model with H2O and exported it as PMML. Below is a snippet showing the first decision tree with its nodes and their IDs.

[...]
                            <TreeModel functionName="regression" missingValueStrategy="defaultChild">
                                <MiningSchema>
                                    <MiningField name="Sepal.Length"/>
                                    <MiningField name="Petal.Width"/>
                                </MiningSchema>
                                <Node id="1" defaultChild="2">
                                    <True/>
                                    <Node id="2" defaultChild="3">
                                        <SimplePredicate field="Sepal.Length" operator="lessThan" value="5.450927734375"/>
                                        <Node id="3" score="1.0">
                                            <SimplePredicate field="Petal.Width" operator="lessThan" value="0.800585925579071"/>
                                        </Node>
                                        <Node id="4" score="0.0">
                                            <SimplePredicate field="Petal.Width" operator="greaterOrEqual" value="0.800585925579071"/>
                                        </Node>
                                    </Node>
                                    <Node id="5" defaultChild="7">
                                        <SimplePredicate field="Sepal.Length" operator="greaterOrEqual" value="5.450927734375"/>
                                        <Node id="6" score="1.0">
                                            <SimplePredicate field="Petal.Width" operator="lessThan" value="0.601367175579071"/>
                                        </Node>
                                        <Node id="7" score="0.0">
                                            <SimplePredicate field="Petal.Width" operator="greaterOrEqual" value="0.601367175579071"/>
                                        </Node>
                                    </Node>
                                </Node>
                            </TreeModel>
[...]

However, when I compare these node IDs with the model information reported by H2O, the values differ. Below is a short summary of the corresponding model as generated with H2O:

    tree node      pred         feat       val dt.left_children dt.right_children
 1:    1    0 0.0000000 Sepal.Length 5.4509277                1                 2
 2:    1    1 0.8823530  Petal.Width 0.8005859                3                 4
 3:    1    2 0.1020408  Petal.Width 0.6013672                5                 6
 4:    1    3 1.0000000         <NA>        NA               -1                -1
 5:    1    4 0.0000000         <NA>        NA               -1                -1
 6:    1    5 1.0000000         <NA>        NA               -1                -1
 7:    1    6 0.0000000         <NA>        NA               -1                -1
 8:    2    0 0.0000000  Petal.Width 0.8005859                1                 2
 9:    2    1 1.0000000         <NA>        NA               -1                -1
10:    2    2 0.0000000         <NA>        NA               -1                -1

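It appears that H2O numbers the nodes breadth-first (BFS), starting at 0, while the PMML output numbers them depth-first (DFS, in document order), starting at 1. For example, the node with id="5" in the PMML (Sepal.Length >= 5.4509) corresponds to node 2 of tree 1 in the table.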

QUESTION: how can we use XSLT to create new node IDs that match the table shown above?
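
One possible approach, as a minimal and untested sketch in XSLT 1.0: the zero-based BFS index of a node equals the number of nodes in the same tree that are shallower, plus the number of nodes at the same depth that come earlier in document order. The sketch assumes that the Node elements are in no namespace, as in the snippet above (a real PMML file usually declares a default namespace on its root element, in which case the match patterns need a prefix bound to that namespace), that defaultChild always refers to a direct child, and that H2O numbers nodes in plain level order. The template name bfs-index is made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the id of every Node with its zero-based BFS index. -->
  <xsl:template match="Node/@id">
    <xsl:attribute name="id">
      <xsl:call-template name="bfs-index">
        <xsl:with-param name="node" select=".."/>
      </xsl:call-template>
    </xsl:attribute>
  </xsl:template>

  <!-- Rewrite defaultChild so that it still points to the same child.
       Assumption: the referenced node is a direct child of this Node. -->
  <xsl:template match="Node/@defaultChild">
    <xsl:variable name="old" select="."/>
    <xsl:attribute name="defaultChild">
      <xsl:call-template name="bfs-index">
        <xsl:with-param name="node" select="../Node[@id = $old]"/>
      </xsl:call-template>
    </xsl:attribute>
  </xsl:template>

  <!-- Hypothetical helper: computes the BFS index of $node within its tree. -->
  <xsl:template name="bfs-index">
    <xsl:param name="node"/>
    <!-- Depth of the node: 0 for the outermost Node of a tree. -->
    <xsl:variable name="depth" select="count($node/ancestor::Node)"/>
    <!-- Outermost Node of the tree that contains $node. -->
    <xsl:variable name="tree" select="$node/ancestor-or-self::Node[last()]"/>
    <!-- Nodes of the same tree on shallower levels. -->
    <xsl:variable name="shallower"
        select="count($tree/descendant-or-self::Node
                      [count(ancestor::Node) &lt; $depth])"/>
    <!-- Earlier nodes on the same level of the same tree. -->
    <xsl:variable name="before"
        select="count($node/preceding::Node
                      [count(ancestor::Node) = $depth]
                      [generate-id(ancestor-or-self::Node[last()])
                       = generate-id($tree)])"/>
    <xsl:value-of select="$shallower + $before"/>
  </xsl:template>

</xsl:stylesheet>
```

Worked through by hand on the snippet above, this should map the PMML ids 1, 2, 5, 3, 4, 6, 7 to 0, 1, 2, 3, 4, 5, 6, which is the numbering in the table. Any XSLT 1.0 processor should be able to apply it, for example `xsltproc reindex.xsl model.pmml > model_bfs.pmml` (both file names are placeholders). The counting is quadratic in the number of nodes per tree, so for very large forests an XSLT 2.0 solution based on grouping by depth may be preferable.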
