I know this question has been asked frequently, but all the answers do not really seem to satisfy me. Knowing when to choose a Red-Black tree over an AVL and vice versa is not as easy as it seems. A lot of factors come in to play to provide an accurate answer. I found two sources which actually come close to the answer:
- https://stackoverflow.com/a/28846533/10061169
- https://codedeposit.wordpress.com/2015/10/07/red-black-vs-avl/
I think benchmarking is a 'good' approach, however it doesn't explain 'why' we see what we see. What I concluded so far from my research:
1. Worst Case
1.1 Insert
The definition of 'worst-case', in case of the red-black tree and avl tree 'Insert' operation, is when data gets inserted on one side of the tree. This means data gets inserted in ascending or descending order.
When Inserting 7 nodes in a Red-Black Tree this means in worst-case:
- Rotation, Color Flip, Rotation, Color Flip, Rotation (use https://www.cs.usfca.edu/~galles/visualization/RedBlack.html for visualization)
When Inserting 7 nodes in an AVL tree this means in worst-case:
- Ratation, Rotation, Rotation, Rotation (use https://www.cs.usfca.edu/~galles/visualization/AVLtree.html for visualization)
In addition, when inserting the 7th node, the Red-Black tree has to compare 1 node more than the AVL tree, which means the search takes longer.
To conclude which tree has the best time complexity (in worst-case) for the Insert operation, a 'weight' factor has to be added to the Rotation, Color flip and Search operation in relationship to the size of the data set:
- When the data set is small, the Rebalancing (=Rotation and Color Flip) 'act' seems to outweigh the Search 'act'. This is because the height of the tree is relatively small.
- When the data set is large, the Search 'act' seems to outweigh the Rebalancing 'act'. This is because the height grows, but the Rebalancing 'act' remains constant.
The two sources (the answer froms stack overflow and the benchmark) seem to match in that regard. The benchmark shows that initially the Red-Black tree has a better time complexity, however when the data-set grows and searching becomes more 'dominant', then the AVL tree becomes 'quicker'.
1.2 Search
The definition of 'worst-case', in case of the red-black tree and avl tree 'Search' operation, is when data is distributed in such a way that it causes the heighest unbalance factor between the left and right sub tree.
When Searching a Red-Black tree, this means in worst-case:
- a height difference of 2 between the left and right sub tree
When Searching a n AVL tree, this means in worst-case :
- a height difference of 1 between the left and right sub tree
A Red-Black tree has to peform one additional compare during a 'Search' in worst-case. This makes the AVL tree 'quicker' with respect to Searching in worst-case for both smaller and larger data sets.
The two sources (the answer from Stack Overflow and the benchmark) seem to match in that regard.
2. Average Case
2.1 Insert
The definition of 'average-case', in case of the red-black tree and avl tree 'Insert' operation, is when data gets inserted on a random side of the tree. In case of the bench mark this means using a random number generator.
When 'Inserting' in a Red-Black tree, this means in average-case:
- The chance of a Color Flip becomes heigher (in comparison to a worst-case scenario) when 'Inserting' randomly on the left or right side of a Red-Black tree. As a result, less Rotations will take place in a Red-Black tree. In general, Rebalancing will be done less frequent in comparison to worst-case.
- The chance of inbalance becomes smaller (in comparison to a worst-case scenario) as nodes are more distributed. This means 'Searching' become a less 'dominant' factor.
When 'Inserting' in a AVL tree, this means in average-case:
- The AVL Tree will only benifit from the random distribution as it only has a single 'way' of rebalancing.
Random Insertion that causes inbalance will make an AVL tree Rotate, while it might make the Red-Black tree to Color Flip. Color flips are less 'expensive' than Rotations. The 'Searching' act, during Insertion, becomes less relevant as data will be more distributed on the right and left side of the tree. I.e. the chance of a height difference of 2 between right and left sub tree, in case of a Red-Black tree, becomes smaller (in comparison to an worst-case).
The Red-Black tree in the average case for the 'Insert' operation is 'quicker' than the AVL tree and is independent from the number of elements in the data set. The bench mark aligns with this statement. However, the https://stackoverflow.com/a/28846533/10061169 does not.
2.2 Search
The definition of 'average-case', in case of the Red-Black tree and AVL tree 'search' operation, is when nodes are randomly distributed between the left and right trees. In case of the bench mark this means using a random number generator.
As data gets inserted in a more distributed fashion, the chance of inbalance will reduce. This means the Red-Black tree will probably encounter a height difference of 2 less frequent in comparison to the worst-case. As 'extreme' inbalance in the Red-Black tree will arise less frequent the AVL and Red-Black tree will perform identical for the Average case 'Search' operation.
The benchmark seems to align with the AVL and Red-Black tree performing identical in the average-case for the 'Search' operation. However, the stack overflow post does not. It states that the AVL tree is always faster in terms of 'Searching'.
Questions:
- What seems unclear to me is why the Red-Black tree in worst case is quicker for the 'Insert' operation? I.e. Why are 2 color flips and 1 additional compare in the Red-Black tree faster than 1 rotation from the AVL tree?
- The bench mark shows that the Red-Black tree is 'quicker' in the average-case for the 'Insert' operation than the AVL tree. However, the https://stackoverflow.com/a/28846533/10061169 indicates it is not the case for larger data sets. Does that mean the stackoverflow answer didn't take the average-case into account or am I missing something?
- The bench mark shows that the Red-Black tree and AVL tree peform similair in the average-case for the 'Search' operation. However, the https://stackoverflow.com/a/28846533/10061169 disagrees. Does that mean the stackoverflow answer didn't take the average-case into account or am I missing something?
- Does anyone have an AVL vs Red-Black tree bench mark of the 'Delete' operation both for average and worst-case?