I'm looking for a way to optimize alpha blending, but for two colors with alpha (what differs from the question How to alpha blend RGBA unsigned byte color fast? )
Initially I used a solution with floats (RGB ranging from 0.0f to 255.0f and A ranging from 0.0f to 1.0f):
inline void alphaBlend(Color& baseColor, Color targetColor)
{
float newAlpha = (1 - targetColor.A) * baseColor.A + targetColor.A;
baseColor.R = ((1 - targetColor.A) * baseColor.A * baseColor.R + targetColor.A * targetColor.R) / newAlpha;
baseColor.G = ((1 - targetColor.A) * baseColor.A * baseColor.G + targetColor.A * targetColor.G) / newAlpha;
baseColor.B = ((1 - targetColor.A) * baseColor.A * baseColor.B + targetColor.A * targetColor.B) / newAlpha;
}
I changed the algorithm to work on unsigned int RGBA colors. I replaced every reference to alpha with (alpha / 255) and then corrected the formulas, so that values are still within the proper ranges.
baseColor.R = ((1 - targetColor.A) * baseColor.A * baseColor.R + targetColor.A * targetColor.R) / newAlpha;
Shorthand (targetColor.A -> tA etc.):
R = ((1 - tA) * bA * bR + tA * tR) / newAlpha
(introducing 255-based alpha requires replacing all A instances with A/255)
= ((1 - (tA / 255)) * (bA / 255) * bR + (tA / 255) * tR) / (newAlpha / 255)
(remove 255 from the denominator's denominator)
= (((1 - (tA / 255)) * (bA / 255) * bR + (tA / 255) * tR) * 255) / newAlpha
(get rid of direct alpha divisions by 255 by multiplying parethesis by 255/255)
= (( ((255 - tA) * bA * bR) / 255^2 + (tA * tR) / 255) * 255) / newAlpha
(multiplying by the last 255 causes denominators to reduce)
= ( ((255 - tA) * bA * bR) / 255 + (tA * tR * 255) / 255 ) / newAlpha
(Pushing numerator's denominator (255) to the denominator)
= ( (255 - tA) * bA * bR) + (tA * tR * 255) ) / (255 * newAlpha)
(Expanding first multiplication in numerator)
= ( 255 * bA * bR - tA * bA * bR + tA * tR * 255) / (255 * newAlpha)
^^^^^^^^^^^^ ^^^^^^^^^^^^^
(reordering not to fall below 0 during calculations)
= ( 255 * bA * bR + tA * tR * 255 - tA * bA * bR ) / (255 * newAlpha)
(grouping to minimize multiplications)
= ( (ba * bR + tA * tR) * 255 - tA * bA * bR ) / (255 * newAlpha)
(introducing bit shifting - losing precision, but in an acceptable range)
~= ( ((ba * bR + tA * tR) << 8) - tA * bA * bR) / (newAlpha << 8)
I managed to write the following code:
inline void alphaBlend(IntColor& baseColor, IntColor targetColor)
{
unsigned int a = (((baseColor.A + targetColor.A) << 8) - targetColor.A * baseColor.A) >> 8;
if (a > 0)
{
unsigned int divisor = a << 8;
unsigned int baseAR = baseColor.A * baseColor.R;
baseColor.R = (((targetColor.A * targetColor.R + baseAR) << 8) - (baseAR * targetColor.A)) / divisor;
unsigned int baseAG = baseColor.A * baseColor.G;
baseColor.G = (((targetColor.A * targetColor.G + baseAG) << 8) - (baseAG * targetColor.A)) / divisor;
unsigned int baseAB = baseColor.A * baseColor.B;
baseColor.B = (((targetColor.A * targetColor.B + baseAB) << 8) - (baseAB * targetColor.A)) / divisor;
baseColor.A = a;
}
else
{
baseColor.R = 0;
baseColor.G = 0;
baseColor.B = 0;
baseColor.A = 0;
}
}
This change reduced the rendering of sample data from 27559 ms to 17751 ms. Since the alpha blending seems to be the most common operation in the rendering workflow I'm curious if there is a way to optimize it even further.
I thought about doing calculations on R and B at the same time, but unfortunately in some circumstances the calculations will exceed two bytes (for instance if bA = bR = tA = tR = 255, the left part of the subtraction will equal to 33162750 = 0x1faa05fe).
Is there any other optimization I could apply to make this code faster?
Edit: responding to comments:
- Target architecture is x64, target processor may be Intel Core family
- Input type is guaranteed to be 32-bit RGBA
- Memory layout is BGRA (8888)
- Regarding SIMD, my application is a vector animation renderer. Every object is rendered on a separate bitmap and then alpha-blended into result one, because every single object may have applied alpha/mask/transformations/effects or may consist of multiple sub-objects, from which every single one also may have those applied.
- Compiler is the one from Microsoft Visual Studio 2022. Application is Windows-only.
I'm leaving this answer for people also looking for integer-calculation-based alpha-blending of two colors with alpha (that is, allowing the "background" or "base" color to be semi-transparent as well). It is not very fast, but certainly faster than its floating-point equivalent.
The code presented in my question is unfortunately flawed and sometimes gives results of 256, which causes in some cases ugly black pixels (
(unsigned char)256 == 0).The code below presents the solution and also serves as a correctness check. It validates that:
[0..255]Informatively, the floating-point alpha, which normally would be left in range [0.0f..1.0f] is normalized to [0.0f..255.0f], so that I can compare it with its int counterpart.
The validation code/solution follows.
The result is surprisingly good - application tests all possible combinations of colors and alphas and the numbfer of those that differ from the floating-point calculations (which I treat as valid) is below 1% (total number of combinations is 4 294 967 296):
The popular optimization for alpha blending is achieved by replacing (
* 255, / 255) operations with, respectively (<< 8, >> 8), which equals to (* 256, / 256). Since we need to multiply and divide by 255 and not 256, the price for optimization is decrease in precision. The bad news is that number of incorrect results increases dramatically, but the good news is that the error still does not exceed unit value of alpha and color:So now:
If you decide to pick the last solution, don't forget to post another answer here - I'm sure everyone will benefit from having a fast alpha-blending algorithm.