Differentiating code

One step in the spectral upsampling method described by Jakob and Hanika (2019) is computing an error function, which is used to find the model parameters. I wanted to provide derivatives of the error function with respect to its arguments to the optimization algorithm I used, in hopes for faster and more reliable convergence.

How does one differentiate Python code though? Every program, no matter how complicated, is a series of simple operations and the chain rule can be used. Computer algebra systems (the one I use is http://maxima.sourceforge.net/Maxima) can help with the process, making this a trivial task.

The error function

The algorithm is as follows:

  • Given a triplet of coefficients (model parameters), $c_0$, $c_1$, $c_2$, compute a reflectance according to the model.

$$U(\lambda) = c_0\lambda^2 + c_1\lambda + c_2$$

$$R(\lambda) = \frac{1}{2} + \frac{U}{2\sqrt{1 + U^2}}$$

  • Compute the radiosity of a surface with the reflectance, lit by the given illuminant. This is done by simply multiplying the reflectance by the illuminant's spectral distribution.

$$E(\lambda) = I(\lambda)R(\lambda)$$

  • Compute the CIE 1931 XYZ tristimulus values describing how the reflected light would be perceived.

X = k\int \bar E(\lambda) \bar x(\lambda) \mathrm{d}\lambda \\
Y = k\int \bar E(\lambda) \bar y(\lambda) \mathrm{d}\lambda \\
Z = k\int \bar E(\lambda) \bar z(\lambda) \mathrm{d}\lambda

where $\bar x$, $\bar y$ and $\bar z$ are the cmfs (color matching functions), quantizing human color vision. They are closely related to (but not the same as) spectral sensivities of the three different kinds of cone cells in a human eye. In particular $\bar y$ is the luminosity function used in photometry, describing how bright light of different wavelengths appears. The normalization constant $k$,

$$k = \left(\int I(\lambda) y(\lambda) \mathrm{d}\lambda\right)^{-1}$$

assures that $Y = 1$ for perfectly white surfaces and that arbitrary units can be used for the illuminant and the cmfs.

  • Compute the CIE 1976 L*a*b* colorspace coordinates. Start by computing the values of the intermediate lightness function for the three XYZ components.

$$X_n = \frac{X}{X_I}$$

$$f_X = \begin{cases}
\sqrt[3]{X_n}  &\text{for}~~X_n > \left(\frac{24}{116}\right)^3 \\
\frac{841X_n}{108} + \frac{4}{29} &\text{otherwise}

Likewise for the remaining two components.

L^* = 116 f_Y - 16 \\
a^* = 500(f_X - f_Y) \\
b^* = 200(f_Y - f_Z)

  • Compute the Euclidean distance between the coordinates above and the target color (given in the L*a*b* colorspace). This is the CIE 1976 color difference (delta E) formula.

$$\Delta E = \sqrt{(L^* - L^*_\text{target})^2 + (a^* - a^*_\text{target})^2 + (b^* - b^*_\text{target})^2}$$


Because spectral data is typically discrete, that is measured at some chosen wavelengths (e.g. every 10 nanometers in the visible range), the algorithm above operates on arrays rather than continuous functions. Integrals are approximated simply as the corresponding finite sum:

$$\int f(x) \mathrm{d}x \approx \sum_i f_i \Delta x$$

This method turned out to be both fast and accurate enough in this application.

The gradient

How does $\Delta E$ depend on the function inputs? Since this is a scalar function of multiple variables, what we're looking for is the gradient or three derivatives of $\Delta E$ with respect to each of the inputs. The answer is given by a repeated application of the chain rule.

The easiest way to approach it is to start from the beginning of the algorithm and compute the three functions $\mathrm{d}R/\mathrm{d}c_i$. (In code the result is a 3-by-n array, where n is the number of discrete wavelengths.) Then compute the derivatives of $E$ using the chain rule.

$$\frac{\mathrm{d}E}{\mathrm{d}c_i} =  \frac{\mathrm{d}E}{\mathrm{d}R}\frac{\mathrm{d}R}{\mathrm{d}c_i} = I\frac{\mathrm{d}R}{\mathrm{d}c_i}$$

The derivatives of $R$ have already been computed in the previous step. Doing this step by step will eventually lead to $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}L^*}$, $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}a^*}$ and $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}b^*}$ and ultimately the gradient of $\Delta E$. Take a look at the full code to see all the formulas.

Automatic differentation

There is a clever trick to “automagically” compute derivatives, without having to go step-by-step, differentiating every operation and writing code manually like I did. Dual numbers have the interesting property that, when used correctly, their dual parts are the derivatives of their real parts. They're defined very similarly to complex numbers, but instead of $z = a+bi$, $i^2 = -1$, one has $z = a + b\epsilon$, $\epsilon^2 = 0$. (Even more interestingly, this can be extended to higher derivatives as well!). See automatic differentiation on Wikipedia.

The code I wanted to differentiate was simple enough that I didn't bother looking for and learning a new Python module. I also avoided pulling additional dependencies.