One step in the spectral upsampling method described by Jakob and Hanika (2019) is computing an error function, which is used to find the model parameters. I wanted to provide derivatives of the error function with respect to its arguments to the optimization algorithm I used, in hopes for faster and more reliable convergence.
How does one differentiate Python code though? Every program, no matter how complicated, is a series of simple operations and the chain rule can be used. Computer algebra systems (the one I use is http://maxima.sourceforge.net/Maxima) can help with the process, making this a trivial task.
The algorithm is as follows:
$$U(\lambda) = c_0\lambda^2 + c_1\lambda + c_2$$ $$R(\lambda) = \frac{1}{2} + \frac{U}{2\sqrt{1 + U^2}}$$
$$E(\lambda) = I(\lambda)R(\lambda)$$
$$\begin{cases}
X = k\int \bar E(\lambda) \bar x(\lambda) \mathrm{d}\lambda
Y = k\int \bar E(\lambda) \bar y(\lambda) \mathrm{d}\lambda
Z = k\int \bar E(\lambda) \bar z(\lambda) \mathrm{d}\lambda
\end{cases}$$
where $\bar x$, $\bar y$ and $\bar z$ are the cmfs (color matching functions), quantizing human color vision. They are closely related to (but not the same as) spectral sensivities of the three different kinds of cone cells in a human eye. In particular $\bar y$ is the luminosity function used in photometry, describing how bright light of different wavelengths appears. The normalization constant $k$,
$$k = \left(\int I(\lambda) y(\lambda) \mathrm{d}\lambda\right)^{-1}$$
assures that $Y = 1$ for perfectly white surfaces and that arbitrary units can be used for the illuminant and the cmfs.
$$X_n = \frac{X}{X_I}$$
$$f_X = \begin{cases}
\sqrt[3]{X_n} &\text{for}~~X_n > \left(\frac{24}{116}\right)^3
\frac{841X_n}{108} + \frac{4}{29} &\text{otherwise}
\end{cases}$$
Likewise for the remaining two components.
$$\begin{cases}
L^* = 116 f_Y - 16
a^* = 500(f_X - f_Y)
b^* = 200(f_Y - f_Z)
\end{cases}$$
$$\Delta E = \sqrt{(L^* - L^*_\text{target})^2 + (a^* - a^*_\text{target})^2 + (b^* - b^*_\text{target})^2}$$
Because spectral data is typically discrete, that is measured at some chosen wavelengths (e.g. every 10 nanometers in the visible range), the algorithm above operates on arrays rather than continuous functions. Integrals are approximated simply as the corresponding finite sum: $$\int f(x) \mathrm{d}x \approx \sum_i f_i \Delta x$$ This method turned out to be both fast and accurate enough in this application.
How does $\Delta E$ depend on the function inputs? Since this is a scalar function of multiple variables, what we're looking for is the gradient or three derivatives of $\Delta E$ with respect to each of the inputs. The answer is given by a repeated application of the chain rule.
The easiest way to approach it is to start from the beginning of the algorithm and compute the three functions $\mathrm{d}R/\mathrm{d}c_i$. (In code the result is a 3-by-n array, where n is the number of discrete wavelengths.) Then compute the derivatives of $E$ using the chain rule. $$\frac{\mathrm{d}E}{\mathrm{d}c_i} = \frac{\mathrm{d}E}{\mathrm{d}R}\frac{\mathrm{d}R}{\mathrm{d}c_i} = I\frac{\mathrm{d}R}{\mathrm{d}c_i}$$ The derivatives of $R$ have already been computed in the previous step. Doing this step by step will eventually lead to $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}L^*}$, $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}a^*}$ and $\frac{\mathrm{d}(\Delta E)}{\mathrm{d}b^*}$ and ultimately the gradient of $\Delta E$. Take a look at the full code to see all the formulas.
There is a clever trick to “automagically” compute derivatives, without having to go step-by-step, differentiating every operation and writing code manually like I did. Dual numbers have the interesting property that, when used correctly, their dual parts are the derivatives of their real parts. They're defined very similarly to complex numbers, but instead of $z = a+bi$, $i^2 = -1$, one has $z = a + b\epsilon$, $\epsilon^2 = 0$. (Even more interestingly, this can be extended to higher derivatives as well!). See automatic differentiation on Wikipedia.
The code I wanted to differentiate was simple enough that I didn't bother looking for and learning a new Python module. I also avoided pulling additional dependencies.