A taste of adjoint sensitivity analysis
In this first post I'd like to give a brief overview of one of the cornerstones of my PhD research: adjoint sensitivity analysis.
To begin, what do we even mean by a sensitivity analysis?
Essentially, we're trying to answer the question: how does a change in a function's input affect the value of its output?
Or, how sensitive are my outputs with respect to my inputs?
This information is highly valuable in, for example, optimization, where we want to find the parameters that maximize or minimize an objective function. For instance, in aircraft design, we would like to find the shape of a wing (input) that minimizes the drag (output). The sensitivities of drag with respect to the shape parameters can be used by a gradient-based optimization algorithm to significantly speed up the search.
Sounds good. So how do we compute the sensitivities?
Consider the following function which takes 4 inputs and computes 2 outputs:
Our example model performs the following evaluations
which can be summed up as a directed acyclic graph (DAG).
Figure: DAG of the sample model.
The sensitivities of all outputs with respect to all inputs are given by the Jacobian matrix. There are two main methods for computing it: the direct (also called tangent) method and the adjoint method.
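As a rough guide to what follows, here is the Jacobian written out for a model with 4 inputs and 2 outputs. The symbols are assumptions introduced here for illustration: x for the inputs, y for the outputs, a dot for a tangent (input perturbation), and a bar for an adjoint (output sensitivity).

```latex
\begin{align}
J &= \frac{\partial y}{\partial x} \in \mathbb{R}^{2 \times 4},
  \qquad J_{ij} = \frac{\partial y_i}{\partial x_j} \\
\dot{y} &= J \, \dot{x}
  && \text{(tangent: } \dot{x} = e_j \text{ yields column } j \text{ of } J\text{)} \\
\bar{x} &= J^{T} \bar{y}
  && \text{(adjoint: } \bar{y} = e_i \text{ yields row } i \text{ of } J\text{)}
\end{align}
```

Here e_j and e_i denote unit vectors, so each tangent evaluation recovers one column and each adjoint evaluation recovers one row.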
The idea of the tangent model is to perturb an input and observe how the perturbation affects the outputs. One evaluation is performed for each input, filling the Jacobian matrix column-wise. Thus, the computational cost of the direct method is proportional to the number of inputs, in this case 4.
The cost of the direct method is input-dependent. What if I have a huge number of input variables?
Where does a perturbation in my input go?
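The column-by-column idea of the tangent model can be sketched in a few lines of Python. This is only a minimal sketch: the function below is a hypothetical 4-input, 2-output stand-in chosen for illustration (not necessarily the example model above), and the names `model_tangent` and `jacobian_by_columns` are made up for this sketch.

```python
import math


def model_tangent(x, dx):
    """Evaluate a hypothetical 4-input, 2-output model together with its tangent.

    x  : the four input values
    dx : the perturbation (seed) applied to the inputs
    Returns the outputs (y0, y1) and their directional derivatives (dy0, dy1).
    """
    x0, x1, x2, x3 = x
    d0, d1, d2, d3 = dx

    # Each statement is followed by its derivative (chain rule, forward order).
    v0 = x0 * x1
    dv0 = d0 * x1 + x0 * d1
    v1 = math.sin(x2)
    dv1 = math.cos(x2) * d2
    v2 = v0 + v1
    dv2 = dv0 + dv1
    y0 = v2 * x3
    dy0 = dv2 * x3 + v2 * d3
    y1 = v0 - x3
    dy1 = dv0 - d3
    return (y0, y1), (dy0, dy1)


def jacobian_by_columns(x):
    """Build the 2x4 Jacobian with one tangent evaluation per input."""
    n_inputs = len(x)
    columns = []
    for i in range(n_inputs):
        # Perturb only input i: the result is column i of the Jacobian.
        seed = [1.0 if j == i else 0.0 for j in range(n_inputs)]
        _, dy = model_tangent(x, seed)
        columns.append(dy)
    # Rearrange the list of columns into a list of rows.
    return [[columns[i][k] for i in range(n_inputs)] for k in range(2)]


if __name__ == "__main__":
    for row in jacobian_by_columns((1.0, 2.0, 0.0, 3.0)):
        print(row)
```

The loop in `jacobian_by_columns` runs four times, once per input, which is exactly where the input-dependent cost comes from.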
On the other hand, the adjoint model accumulates the sensitivities in reverse, starting at the outputs. One evaluation is performed for each output, filling the Jacobian matrix row-wise. Now the computational cost is proportional to the number of outputs: 2.
The cost of the adjoint method is output-dependent.
Where does a perturbation at my output come from?
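The row-by-row accumulation of the adjoint model can be sketched in the same way. Again, this is a minimal sketch under assumptions: the function is a hypothetical 4-input, 2-output stand-in (not necessarily the example model above), and `model_adjoint` and `jacobian_by_rows` are names invented for illustration.

```python
import math


def model_adjoint(x, y_bar):
    """Evaluate a hypothetical 4-input, 2-output model and its adjoint.

    x     : the four input values
    y_bar : the sensitivities (seed) applied to the two outputs
    Returns the outputs (y0, y1) and the input adjoints (x0_bar, ..., x3_bar).
    """
    x0, x1, x2, x3 = x
    y0_bar, y1_bar = y_bar

    # Forward sweep: evaluate the model and keep the intermediate values.
    v0 = x0 * x1
    v1 = math.sin(x2)
    v2 = v0 + v1
    y0 = v2 * x3
    y1 = v0 - x3

    # Reverse sweep: visit the statements in reverse order and accumulate.
    v0_bar = y1_bar                 # from y1 = v0 - x3
    x3_bar = -y1_bar
    v2_bar = y0_bar * x3            # from y0 = v2 * x3
    x3_bar += y0_bar * v2
    v0_bar += v2_bar                # from v2 = v0 + v1
    v1_bar = v2_bar
    x2_bar = v1_bar * math.cos(x2)  # from v1 = sin(x2)
    x0_bar = v0_bar * x1            # from v0 = x0 * x1
    x1_bar = v0_bar * x0
    return (y0, y1), (x0_bar, x1_bar, x2_bar, x3_bar)


def jacobian_by_rows(x):
    """Build the 2x4 Jacobian with one adjoint evaluation per output."""
    rows = []
    for k in range(2):
        # Seed only output k: the result is row k of the Jacobian.
        seed = [1.0 if j == k else 0.0 for j in range(2)]
        _, x_bar = model_adjoint(x, seed)
        rows.append(list(x_bar))
    return rows


if __name__ == "__main__":
    for row in jacobian_by_rows((1.0, 2.0, 0.0, 3.0)):
        print(row)
```

Here the loop runs only twice, once per output, no matter how many inputs the model has.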
Back to our aircraft design example: we have one output (drag) and up to hundreds of thousands of input parameters that define the shape. The cost of a sensitivity analysis using the tangent model would be on the order of a hundred thousand evaluations. Using the adjoint method, we can compute the same sensitivities with just a single evaluation! Notably, the cost of the adjoint sensitivity analysis is independent of the number of design (input) parameters.
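To make the evaluation counts concrete, here is a small sketch with one output and many inputs. The scalar function (a sum of squares) is a hypothetical stand-in and has nothing to do with real drag; all names are invented for illustration.

```python
def toy_tangent(x, dx):
    """One tangent evaluation of f(x) = sum(x_i^2): derivative along dx."""
    return sum(2.0 * xi * dxi for xi, dxi in zip(x, dx))


def toy_adjoint(x, f_bar=1.0):
    """One adjoint evaluation of f(x) = sum(x_i^2): all partial derivatives."""
    return [2.0 * xi * f_bar for xi in x]


def gradient_via_tangent(x):
    """Assemble the gradient entry by entry; returns (gradient, evaluations)."""
    n = len(x)
    gradient = []
    evaluations = 0
    for i in range(n):
        seed = [0.0] * n
        seed[i] = 1.0  # perturb only input i
        gradient.append(toy_tangent(x, seed))
        evaluations += 1
    return gradient, evaluations


def gradient_via_adjoint(x):
    """Obtain the whole gradient at once; returns (gradient, evaluations)."""
    return toy_adjoint(x), 1


if __name__ == "__main__":
    x = [0.5 * i for i in range(1000)]
    _, n_tangent = gradient_via_tangent(x)
    _, n_adjoint = gradient_via_adjoint(x)
    print("tangent evaluations:", n_tangent)
    print("adjoint evaluations:", n_adjoint)
```

Both routes return the same gradient. The tangent route needs as many evaluations as there are inputs, while the adjoint route needs one.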
This post barely scratches the surface, but I hope it gives an impression of adjoint methods. Check back in due time to read more posts about the tangent and adjoint models, how they can be implemented in an arbitrarily complex code, and to see some sample applications.