Introduction
In this article I'd like to present a compact and simpletouse software tool for optimal control of dynamic systems. Optimal control is a very broad topic. Its foundation was laid with works by Richard Bellman, Lev Pontryagin, Rudolf Kalman and others in late 1950s  60s. Optimal control has numerous practical applications. Here we will focus on the following classic optimal control problem. Our goal is to transfer given dynamic system from its initial state to some final state with external control inputs. We desire that during this transition system parameters follow their prescribed trajectories as close as possible. This is a very fundamental task of optimal control with a lot of practical applications. For example, to start of electrical motor limiting initial jump of electric current to acceptable value; promptly damping undesirable oscillations in electronic and mechanical systems, and countless sophisticated applications ranging from space craft control to missiles interception.
Dealing with this problem and even its statement requires a lot of math. I will confine myself with the necessary minimum required to describe the problem and understand how to use software. Some additional recursive formulas are given in an appendix without proof.
Problem Statement
Let's formalize our problem. Dynamic system is described with the following set of differential equations:
 (1) 
where t is time, x is state vector consisting of parameters characterizing system behavior, ẋ is vector of the first derivative of x by time, m is control vector. Normally the dimension of the control vector does not exceed the dimension of the state vector. In this statement vector m is independent on vector x, and may be considered as control of openloop system. The openloop control system with input control vector m and state vector x is depicted below:

OpenLoop Control System 
We’d like to obtain control policy m(t) during time interval between 0 and final time of t_{f} which minimizes cost function
 (2) 
where superscript ^{T} denotes transposing of vector or matrix, r is desirable state vector, u is desirable control vector, Q and Z are weight matrices for deviation between desirable and actual parameters of state and control vectors respectively. In most of the cases Q and Z are diagonal matrices (but not necessarily!). Each diagonal element of them defines relative weight of square of difference between desirable and real values of a parameter at given moment of time. The matrices may be time dependent, e.g. it is common practice to increase the relative weight of the differences at final time to ensure desirable final state. The problem formulated here is often referred to as "tracking problem."
Linear Dynamic Systems
Before we come to a solution, let's discuss a special case of dynamic systems. The most well studied class of the systems is linear dynamic systems, i.e. system in which dynamics is described with a set of linear differential equations. For such system equation (1) may be rewritten as following:
 (3) 
where A(t) and B(t) are matrices. First we will provide a solution for these kind of systems, and then spread it to nonlinear dynamic systems.
Discrete Time
Since we are going to offer a numeric solution we will deal with the system in discrete moments of time. So we will reformulate our problem for discrete time t = k ⋅ Δt, where Δt constitutes a small time interval (sampling interval) and k is a given time step, 0 <= k < N, t_{f} = (N1) ⋅ Δt. Now equations (1) may be rewritten for discrete time:
 (4) 
and linear system is defined as
 (5) 
where F_{k} and H_{k} are matrices obtaining from A(t) and B(t) as following:
F_{k} = exp(A(k⋅Δt)⋅Δt) ≅ I+A(k⋅Δt)⋅Δt, H_{k} = A^{1}[exp(A(k⋅ Δt)⋅Δt)  I]B ≅ B(k⋅Δt)⋅Δt, where I is unit matrix.
Cost function (2) in discrete time will be reshaped to
 (6) 
Solution for the Linear Case
A numeric solution of the problem may be obtained based on Bellman's principle of optimality. Results are quite bulky and presented in the appendix. For general understanding it is essential to know that m_{k} control policy is calculated in two major runs. First, inverse calculation run is performed starting with final time step N1 and going down to initial time 0. The inverse calculation run using system matrices F and H, desirable state r and control u vectors and weight matrices Q, Z yields some vector c_{k} and matrix L_{k}. Then the direct run is carried out from time 0 till N1 obtaining vector m_{k} at each k time step as
 (7) 
Both inverse and direct calculation runs are shown in the picture below:

Calculation scheme 
Vectors c_{Nk} and matrices L_{Nk} are known from the inverse run and dependent on system matrices F and H, user defined desirable vectors and weights. They also depend on auxiliary matrices P_{Nk} and vectors v_{Nk} participating in inverse run and starting from zero initial conditions. On each step of direct calculation run we obtain control vector m_{k} based on state vector x_{k} for this time. Then calculate the next state x_{k+1} using equation (5). This procedure is repeated until the last time moment N1 giving us both openloop control policy m_{k} and states of the system x_{k} vs. time. Statement (7) provides us with closeloop control as depicted below

CloseLoop Control System 
where vector c constitutes closeloop control and matrix L provides feedback from state vector x. These parameters allow designer to generate vector m based on actual state of the system, thus making the system more stable and robust.
Discussion of the Solution for the Linear Case
For linear systems control policy m_{k} minimum of cost function (6) is achieved with just one iteration consisting of one inverse and one direct runs. As we have stated above, this policy depends on system matrices and user chosen parameters, namely desirable vectors and weight matrices. These parameters are intuitive enough to reduce the control policy design to a play with them. The user chooses a set of parameters and calculates the control policy and appropriate state trajectories. If she/he is not satisfied with the result, the calculation is repeated for another set of desirable vectors and weight matrices. It is also interesting to observe that if system matrices F, H and weight matrix Q are time invariant then feedback matrix L will be also time invariant, which considerably simplified design of system’s controller. For linear case system matrices F, H are normally time invariant. So, to attain time invariant feedback user should choose constant weight matrix Q, i.e. control without formal strict achievement of desirable final state at final moment of time. In real life by proper choice of Q we can achieve a desirable final state in acceptable time, even before our final time. Although it is convenient to implement control policy using state vector coordinates, in real life usually not all those coordinates are available for measurement. Various techniques have been developed to overcome this difficulty. Different kinds of state coordinate observers may be used to estimate unmeasured coordinates using information for the coordinates available for measurement.
Nonlinear Systems
The nonlinear system case is much more complex. Unlike linear systems it does not have overall decision for all kinds of systems. Here we can try to reduce nonlinear optimization problem to a linear one using quasilinearization approach. To use the same math solution, we have to present general dynamic system description given by equation (1) in linearized form similar to equation (5). To achieve this we should consider iterative calculation process. We can linearize (1) to (5) in a particular point on x_{k} and m_{k} of ith trajectory:
 (8) 
where
 (9) 
Linearized system matrices F and H are matrices of appropriate partial derivatives of functions f at point k on ith trajectory. State and control vectors on (i+1)th trajectory may be presented as
 (10) 
Equation (8) may be formulated for increments Δx and Δm. In this case in cost function (6) new vectors r and u correspond to
 (11) 
respectively. This allows us to reduce nonlinear problem to a sequence of quasilinear iterations each of which is depicted in calculation scheme figure above. The algorithm consists of the following steps:
 Assuming for iteration i=0 vectors x_{k} and m_{k} (normally all zeros accept x_{k=0} initial conditions, but choice is up to user) calculate linearized all F_{k}^{i} and H_{k}^{i} for ith trajectory according to (8).
 Calculate new r_{k}^{i} and u_{k}^{i} for ith iteration according to (11).
 Calculate Δm_{k}^{i} and Δx_{k}^{i} using inverse and direct runs as for linear case.
 Calculate x_{k}^{i+1} and m_{k}^{i+1} for (i+1)th iteration.
 Check condition to stop iterations. This condition may be based e.g., on difference between values of cost function comparing to previous iteration, or simply contain fixed number of iterations. Anyway, if the condition is not satisfied then steps 15 should be repeated until the stop condition will be satisfied. After the condition was satisfied, policy m_{k} and states x_{k} may be considered as final solution.
Obtaining exact formulas for partial derivatives of functions f may be a tedious work, and becomes impossible when functions f are defined numerically, e. g. as a table. In these cases it is possible to use simple numeric approximation by giving small increments to a state vector coordinate of partial derivative and calculate result value of f function. Code sample provides appropriate examples.
Method Limitations
Like any numeric approach this one has its limitations. One limitation is rather obvious and should be considered for both linear and nonlinear cases: time sampling interval should be small enough to guarantee convergence of calculations. Desirable values of r, u, and weight factors Q and Z should be reasonable to achieve intuitively optimal system behavior. In case of nonlinear systems, we should clearly understand that this is a "game without strict rules," and thus there is not any guarantee that the technique is applicable to a particular case. The method is sensitive to calculation of linearized system matrices accuracy. More complex methods may be used to estimate applicability of this approach to a particular nonlinear system, but usage of tryandfail approach seems more practical for engineers. Again, in some way dealing with nonlienear system is still "art rather than science."
Constant Feedback
As in linear case, control with constant feedback may be considered for nonlinear systems. To obtain constant feedback in nonlinear case weight matrices Q, Z, initial trajectory for state x and control m should be time invariant (except initial state x_{0}), and acceptable result should be achieved in one iteration (since system matrices F and H calculated according to (9) depend on previous iteration trajectory). Normally engineers adopt rather informal approach to system optimization, when delivering minimum to our cost function is not the ultimate goal but a mean to attain acceptable system behavior. So it may be wise in some cases to sacrifice further cost function minimization for the sake of feedback simplicity. Of course, like everything in nonlinar systems, it is possible not in all cases. Appropriate numeric example will be discussed below.
Usage of Software
The software is written in plain C# without any additional libraries. Project MatrixCalc provides arithmetic operations with vectors and matrices including transposition and inversion. Project SequentialQuadratic supports calculations for system optimization and simulation. Project Test uses static methods and interfaces of SequentialQuadratic to solve particular problems. DLL SequentialQuadratic has several internal types responsible for different optimization problems (linear with constant weights and desirable values, linear with changed in time weights and desirable values and appropriate nonlinear). Instance of these types are created with overloaded static methods SQ.Create()
and exposed to caller via set of appropriate interfaces IBase
, IBaseLQExt
, ITrackingLQ
and INonLinearQ
. In order to perform optimization procedure user should call appropriate method providing input and getting required interface. Then method RunOptimization()
should be called to actually carry out optimization calculations, followed by method Output2Csv()
to write result in commaseparated format to an Excelcompatible *.csv file. To simulate the system with given control policy method Simulate()
is called. Sample of the code for optimization of nonlinear system (Van der Pol oscillator) is shown below:
public static void VanDerPol()
{
const string TITLE = "Van der Pol";
Console.WriteLine($"Begin \"{TITLE}\"");
var lstFormatting = new List<OutputConfig>
{
new OutputConfig { outputType = OutputType.Control, index = 0, title="m",
scaleFactor=1 },
new OutputConfig { outputType = OutputType.State, index = 0, title="x0" },
new OutputConfig { outputType = OutputType.State, index = 1, title="x1" },
};
int dimensionX = 2;
int dimensionM = 1;
double dt = 0.1;
int N = 51;
var r = new Vector(dimensionX);
var u = new Vector(dimensionM);
var Q = new Matrix(dimensionX);
var Z = new Matrix(dimensionM);
r[0] = 0;
r[1] = 0;
u[0] = 0;
Q[0, 0] = 1;
Q[1, 1] = 1;
Z[0, 0] = 1;
var xInit = new Vector(dimensionX);
xInit[0] = 1;
double mu = 1;
var functions = new CalcDelegate[]
{
(k, deltaT, m, x) => x[1],
(k, deltaT, m, x) => mu * x[1] * (1  x[0] * x[0])  x[0] + m[0]
};
var gradientsA = new CalcDelegate[dimensionX, dimensionX];
gradientsA[0, 0] = (k, deltaT, m, x) => 0;
gradientsA[0, 1] = (k, deltaT, m, x) => 1;
gradientsA[1, 0] = (k, deltaT, m, x) => 1  2 * x[0] * x[1];
gradientsA[1, 1] = (k, deltaT, m, x) => 1  x[0] * x[0];
var gradientsB = new CalcDelegate[dimensionX, dimensionM];
gradientsB[0, 0] = (k, deltaT, m, x) => 0;
gradientsB[1, 0] = (k, deltaT, m, x) => 1;
var dynamicSystem1 = SQ.Create(functions, gradientsA, gradientsB, Q, Z, r, u, xInit, dt, N,
(currCost, prevCost, iteration) => iteration > 20);
dynamicSystem1.RunOptimization();
dynamicSystem1.Output2Csv("VanDerPol_ExactGrads", lstFormatting);
dynamicSystem1.OutputExecutionDetails(isExactGradient: true);
double delta = 0.1;
var dynamicSystem2 = SQ.Create(functions, new Vector(dimensionM, delta),
new Vector(dimensionX, delta),
Q, Z, r, u, xInit, dt, N,
(currCost, prevCost, iteration) => iteration > 20);
dynamicSystem2.RunOptimization();
dynamicSystem2.Output2Csv("VanDerPol_NumGrads", lstFormatting);
dynamicSystem2.OutputExecutionDetails(isExactGradient: false);
Console.WriteLine($"End \"{TITLE}\"{Environment.NewLine}{Environment.NewLine}");
}
Numeric Examples
Linear System
There is a mechanical system: mass M is moving by force. According to the Newton's second law the force is equal to result of multiplication of mass and acceleration which is the second derivative of displacement. Let's denote dispacement as x_{0}, velocity as x_{1} and force as m_{0}. The system is desribed with the following second order matrix differential equation:
ẋ = Ax + Bm, where
A =  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
 ,  B =  \begin{bmatrix} 0 \\ 1/M \end{bmatrix}
 ,  M = 1 
x_{k=0} =  \begin{bmatrix} 0 \\ 0 \end{bmatrix}

Cost function is ∫_{0}^{40}[(10  x_{0})^{2} + 3x_{1}^{2}]dt
Q =  \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}
 ,  Z =  \begin{bmatrix} 0 \end{bmatrix}
 ,  r =  \begin{bmatrix} 10 \\ 0 \end{bmatrix}
 ,  u =  \begin{bmatrix} 0 \end{bmatrix}

Δt = 1, N = 41.

Transients of the linear system 
Closeloop input and feedback factors obtained as result of calculations are 
input 4.48, 
feedbaks: from x_{0} 0.448 and from x_{1} 1.22 . 
Van der Pol Oscillator
This is a nonlinear system. Our goal is smooth and fast damping of ongoing oscillations.
ẋ_{0} = x_{1}
ẋ_{1} = μ⋅x_{1}⋅(1  x_{0}^{2})  x_{0} + m_{0}, μ=1
x_{k=0} =  \begin{bmatrix} 1 \\ 0 \end{bmatrix}

Cost function is ∫_{0}^{5}(x_{0}^{2} + x_{1}^{2} + m_{0}^{2})dt
Q =  \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
 ,  Z =  \begin{bmatrix} 1 \end{bmatrix}
 ,  r =  \begin{bmatrix} 0 \\ 0 \end{bmatrix}
 ,  u =  \begin{bmatrix} 0 \end{bmatrix}

Δt = 0.1, N = 51.

Transients of the controlled Van der Pol oscillator after 20 iterations 

Control policy m for different number of iterations (1, 2, 3 and 20) 

Cost function difference (in per cent) for different number of iterations 
In the above case with chosen desirable state and control vectors and weight matrices we observe good convergent iterative calculations.
Rayleigh Equation with Scalar Control
This is another nonlinear oscillation system. Our goal is the same as in previous case: smooth and fast ongoing oscillations damping.
ẋ_{0} = x_{1}
ẋ_{1} = x_{0} + (1.4  0.14⋅x_{1}^{2})⋅x_{1} + 4⋅m_{0}
x_{k=0} =  \begin{bmatrix} 5 \\ 5 \end{bmatrix}

Cost function is ∫_{0}^{2.5}(x_{0}^{2} + m_{0}^{2})dt
Q =  \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
 ,  Z =  \begin{bmatrix} 1 \end{bmatrix}
 ,  r =  \begin{bmatrix} 0 \\ 0 \end{bmatrix}
 ,  u =  \begin{bmatrix} 0 \\ \end{bmatrix}

Δt = 0.025, N = 101.

Transients after 94 iterations 

Control policy m for different number of iterations (1, 2, 3 and 94) 

Cost function difference (in per cent) for different number of iterations 
Similar to Van der Pol oscillator, in this case good method convergence was also achieved although more iterations were required to reduce cost function change to zero.
Nonlinear System with Constant Feedback
This is a nonlinear system that can be adequately translated from its given initial state to zero state using constant feedback.
ẋ_{0} = x_{1}
ẋ_{1} = x_{0}⋅[π/2 + artan(5⋅x_{0})]  5⋅x_{0}^{2} / [2⋅(1 + 25⋅x_{0}^{2})] + 4⋅x_{1} + 3⋅m_{0}
x_{k=0} =  \begin{bmatrix} 3 \\ 2 \end{bmatrix}

Cost function is ∫_{0}^{10}(5x_{1}^{2} + m_{0}^{2})dt
Q =  \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix}
 ,  Z =  \begin{bmatrix} 1 \end{bmatrix}
 ,  r =  \begin{bmatrix} 0 \\ 0 \end{bmatrix}
 ,  u =  \begin{bmatrix} 0 \\ \end{bmatrix}

Δt = 0.1, N = 101.

Transients for a single iteration 
Feedback factors obtained as result of calculations are 
from x_{0} 0.09 ≅ 0 and from x_{1} 2.29 . 

Transients for m_{0} = 3⋅x_{1} 

m_{0} vs. x_{1} for a single iteration 
The above figures depict transients calculated with our software for chosen weight matrices after a single iteration and transients for specially chosen "ideal" feedback for smooth translation. The last figure shows relations between controlled state coordinate x_{1} and control m_{0} for our calculations. Control calculated by our software looks a little more "aggressive" with small overshoots in m_{0} and x_{1} but quite acceptable.
TwoLink Robotic Manipulator
Application of the described technique to optimal control of a twolink robotic manipulator is presented in this post.
More numeric examples are provided in code sample.
Conclusions
The article briefly explains numeric solution of sequential quadratic optimization control problem for linear and nonlinear systems. Quasilinearization approached is used to reduce nonlinear problem to a set of sequential linearquadratic problems for linearized system. Compact and simpletouse software is provided along with example of its usage.
Appendix
Parameters c and L in recursive equation (7) for calculation of control vector m in inverse run may be obtained from dynamic programming equation (format of this article does not permit to provide appropriate maths transformations) as set of the following recursive operations:
where P and v are recursively calculated auxiliary matrix and vector respectively, k starts from k = N  1 with P_{0} = 0, v_{0} = 0.
Note that the above calculations require to invert auxiliary W matrix dependent on transfer matrix H, weight matrices and matrix P.