14,034,985 members
alternative version

Stats

47.1K views
77 bookmarked
Posted 9 Jan 2015
Licenced CPOL

Simple Software for Optimal Control

, 28 Mar 2019
Article presents simple software for solving square optimal control problem for linear and nonlinear dynamic systems.

Introduction

In this article I'd like to present a compact and simple-to-use software tool for optimal control of dynamic systems. Optimal control is a very broad topic. Its foundation was laid with works by Richard Bellman, Lev Pontryagin, Rudolf Kalman and others in late 1950s - 60s. Optimal control has numerous practical applications. Here we will focus on the following classic optimal control problem. Our goal is to transfer given dynamic system from its initial state to some final state with external control inputs. We desire that during this transition system parameters follow their prescribed trajectories as close as possible. This is a very fundamental task of optimal control with a lot of practical applications. For example, to start of electrical motor limiting initial jump of electric current to acceptable value; promptly damping undesirable oscillations in electronic and mechanical systems, and countless sophisticated applications ranging from space craft control to missiles interception.

Dealing with this problem and even its statement requires a lot of math. I will confine myself with the necessary minimum required to describe the problem and understand how to use software. Some additional recursive formulas are given in an appendix without proof.

Problem Statement

Let's formalize our problem. Dynamic system is described with the following set of differential  equations:

 (1)

where t is time, x is state vector consisting of parameters characterizing system behavior, is vector of the first derivative of x by time, m is control vector. Normally the dimension of the control vector does not exceed the dimension of the state vector. In this statement vector m is independent on vector x, and may be considered as control of open-loop system. The open-loop control system with input control vector m and state vector x is depicted below:

 Open-Loop Control System

We’d like to obtain control policy m(t) during time interval between 0 and final time of tf which minimizes cost function

 (2)

where superscript T denotes transposing of vector or matrix, r is desirable state vector, u is desirable control vector, Q and Z are weight matrices for deviation between desirable and actual parameters of state and control vectors respectively. In most of the cases Q and Z are diagonal matrices (but not necessarily!). Each diagonal element of them defines relative weight of square of difference between desirable and real values of a parameter at given moment of time. The matrices may be time dependent, e.g. it is common practice to increase the relative weight of the differences at final time to ensure desirable final state. The problem formulated here is often referred to as "tracking problem."

Linear Dynamic Systems

Before we come to a solution, let's discuss a special case of dynamic systems. The most well studied class of the systems is linear dynamic systems, i.e. system in which dynamics is described with a set of linear differential equations. For such system equation (1) may be rewritten as following:

 (3)

where A(t) and B(t) are matrices. First we will provide a solution for these kind of systems, and then spread it to nonlinear dynamic systems.

Discrete Time

Since we are going to offer a numeric solution we will deal with the system in discrete moments of time. So we will reformulate our problem for discrete time t = k ⋅ Δt, where Δt constitutes a small time interval (sampling interval) and k is a given time step, 0 <= k < N, tf = (N-1) ⋅ Δt. Now equations (1) may be rewritten for discrete time:

 (4)

and linear system is defined as

 (5)

where Fk and Hk are matrices obtaining from A(t) and B(t) as following:
Fk = exp(A(k⋅Δt)⋅Δt) ≅ I+A(k⋅Δt)⋅Δt,   Hk = A-1[exp(A(k⋅ Δt)⋅Δt) - I]B ≅ B(k⋅Δt)⋅Δt,   where I is unit matrix.

Cost function (2) in discrete time will be reshaped to

 (6)

Solution for the Linear Case

A numeric solution of the problem may be obtained based on Bellman's principle of optimality. Results are quite bulky and presented in the appendix. For general understanding it is essential to know that mk control policy is calculated in two major runs. First, inverse calculation run is performed starting with final time step N-1 and going down to initial time 0. The inverse calculation run using system matrices F and H, desirable state r and control u vectors and weight matrices Q, Z yields some vector ck and matrix Lk. Then the direct run is carried out from time 0 till N-1 obtaining vector mk at each k time step as

 (7)

Both inverse and direct calculation runs are shown in the picture below:

 Calculation scheme

Vectors cN-k and matrices LN-k are known from the inverse run and dependent on system matrices F and H, user defined desirable vectors and weights. They also depend on auxiliary matrices PN-k and vectors vN-k participating in inverse run and starting from zero initial conditions. On each step of direct calculation run we obtain control vector mk based on state vector xk for this time. Then calculate the next state xk+1 using equation (5). This procedure is repeated until the last time moment N-1 giving us both open-loop control policy mk and states of the system xk vs. time. Statement (7) provides us with close-loop control as depicted below

 Close-Loop Control System

where vector c constitutes close-loop control and matrix L provides feedback from state vector x. These parameters allow designer to generate vector m based on actual state of the system, thus making the system more stable and robust.

Discussion of the Solution for the Linear Case

For linear systems control policy mk minimum of cost function (6) is achieved with just one iteration consisting of one inverse and one direct runs. As we have stated above, this policy depends on system matrices and user chosen parameters, namely desirable vectors and weight matrices. These parameters are intuitive enough to reduce the control policy design to a play with them. The user chooses a set of parameters and calculates the control policy and appropriate state trajectories. If she/he is not satisfied with the result, the calculation is repeated for another set of desirable vectors and weight matrices. It is also interesting to observe that if system matrices F, H and weight matrix Q are time invariant then feedback matrix L will be also time invariant, which considerably simplified design of system’s controller. For linear case system matrices F, H are normally time invariant. So, to attain time invariant feedback user should choose constant weight matrix Q, i.e. control without formal strict achievement of desirable final state at final moment of time. In real life by proper choice of Q we can achieve a desirable final state in acceptable time, even before our final time. Although it is convenient to implement control policy using state vector coordinates, in real life usually not all those coordinates are available for measurement. Various techniques have been developed to overcome this difficulty. Different kinds of state coordinate observers may be used to estimate unmeasured coordinates using information for the coordinates available for measurement.

Nonlinear Systems

The nonlinear system case is much more complex. Unlike linear systems it does not have overall decision for all kinds of systems. Here we can try to reduce nonlinear optimization problem to a linear one using quasilinearization approach. To use the same math solution, we have to present general dynamic system description given by equation (1) in linearized form similar to equation (5). To achieve this we should consider iterative calculation process. We can linearize (1) to (5) in a particular point on xk and mk of i-th trajectory:

 (8)

where

 (9)

Linearized system matrices F and H are matrices of appropriate partial derivatives of functions f at point k on i-th trajectory. State and control vectors on (i+1)-th trajectory may be presented as

 (10)

Equation (8) may be formulated for increments Δx and Δm. In this case in cost function (6) new vectors r and u correspond to

 (11)

respectively. This allows us to reduce nonlinear problem to a sequence of quasilinear iterations each of which is depicted in calculation scheme figure above. The algorithm consists of the following steps:

1. Assuming for iteration i=0 vectors xk and mk (normally all zeros accept xk=0 initial conditions, but choice is up to user) calculate linearized all Fki and Hki for i-th trajectory according to (8).
2. Calculate new rki and uki for i-th iteration according to (11).
3. Calculate Δmki and Δxki using inverse and direct runs as for linear case.
4. Calculate xki+1 and mki+1 for (i+1)-th iteration.
5. Check condition to stop iterations. This condition may be based e.g., on difference between values of cost function comparing to previous iteration, or simply contain fixed number of iterations. Anyway, if the condition is not satisfied then steps 1-5 should be repeated until the stop condition will be satisfied. After the condition was satisfied, policy mk and states xk may be considered as final solution.

Obtaining exact formulas for partial derivatives of functions f may be a tedious work, and becomes impossible when functions f are defined numerically, e. g. as a table. In these cases it is possible to use simple numeric approximation by giving small increments to a state vector coordinate of partial derivative and calculate result value of f function. Code sample provides appropriate examples.

Method Limitations

Like any numeric approach this one has its limitations. One limitation is rather obvious and should be considered for both linear and nonlinear cases: time sampling interval should be small enough to guarantee convergence of calculations. Desirable values of r, u, and weight factors Q and Z should be reasonable to achieve intuitively optimal system behavior. In case of nonlinear systems, we should clearly understand that this is a "game without strict rules," and thus there is not any guarantee that the technique is applicable to a particular case. The method is sensitive to calculation of linearized system matrices accuracy. More complex methods may be used to estimate applicability of this approach to a particular nonlinear system, but usage of try-and-fail approach seems more practical for engineers. Again, in some way dealing with nonlienear system is still "art rather than science."

Constant Feedback

As in linear case, control with constant feedback may be considered for nonlinear systems. To obtain constant feedback in nonlinear case weight matrices Q, Z, initial trajectory for state x and control m should be time invariant (except initial state x0), and acceptable result should be achieved in one iteration (since system matrices F and H calculated according to (9) depend on previous iteration trajectory). Normally engineers adopt rather informal approach to system optimization, when delivering minimum to our cost function is not the ultimate goal but a mean to attain acceptable system behavior. So it may be wise in some cases to sacrifice further cost function minimization for the sake of feedback simplicity. Of course, like everything in nonlinar systems, it is possible not in all cases. Appropriate numeric example will be discussed below.

Usage of Software

The software is written in plain C# without any additional libraries. Project MatrixCalc provides arithmetic operations with vectors and matrices including transposition and inversion. Project SequentialQuadratic supports calculations for system optimization and simulation. Project Test uses static methods and interfaces of SequentialQuadratic to solve particular problems. DLL SequentialQuadratic has several internal types responsible for different optimization problems (linear with constant weights and desirable values, linear with changed in time weights and desirable values and appropriate nonlinear). Instance of these types are created with overloaded static methods SQ.Create() and exposed to caller via set of appropriate interfaces IBase, IBaseLQExt, ITrackingLQ and INonLinearQ. In order to perform optimization procedure user should call appropriate method providing input and getting required interface. Then method RunOptimization() should be called to actually carry out optimization calculations, followed by method Output2Csv() to write result in comma-separated format to an Excel-compatible *.csv file. To simulate the system with given control policy method Simulate() is called. Sample of the code for optimization of nonlinear system (Van der Pol oscillator) is shown below:

public static void VanDerPol()
{
const string TITLE = "Van der Pol";
Console.WriteLine($"Begin \"{TITLE}\""); var lstFormatting = new List<OutputConfig> { new OutputConfig { outputType = OutputType.Control, index = 0, title="m", scaleFactor=1 }, new OutputConfig { outputType = OutputType.State, index = 0, title="x0" }, new OutputConfig { outputType = OutputType.State, index = 1, title="x1" }, }; int dimensionX = 2; int dimensionM = 1; double dt = 0.1; int N = 51; var r = new Vector(dimensionX); var u = new Vector(dimensionM); var Q = new Matrix(dimensionX); var Z = new Matrix(dimensionM); r[0] = 0; r[1] = 0; u[0] = 0; Q[0, 0] = 1; Q[1, 1] = 1; Z[0, 0] = 1; var xInit = new Vector(dimensionX); xInit[0] = 1; double mu = 1; var functions = new CalcDelegate[] { (k, deltaT, m, x) => x[1], (k, deltaT, m, x) => mu * x[1] * (1 - x[0] * x[0]) - x[0] + m[0] }; // Exact formulas for gradients calculation var gradientsA = new CalcDelegate[dimensionX, dimensionX]; gradientsA[0, 0] = (k, deltaT, m, x) => 0; gradientsA[0, 1] = (k, deltaT, m, x) => 1; gradientsA[1, 0] = (k, deltaT, m, x) => -1 - 2 * x[0] * x[1]; gradientsA[1, 1] = (k, deltaT, m, x) => 1 - x[0] * x[0]; var gradientsB = new CalcDelegate[dimensionX, dimensionM]; gradientsB[0, 0] = (k, deltaT, m, x) => 0; gradientsB[1, 0] = (k, deltaT, m, x) => 1; var dynamicSystem1 = SQ.Create(functions, gradientsA, gradientsB, Q, Z, r, u, xInit, dt, N, (currCost, prevCost, iteration) => iteration > 20); dynamicSystem1.RunOptimization(); dynamicSystem1.Output2Csv("VanDerPol_ExactGrads", lstFormatting); dynamicSystem1.OutputExecutionDetails(isExactGradient: true); // Numeric gradients calculation double delta = 0.1; var dynamicSystem2 = SQ.Create(functions, new Vector(dimensionM, delta), new Vector(dimensionX, delta), Q, Z, r, u, xInit, dt, N, (currCost, prevCost, iteration) => iteration > 20); dynamicSystem2.RunOptimization(); dynamicSystem2.Output2Csv("VanDerPol_NumGrads", lstFormatting); dynamicSystem2.OutputExecutionDetails(isExactGradient: false); Console.WriteLine($"End \"{TITLE}\"{Environment.NewLine}{Environment.NewLine}");
}

Numeric Examples

Linear System

There is a mechanical system: mass M is moving by force. According to the Newton's second law the force is equal to result of multiplication of mass and acceleration which is the second derivative of displacement. Let's denote dispacement as x0, velocity as x1 and force as m0. The system is desribed with the following second order matrix differential equation:

ẋ = Ax + Bm,  where

 A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} , B = \begin{bmatrix} 0 \\ 1/M \end{bmatrix} , M = 1
 xk=0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

Cost function is  ∫040[(10 - x0)2 + 3x12]dt

 Q = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} , Z = \begin{bmatrix} 0 \end{bmatrix} , r = \begin{bmatrix} 10 \\ 0 \end{bmatrix} , u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 1, N = 41.

 Transients of the linear system Close-loop input and feedback factors obtained as result of calculations are input 4.48, feedbaks: from x0  -0.448 and from x1  -1.22 .

Van der Pol Oscillator

This is a nonlinear system. Our goal is smooth and fast damping of ongoing oscillations.

0 = x1
1 = μ⋅x1⋅(1 - x02) - x0 + m0,    μ=1

 xk=0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}

Cost function is  ∫05(x02 + x12 + m02)dt

 Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} , Z = \begin{bmatrix} 1 \end{bmatrix} , r = \begin{bmatrix} 0 \\ 0 \end{bmatrix} , u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 0.1, N = 51.

 Transients of the controlled Van der Pol oscillator after 20 iterations Control policy m for different number of iterations (1, 2, 3 and 20) Cost function difference (in per cent) for different number of iterations

In the above case with chosen desirable state and control vectors and weight matrices we observe good convergent iterative calculations.

Rayleigh Equation with Scalar Control

This is another nonlinear oscillation system. Our goal is the same as in previous case: smooth and fast ongoing oscillations damping.

0 = x1
1 = -x0 + (1.4 - 0.14⋅x12)⋅x1 + 4⋅m0

 xk=0 = \begin{bmatrix} -5 \\ -5 \end{bmatrix}

Cost function is  ∫02.5(x02 + m02)dt

 Q = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} , Z = \begin{bmatrix} 1 \end{bmatrix} , r = \begin{bmatrix} 0 \\ 0 \end{bmatrix} , u = \begin{bmatrix} 0 \\ \end{bmatrix}

Δt = 0.025, N = 101.

 Transients after 94 iterations Control policy m for different number of iterations (1, 2, 3 and 94) Cost function difference (in per cent) for different number of iterations

Similar to Van der Pol oscillator, in this case good method convergence was also achieved although more iterations were required to reduce cost function change to zero.

Nonlinear System with Constant Feedback

This is a nonlinear system that can be adequately translated from its given initial state to zero state using constant feedback.

0 = x1
1 = -x0⋅[π/2 + artan(5⋅x0)] - 5⋅x02 / [2⋅(1 + 25⋅x02)] + 4⋅x1 + 3⋅m0

 xk=0 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}

Cost function is  ∫010(5x12 + m02)dt

 Q = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} , Z = \begin{bmatrix} 1 \end{bmatrix} , r = \begin{bmatrix} 0 \\ 0 \end{bmatrix} , u = \begin{bmatrix} 0 \\ \end{bmatrix}

Δt = 0.1, N = 101.

 Transients for a single iteration Feedback factors obtained as result of calculations are from x0  0.09 ≅ 0 and from x1  -2.29 . Transients for  m0 = -3⋅x1 m0   vs.  x1  for a single iteration

The above figures depict transients calculated with our software for chosen weight matrices after a single iteration and transients for specially chosen "ideal" feedback for smooth translation. The last figure shows relations between controlled state coordinate x1 and control m0 for our calculations. Control calculated by our software looks a little more "aggressive" with small overshoots in m0 and x1 but quite acceptable.

Application of the described technique to optimal control of a two-link robotic manipulator is presented in this post.

More numeric examples are provided in code sample.

Conclusions

The article briefly explains numeric solution of sequential quadratic optimization control problem for linear and nonlinear systems. Quasilinearization approached is used to reduce non-linear problem to a set of sequential linear-quadratic problems for linearized system. Compact and simple-to-use software is provided along with example of its usage.

Appendix

Parameters c and L in recursive equation (7) for calculation of control vector m in inverse run may be obtained from dynamic programming equation (format of this article does not permit to provide appropriate maths transformations) as set of the following recursive operations:

where P and v are recursively calculated auxiliary matrix and vector respectively, k starts from   k = N - 1  with   P0 = 0,   v0 = 0.

Note that the above calculations require to invert auxiliary W matrix dependent on transfer matrix H, weight matrices and matrix P.

Share

 Software Developer (Senior) Israel

• Nov 2010: Code Project Contests - Windows Azure Apps - Winner
• Feb 2011: Code Project Contests - Windows Azure Apps - Grand Prize Winner

You may also be interested in...

 Pro Pro

 First PrevNext
 Wow, looks impressive! laverrod29-Mar-19 10:13 laverrod 29-Mar-19 10:13
 Damn ... my little mind tilt Yves25-Mar-19 7:08 Yves 25-Mar-19 7:08
 Re: Damn ... my little mind tilt Igor Ladnik25-Mar-19 7:14 Igor Ladnik 25-Mar-19 7:14
 Re: Damn ... my little mind tilt Yves26-Mar-19 9:46 Yves 26-Mar-19 9:46
 Re: Damn ... my little mind tilt Igor Ladnik26-Mar-19 19:11 Igor Ladnik 26-Mar-19 19:11
 My vote of 5 Member 1419522825-Mar-19 4:04 Member 14195228 25-Mar-19 4:04
 Re: My vote of 5 Igor Ladnik25-Mar-19 4:11 Igor Ladnik 25-Mar-19 4:11
 Math processing error kb310-May-15 8:33 kb3 10-May-15 8:33
 Nice article Sergey Morenko9-Feb-15 18:53 Sergey Morenko 9-Feb-15 18:53
 My vote of 5 Liju Sankar9-Feb-15 1:05 Liju Sankar 9-Feb-15 1:05
 Re: My vote of 5 Igor Ladnik9-Feb-15 1:50 Igor Ladnik 9-Feb-15 1:50
 My vote of 5 Mahsa Hassankashi7-Feb-15 23:30 Mahsa Hassankashi 7-Feb-15 23:30
 Re: My vote of 5 Igor Ladnik7-Feb-15 23:36 Igor Ladnik 7-Feb-15 23:36
 goto Statements in Matrix Invert firmwaredsp24-Jan-15 12:56 firmwaredsp 24-Jan-15 12:56
 Re: goto Statements in Matrix Invert Igor Ladnik24-Jan-15 13:52 Igor Ladnik 24-Jan-15 13:52
 Update Info firmwaredsp24-Jan-15 8:28 firmwaredsp 24-Jan-15 8:28
 My vote of 5 Volynsky Alex23-Jan-15 22:21 Volynsky Alex 23-Jan-15 22:21
 Re: My vote of 5 Igor Ladnik23-Jan-15 23:10 Igor Ladnik 23-Jan-15 23:10
 Re: My vote of 5 Volynsky Alex24-Jan-15 0:35 Volynsky Alex 24-Jan-15 0:35
 It's just a blast of my mind. Pirks115-Jan-15 7:45 Pirks1 15-Jan-15 7:45
 Last Visit: 24-Apr-19 10:31     Last Update: 24-Apr-19 10:31 Refresh 12 Next »